The future of the automotive industry will involve automated and connected vehicles, and in many respects this transition is already under way. Stakeholders around the world continue to invest billions of dollars in the technology and are reshaping their businesses to accommodate this industry-wide revolution. Beyond economic opportunity, autonomous vehicles offer other benefits to society, and arguably one of the most important is safety. Human drivers can already sense their surroundings and make quick, protective decisions on the road, but there are still many streams of information they cannot access on their own. For example, people cannot track the exact path of every moving object around them, nor can they stay fully alert to every environmental condition and stimulus. These small but significant imperfections contribute to the 3.1 million accidents that happen every year on U.S. roads, a statistic that autonomous cars aim to eliminate. To deliver on this bold promise, autonomous cars rely heavily on advanced sensing hardware to capture information about their surroundings accurately and quickly, a crucial enabler for their decision-making and control processes. Each type of sensing hardware has inherent strengths and weaknesses and is better suited to some perception tasks than to others. Regardless of design philosophy or business strategy, however, one sensing type is always present: cameras.
Cameras are the eyes of an autonomous car and are essential to any driving task that depends on visual recognition, such as lane finding, road curvature estimation, traffic sign classification, and much more. How many cameras to use and where to position them, however, is a choice each developer must make. Figure 1 shows one example: the configuration Tesla uses for its Autopilot feature on the Model S and Model X, in which eight surround-view cameras provide a 360-degree field of view (FOV) and up to 250 meters of range. This is only one of many possible sensor configurations; each depends on factors such as cost and performance.
Fig. 1: Tesla Autopilot sensor configuration
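Whether a set of cameras really achieves 360-degree coverage comes down to how their individual horizontal FOVs overlap. The sketch below checks this for a hypothetical layout; the function name `uncovered_sectors`, the (heading, FOV) representation, and the example numbers are illustrative assumptions, not Tesla's actual camera specifications.

```python
def uncovered_sectors(cameras):
    """Return the (start, end) sectors, in degrees within [0, 360), that no camera sees.

    cameras: iterable of (heading_deg, fov_deg) pairs, with heading measured
    clockwise from the vehicle's nose. All cameras are treated as if mounted
    at a single point, so only viewing angles are compared (hypothetical model).
    """
    intervals = []
    for heading, fov in cameras:
        if fov >= 360:
            return []  # a single camera already sees everything
        start = (heading - fov / 2) % 360
        end = start + fov
        if end <= 360:
            intervals.append((start, end))
        else:
            # The sector wraps past 360 degrees, so split it in two.
            intervals.append((start, 360.0))
            intervals.append((0.0, end - 360))
    intervals.sort()

    # Sweep around the circle, recording any stretch not covered so far.
    gaps = []
    covered_to = 0.0
    for start, end in intervals:
        if start > covered_to:
            gaps.append((covered_to, start))
        covered_to = max(covered_to, end)
    if covered_to < 360:
        gaps.append((covered_to, 360.0))
    return gaps


# Hypothetical layouts: four cameras at 0, 90, 180 and 270 degrees.
print(uncovered_sectors([(0, 100), (90, 100), (180, 100), (270, 100)]))  # []
print(uncovered_sectors([(0, 80), (90, 80), (180, 80), (270, 80)]))
# [(40.0, 50.0), (130.0, 140.0), (220.0, 230.0), (310.0, 320.0)]
```

Four 100-degree cameras at the cardinal headings leave no angular gaps, while narrowing them to 80 degrees opens four 10-degree blind sectors. Because the model ignores where each camera is mounted on the body, it says nothing about blind zones very close to the vehicle.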
Regardless of the chosen camera configuration, every automotive camera system is concerned with two regions of coverage: front-looking and side-looking. Because they face a greater risk of encountering obstacles, front-looking cameras are typically optimized for range and image resolution. Side-looking cameras, on the other hand, must cover more area at a lower risk of impact, which translates into wider FOV requirements. Together, these cameras provide the information needed to make safety-conscious decisions in fast-paced and unforgiving environments.
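The trade-off between range and FOV follows from basic pinhole-camera geometry: with the same image sensor, a longer focal length narrows the FOV but spreads a distant object over more pixels. The sketch below assumes an ideal, distortion-free pinhole lens; the 5.76 mm / 1920-pixel sensor, the two focal lengths, and the 1.8 m object width are made-up illustrative values.

```python
import math


def horizontal_fov_deg(sensor_width_mm, focal_length_mm):
    """Horizontal field of view of an ideal pinhole camera, in degrees."""
    return math.degrees(2 * math.atan(sensor_width_mm / (2 * focal_length_mm)))


def pixels_across(object_width_m, distance_m, image_width_px,
                  sensor_width_mm, focal_length_mm):
    """Approximate width in pixels of an object centred in the image."""
    # Focal length expressed in pixels rather than millimetres.
    focal_length_px = image_width_px * focal_length_mm / sensor_width_mm
    return focal_length_px * object_width_m / distance_m


# Same hypothetical sensor (5.76 mm wide, 1920 px) with two different lenses.
for label, focal_mm in [("narrow, front-looking", 12.0), ("wide, side-looking", 2.8)]:
    fov = horizontal_fov_deg(5.76, focal_mm)
    px = pixels_across(1.8, 100.0, 1920, 5.76, focal_mm)
    print(f"{label}: FOV {fov:.1f} deg, 1.8 m object at 100 m spans {px:.1f} px")
```

With these assumed numbers, the 12 mm lens sees roughly 27 degrees and puts about 72 pixels on the object, while the 2.8 mm lens sees roughly 92 degrees but leaves only about 17 pixels, which is why the two camera roles are tuned differently.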
While cameras alone are by no means an end-to-end solution to autonomous driving, they do enable important tasks such as object recognition and automatic lane keeping. Especially in densely populated cities, it is extremely important to be able to distinguish pedestrians from traffic infrastructure and to navigate poorly maintained roads. To achieve this, the system needs a way not only to handle the enormous volume of raw image data, but also to interpret it accurately and quickly enough for the car to make relevant decisions.
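To get a feel for the raw data volume, the uncompressed data rate is simply resolution × bytes per pixel × frame rate × number of cameras. The camera parameters below (eight 1920x1080 cameras, 8-bit RGB, 30 frames per second) are hypothetical round numbers, not the specification of any particular vehicle.

```python
def raw_data_rate_bytes_per_s(width_px, height_px, bytes_per_px, fps, num_cameras):
    """Uncompressed video data rate, in bytes per second, for identical cameras."""
    bytes_per_frame = width_px * height_px * bytes_per_px
    return bytes_per_frame * fps * num_cameras


# Hypothetical set-up: eight 1080p cameras, 3 bytes per pixel (8-bit RGB), 30 fps.
rate = raw_data_rate_bytes_per_s(1920, 1080, 3, 30, 8)
print(f"{rate / 1e9:.2f} GB/s, {rate * 8 / 1e9:.2f} Gbit/s")  # 1.49 GB/s, 11.94 Gbit/s
```

Roughly 1.5 GB of pixels every second, before any other sensor is counted, is why the question of where that data gets reduced matters so much when choosing a camera network architecture.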
As far as camera network implementations are concerned, there are two options to choose from. The first is a centralized architecture, in which each camera sends its raw data directly to a central computer that fully processes it into perception outputs. The advantages of this set-up are that logic in the central computer can algorithmically discard irrelevant data (e.g., regions of the image that show only sky), and that the speed of the overall perception system scales with the performance of the central computer. However, expanding sensor coverage under this architecture would be difficult and expensive. Because all optimization and post-processing lives solely in the central computer, it would need to be re-engineered every time new data streams are added. Depending on existing capabilities and limitations, that could delay milestone launches and/or introduce unexpected increases in production cost. Furthermore, relying on improvements in computing power is risky, considering that the central computer must also handle information from every other sensor, not just the cameras.
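As a very simple illustration of discarding irrelevant data, a central computer could drop the rows above an assumed horizon line before running any heavier processing. The function name and the fixed `horizon_fraction` of 0.4 are hypothetical; a real system would more likely derive the region of interest from the camera's mounting and calibration than from a constant.

```python
def crop_below_horizon(frame, horizon_fraction=0.4):
    """Keep only the rows below an assumed horizon line.

    frame: list of pixel rows, top row first.
    horizon_fraction: hypothetical share of the image height (from the top)
    assumed to contain only sky and therefore discarded.
    """
    if not 0.0 <= horizon_fraction < 1.0:
        raise ValueError("horizon_fraction must be in [0, 1)")
    first_kept_row = int(len(frame) * horizon_fraction)
    return frame[first_kept_row:]


# A blank 1920x1080 grayscale frame stands in for a real camera image.
frame = [[0] * 1920 for _ in range(1080)]
roi = crop_below_horizon(frame)
print(len(roi), "of", len(frame), "rows kept")  # 648 of 1080 rows kept
```

Dropping 40% of the rows reduces the pixels handed to later processing stages by the same proportion.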
The second option is a decentralized architecture, in which each camera performs its own pre-processing independently and sends the results to a central computer for post-processing. Unlike the centralized solution, this structure supports expanding sensor coverage, since preliminary noise removal is self-contained within each sensor. Furthermore, whereas the centralized option can only be sped up by improving the central computing platform, the decentralized system can also be sped up by improving each camera's pre-processing. In fact, Stanford's electrical engineering department has already developed a proof of concept for an optical-electrical camera that offloads several time-consuming pre-processing steps to optical operations performed at the speed of light. This is one of many examples of how component-level improvements can benefit overall system performance. Unfortunately, the major downside of this architecture is that the speed of the perception system is limited by its slowest sensing component, a bottleneck that may be unavoidable depending on pre-set engineering requirements.
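The "slowest component" bottleneck can be written as a one-line latency model: if every camera pre-processes in parallel and the central computer waits for all of them before fusing the results, the cycle time is the maximum of the camera latencies plus the fusion time. The model and the millisecond figures below are simplified, hypothetical assumptions (no transfer delays, no pipelining).

```python
def decentralized_cycle_ms(camera_latencies_ms, fusion_ms):
    """Latency of one perception cycle in a simplified decentralized design.

    Cameras pre-process in parallel; the central computer can only start
    fusing once the slowest camera has delivered its result.
    """
    if not camera_latencies_ms:
        raise ValueError("at least one camera latency is required")
    return max(camera_latencies_ms) + fusion_ms


baseline = decentralized_cycle_ms([12, 15, 31], fusion_ms=8)
faster_fast_camera = decentralized_cycle_ms([6, 15, 31], fusion_ms=8)
faster_slow_camera = decentralized_cycle_ms([12, 15, 20], fusion_ms=8)
print(baseline, faster_fast_camera, faster_slow_camera)  # 39 39 28
```

Halving the latency of an already fast camera changes nothing, while speeding up the slowest one shortens the whole cycle, so component-level improvements pay off only when they target the bottleneck.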
Whether the architecture is centralized, decentralized, or a hybrid of both, designers must weigh short- and long-term considerations and use them to make the most appropriate decisions for their end goals. Regardless of the option chosen, however, there remains the question of how to actually process the images and extract meaningful insights. The answer lies in two fields of study: computer vision and deep learning. Computer vision provides the techniques for extracting regions of interest (i.e., features) from multi-dimensional image data, whereas deep learning is the subdiscipline of artificial intelligence concerned with learning complex patterns from data without being explicitly programmed. Together, these technologies make the concept of an autonomous car possible: a system that can both recognize and interpret its surroundings to make meaningful decisions.
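As a small taste of what extracting features means in computer vision, the sketch below computes a Sobel gradient magnitude, a classic edge detector that responds strongly where brightness changes sharply, such as at the boundary of a lane marking. It is written in plain Python for clarity; libraries such as OpenCV provide optimized versions of this operation, and the tiny test image is made up.

```python
import math

# 3x3 Sobel kernels for horizontal (x) and vertical (y) intensity changes.
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]


def sobel_magnitude(gray):
    """Gradient magnitude of a grayscale image given as equal-length rows.

    The kernels are applied as a correlation (not flipped); flipping would
    only change the sign of each gradient, not its magnitude. Border pixels
    are left at 0 for simplicity.
    """
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for dy in range(3):
                for dx in range(3):
                    p = gray[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * p
                    gy += SOBEL_Y[dy][dx] * p
            out[y][x] = math.hypot(gx, gy)
    return out


# Made-up 5x5 image: dark on the left, bright on the right (a vertical edge).
gray = [[0, 0, 100, 100, 100] for _ in range(5)]
print(sobel_magnitude(gray)[2])  # [0.0, 400.0, 400.0, 0.0, 0.0]
```

The response is non-zero only in the columns next to the dark-to-bright boundary, which is the kind of localized feature that later stages, whether hand-written rules or a learned model, build on.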
Unfortunately, diving into the specifics of computer vision and deep learning would be both time-consuming and out of scope for this article. Instead, it is well worth reading through this tutorial on Python's OpenCV API. It provides a good introduction to the theory behind essential image processing techniques, along with working code and datasets to practice on in one's own time. For a better understanding of deep learning and its relevance to autonomous driving, check out this project, which uses deep learning to detect motorcyclists and pedestrians from a moving car: YOLO for self-driving cars, motorcycles, pedestrians & cars detection. One of the key resources this project uses is YOLO (You Only Look Once), a real-time object detection algorithm that autonomous vehicle developers build on to design and optimize a number of perception algorithms.
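One post-processing step common to YOLO-style detectors is worth showing concretely: the network proposes many overlapping bounding boxes, and non-maximum suppression (NMS) keeps only the highest-scoring box among those that overlap heavily. The following is a generic greedy NMS sketch with an assumed intersection-over-union (IoU) threshold of 0.5, not code taken from the project or from any YOLO release.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of the boxes kept, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any box already kept.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two near-duplicate detections of one object plus a separate object.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.68, so it is treated as a duplicate detection of the same object and suppressed, while the distant third box survives.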
Across hardware set-up, network architecture, and computer vision and deep learning implementation, camera systems in autonomous vehicles are far from simple. For this reason, autonomous vehicle professionals need to keep each of these basic considerations in mind and anticipate their corner cases to avoid unexpected delays and costs. The future of transportation depends on it.
- Coldewey, Devin. “Here's How Uber's Self-Driving Cars Are Supposed to Detect Pedestrians.” TechCrunch, TechCrunch, 19 Mar. 2018, techcrunch.com/2018/03/19/heres-how-ubers-self-driving-cars-are-supposed-to-detect-pedestrians/.
- Lambert, Fred, et al. “A Look at Tesla's New Autopilot Hardware Suite: 8 Cameras, 1 Radar, Ultrasonics & New Supercomputer.” Electrek, Electrek, 21 Oct. 2016, electrek.co/2016/10/20/tesla-new-autopilot-hardware-suite-camera-nvidia-tesla-vision/.
- “Seeing the Road Ahead: The Importance of Cameras to Self-Driving Vehicles.” Loup Ventures, 5 July 2018, loupventures.com/seeing-the-road-ahead-the-importance-of-cameras-to-self-driving-vehicles-2/.
- Stanford University. “New AI Camera Could Revolutionize Autonomous Vehicles.” Stanford News, 22 Aug. 2018, news.stanford.edu/2018/08/17/new-ai-camera-revolutionize-autonomous-vehicles/.
- Torchinsky, Jason. “How Autonomous Cars See The World.” Jalopnik, Jalopnik, 21 May 2018, jalopnik.com/how-autonomous-cars-see-the-world-1826147310.