This project covers the vision system of the egn18-dv, the second driverless race car completely designed and assembled by the e-gnition Formula Student team. At the last Formula Student Driverless competition at the Hockenheimring in Germany, we achieved 3rd place among all participants.
Formula Student is an interdisciplinary competition in which university teams from all over the world compete against each other in a variety of engineering and business challenges. The event is divided into three disciplines: the combustion car, the electric car, and the driverless car competition. In the driverless challenge, the car has to complete three different tracks.
Perceiving the environment is a crucial task for every self-driving vehicle, and different kinds of sensors and sophisticated algorithms are necessary to fulfill it. In the case of the Formula Student Driverless challenge, however, the environment that has to be detected and recognized is limited: each race track is marked out by yellow (right) and blue (left) cones, and no other obstacles are present on the track. Therefore, the main goal of our vision system is to detect and classify all cones that make up the race track. Exact knowledge of the race track is crucial for our path planning system. Our vision system relies on the information of two different sensors.
The first sensor is the Ibeo Lux Light Detection and Ranging (LiDAR) sensor provided by Ibeo Automotive Systems GmbH. As you can see in Figure 1, we have two LiDARs in the front and one LiDAR facing backwards.
The second sensor is the daA1600-60uc Basler dart camera provided by Basler AG. We use two cameras, mounted to the main hoop right behind the driver seat and facing in the driving direction. Both cameras are rotated by ±45° to achieve an overall field of view of about 180°.
Each LiDAR creates a three-dimensional point cloud; the individual clouds are fused and then clustered. The clustering algorithm we use is called Euclidean Cluster and works as follows:
1. It creates a Kd-tree representation for the input point cloud dataset P.
2. It sets up an empty list of clusters C and a queue Q of the points that need to be checked.
3. For each point p_i ∈ P it performs the following steps: add p_i to the current queue Q; for every point in Q, search for the set of its neighbors within a sphere of radius r smaller than a given distance threshold and add every neighbor that has not been processed yet to Q; when all points in Q have been processed, add Q to the list of clusters C and reset Q to an empty queue.
4. The algorithm terminates when all points p ∈ P have been processed and are now part of the list of point clusters C.
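The loop above can be sketched in a few lines of Python. This is an illustrative re-implementation, not our actual on-car code; the `tolerance` radius and the `min_size` filter are assumed parameters:

```python
import numpy as np
from scipy.spatial import cKDTree

def euclidean_cluster(points, tolerance, min_size=2):
    """Group points whose neighbor distance is below `tolerance`.

    points: (N, 3) array of fused LiDAR points.
    Returns a list of index lists, one per cluster.
    """
    tree = cKDTree(points)             # Kd-tree representation of P
    processed = np.zeros(len(points), dtype=bool)
    clusters = []                      # list of clusters C
    for i in range(len(points)):
        if processed[i]:
            continue
        queue = [i]                    # queue Q of points to check
        processed[i] = True
        head = 0
        while head < len(queue):
            # neighbors within a sphere of radius `tolerance`
            for n in tree.query_ball_point(points[queue[head]], tolerance):
                if not processed[n]:
                    processed[n] = True
                    queue.append(n)
            head += 1
        if len(queue) >= min_size:     # drop clusters too small to be cones
            clusters.append(queue)
    return clusters
```

The Point Cloud Library offers this algorithm out of the box as Euclidean Cluster Extraction, which is the natural choice when the pipeline is written in C++.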
To make a long story short, the algorithm searches for groups of points that lie close together. These small "sub point clouds" have a high probability of being cones on our track. However, there are many other objects near the track, such as fire extinguishers, which are also often interpreted as cones. The reason is that the vertical resolution of our LiDARs is not high enough: point clouds produced by laser beams hitting cones, fire extinguishers, or other small objects next to the track can look very similar. The result of this part of the software pipeline is a list of potential cones which we call the "map". To further increase the information about the environment and to filter out false positive cone candidates, we implemented the additional camera vision system.
As already mentioned, two daA1600-60uc Basler dart cameras are mounted to the main hoop of our car. Both cameras face in the driving direction and are rotated by ±45° to ensure a field of view of almost 180°. The next step in our vision pipeline is the cone projection node. It takes as input both camera images, the map containing the potential cone candidates, and the odometry of the car, estimated by our state estimation. At each frame, the algorithm selects all cone candidates stored in the map and checks which of them lie in the current field of view of our cameras. All cone positions in the map are stored in world coordinates. It is therefore important to know the position of the car at any time, so that the position of each cone candidate relative to the car can be calculated. Once a cone candidate that is visible to the cameras has been found, it gets projected into the camera image. The projection uses the pinhole camera model to calculate the pixel coordinates at which the vector between the cone candidate and the camera penetrates the image plane. The result can be seen in image … Through this projection, all potential cone candidates, which previously were represented only by a coordinate in three-dimensional space, are now also represented by a small region of interest (ROI) in the current camera frame. This ROI is then used to determine the cone type: whether it is a blue or a yellow cone, or no cone at all.
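A minimal sketch of this projection step, assuming a flat world frame with z pointing up, the car pose given as a position plus a yaw angle, and a standard intrinsic matrix K. All function and parameter names here are illustrative, not taken from our actual node:

```python
import numpy as np

def rot_z(angle):
    """Rotation matrix about the vertical (z) axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def project_cone(cone_world, car_pos, car_yaw, cam_offset, cam_yaw, K):
    """Project a cone position (world frame) to pixel coordinates.

    cam_offset / cam_yaw: camera pose in the car frame (the ±45° mounting).
    Returns (u, v) pixel coordinates, or None if the cone is behind the camera.
    """
    # world -> car frame: undo the car's translation and heading
    p_car = rot_z(-car_yaw) @ (np.asarray(cone_world) - np.asarray(car_pos))
    # car -> camera frame: undo the camera's mounting offset and rotation
    p_cam = rot_z(-cam_yaw) @ (p_car - np.asarray(cam_offset))
    # re-order axes into the optical convention: x right, y down, z forward
    x_fwd, y_left, z_up = p_cam
    p_opt = np.array([-y_left, -z_up, x_fwd])
    if p_opt[2] <= 0.0:
        return None                    # behind the image plane, not visible
    u, v, w = K @ p_opt                # pinhole model: homogeneous projection
    return np.array([u / w, v / w])
```

A cone straight ahead of a forward-facing camera lands at the principal point, which is a quick sanity check for the frame conventions chosen here.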
The previously described projection and ROI extraction process is executed on our main CPU. All extracted ROIs are passed via an Ethernet connection to an Nvidia Jetson TX2 module. The communication between the CPU and the Jetson module is completely handled by our Robot Operating System (ROS) environment. The Jetson module executes a lightweight neural network composed of a few convolution, max pooling, batch normalization, and dropout layers.
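To make the classifier stage concrete, here is a toy numpy forward pass using the same layer types. The filter shapes and layer count are made up for illustration; our actual network, its depth, and its trained weights differ, and batch normalization and dropout are omitted because they mainly matter during training:

```python
import numpy as np

def conv2d(x, w, b):
    """Valid cross-correlation. x: (H, W, Cin), w: (k, k, Cin, Cout), b: (Cout,)."""
    k = w.shape[0]
    out = np.zeros((x.shape[0] - k + 1, x.shape[1] - k + 1, w.shape[3]))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # contract the (k, k, Cin) patch against all filters at once
            out[i, j] = np.tensordot(x[i:i + k, j:j + k], w, axes=3) + b
    return out

def relu(x):
    return np.maximum(x, 0.0)

def maxpool2(x):
    """2x2 max pooling with stride 2."""
    H, W, C = x.shape
    x = x[: H // 2 * 2, : W // 2 * 2]
    return x.reshape(H // 2, 2, W // 2, 2, C).max(axis=(1, 3))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def classify_roi(roi, params):
    """roi: (H, W, 3) image patch -> probabilities over (blue, yellow, trash)."""
    x = maxpool2(relu(conv2d(roi, params["w1"], params["b1"])))
    x = maxpool2(relu(conv2d(x, params["w2"], params["b2"])))
    logits = x.reshape(-1) @ params["w3"] + params["b3"]
    return softmax(logits)
```

The real network runs on the Jetson with a proper deep learning framework; this sketch only shows how the named layer types compose into a three-class ROI classifier.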
The model was previously trained on many blue and yellow cone images. A third class, the so-called "trash" class, is used for all non-cone candidates. In an early stage of our work, we only used the yellow and blue classes as possible prediction outcomes. At that time, we classified false positive cone candidates just by looking at the raw logit output of the architecture: the idea was that non-cone candidates would result in a very low activation for both classes. By introducing a third "trash" class, however, we believe the network was forced to focus not only on the color of the image but also on its shape, because non-cone objects can also be blue or yellow. Using the third "trash" class turned out to be beneficial in the end.
The result of the neural network is sent back to the map provider, which then updates the color and the "check state" of the analyzed cone. While driving, the map of the race track is constantly filled with new cones, false positive candidates are filtered out, and color information is stored as well. TO BE CONTINUED