Tesla made waves in the subcontinent in early 2021 by announcing its arrival in India. Elon Musk is reportedly all but ready to set up Tesla India’s manufacturing base in Bangalore. While memes and tweets kept flowing about how Tesla’s widely acclaimed “self-driving cars” would fare on Indian roads, AI specialists in India rejoiced. This is just the beginning of a wave of Computer Vision taking over the world!
Most data science enthusiasts are familiar with basic Computer Vision concepts like Object Detection and Tracking. However, real-world systems go far beyond these theoretical concepts. You may be wondering about the technology stack Tesla uses to deliver what it hopes will be the fully automated car of the future! Sounds fascinating, doesn’t it? (Although motorheads may not be as excited about this futuristic advancement.) In this article, we discuss some of the high-level concepts Tesla is relying on to make its cars fully autonomous!
Stating the Obvious – The Sensors
The tasks Tesla has to handle are well known. From lane detection to pedestrian tracking, all of them must be performed in real time. For this purpose, Tesla’s Autopilot uses 8 cameras. Together, they cover all the areas surrounding the vehicle, so there are no blind spots.
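The no-blind-spot claim boils down to a coverage check: every direction around the car must fall inside at least one camera’s horizontal field of view. The minimal Python sketch below performs that check. The camera names, mounting angles, and fields of view are hypothetical values chosen for illustration. They are not Tesla’s published specifications.

```python
from typing import List, Tuple

# Hypothetical 8-camera layout: (name, yaw in degrees, horizontal field of view in degrees).
# Yaw 0 = straight ahead, increasing clockwise. Illustrative assumptions only,
# NOT Tesla's actual camera specifications.
CAMERAS: List[Tuple[str, float, float]] = [
    ("front_narrow", 0, 35),
    ("front_main", 0, 50),
    ("front_wide", 0, 120),
    ("right_pillar", 75, 90),
    ("left_pillar", 285, 90),
    ("right_repeater", 135, 90),
    ("left_repeater", 225, 90),
    ("rear", 180, 130),
]


def angular_gap(a: float, b: float) -> float:
    """Smallest absolute difference between two headings, in [0, 180]."""
    return abs((a - b + 180) % 360 - 180)


def uncovered_headings(cameras: List[Tuple[str, float, float]]) -> List[int]:
    """Return every whole-degree heading (0-359) that no camera can see."""
    return [
        heading
        for heading in range(360)
        if not any(angular_gap(heading, yaw) <= fov / 2 for _, yaw, fov in cameras)
    ]


print(uncovered_headings(CAMERAS))  # → []
without_wide = [cam for cam in CAMERAS if cam[0] != "front_wide"]
print(uncovered_headings(without_wide))  # → [26, 27, 28, 29, 331, 332, 333, 334]
```

Dropping the hypothetical wide front camera opens two small gaps between the front and side cameras. Overlapping fields of view are meant to prevent exactly this kind of blind spot.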
Yes, you read that right! No LIDAR. No high-definition maps. Tesla plans to build its Autopilot purely on computer vision, combining machine learning with the video streams from the cameras. This raw footage is processed by Convolutional Neural Networks (CNNs) for object detection and tracking.
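To make “processed by CNNs” a little more concrete, here is the basic operation a convolutional layer applies to each frame, written as a minimal pure-Python sketch. The tiny frame and the hand-picked edge kernel are illustrative assumptions. In a real CNN the kernel values are learned during training, and many such layers are stacked.

```python
from typing import List

Matrix = List[List[float]]


def conv2d(image: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' 2-D convolution (stride 1, no padding), computed as cross-correlation
    like the convolution layers in deep learning frameworks (the kernel is not flipped)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# A tiny grayscale "frame": dark on the left, bright on the right.
frame = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
# Hand-picked kernel that responds to a left-to-right brightness jump (a vertical edge).
edge_kernel = [[-1, 1]]
print(conv2d(frame, edge_kernel))  # → [[0, 9, 0], [0, 9, 0], [0, 9, 0]]
```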
Apart from cameras, Tesla’s Autopilot also uses radar and ultrasonic sensors. The radar measures the distance between the car and other vehicles or objects. The ultrasonic sensors measure proximity to nearby obstacles, adding another layer of driver safety. Tesla integrates this hardware with neural networks to perceive the environment around the vehicle and make the Autopilot features as responsive as possible.
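Ultrasonic parking sensors estimate distance from an echo’s time of flight. The sound pulse travels to the obstacle and back, so the distance is half of speed × time. The sketch below shows only that general principle. The function name is hypothetical, and this is not Tesla’s sensor firmware.

```python
SPEED_OF_SOUND_M_S = 343.0  # in dry air at about 20 °C


def echo_distance_m(round_trip_s: float, wave_speed_m_s: float = SPEED_OF_SOUND_M_S) -> float:
    """Distance to an obstacle from the round-trip time of an echo.

    The pulse travels to the obstacle and back, so the one-way distance
    is half of speed * time.
    """
    if round_trip_s < 0:
        raise ValueError("round-trip time cannot be negative")
    return wave_speed_m_s * round_trip_s / 2


distance = echo_distance_m(0.01)  # roughly 1.7 m for a 10 ms round trip
```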
Even when the car is stationary, say at an intersection, it is handling about 100 tasks at any given moment. Running a separate neural network for each task would be expensive and inefficient, especially since the car’s AI has to process huge volumes of information in real time. So Tesla’s Computer Vision workflow runs all the tasks on a shared backbone based on the ResNet-50 architecture, which processes images of roughly 1000×1000 pixels. This design, a single shared backbone with many task-specific “heads” branching off it, is called a HydraNet.
Of course, there are multiple HydraNets carrying out the AI processing for the vehicle, and the information gathered from each one is used to solve recurrent tasks. For instance, one task may deal with stop signs, another with pedestrians, and yet another with traffic lights, all running on the same shared backbone. The HydraNet architecture rests on the observation that each of these tasks needs only a small portion of the large neural network. The heavy feature extraction is shared, and each task adds only a lightweight head on top.
This is similar to transfer learning, where a common block is shared and specific blocks are trained for specific related tasks. HydraNets have backbones trained on all kinds of objects and heads trained on specific tasks. This speeds up inference, because the expensive backbone computation runs once per image instead of once per task. It also reduces the time needed to train the model.
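The shared-backbone idea can be sketched in a few lines of plain Python. This is a toy illustration, not Tesla’s code. The `HydraNet` class, the `toy_backbone` feature extractor, and the three heads are hypothetical stand-ins; a real system would use a deep CNN backbone and small trained layers as heads.

```python
from typing import Callable, Dict, List

Image = List[float]     # toy stand-in for a camera frame
Features = List[float]  # toy stand-in for the backbone's feature map


class HydraNet:
    """Toy HydraNet: one shared backbone feeding several task-specific heads."""

    def __init__(self, backbone: Callable[[Image], Features],
                 heads: Dict[str, Callable[[Features], float]]) -> None:
        self.backbone = backbone
        self.heads = heads
        self.backbone_calls = 0  # counts how often the expensive part runs

    def forward(self, image: Image) -> Dict[str, float]:
        features = self.backbone(image)  # expensive step: runs once per image
        self.backbone_calls += 1
        # Every head reuses the same features instead of re-processing the image.
        return {task: head(features) for task, head in self.heads.items()}


def toy_backbone(image: Image) -> Features:
    # Placeholder "feature extractor"; a real system would use a deep CNN here.
    return [sum(image) / len(image), max(image), min(image)]


# Hypothetical heads; real heads would be small trained layers.
heads = {
    "road_markings": lambda f: f[0],
    "traffic_signals": lambda f: f[1],
    "static_objects": lambda f: f[1] - f[2],
}

net = HydraNet(toy_backbone, heads)
outputs = net.forward([0.0, 0.5, 1.0])
print(outputs)             # → {'road_markings': 0.5, 'traffic_signals': 1.0, 'static_objects': 1.0}
print(net.backbone_calls)  # → 1
```

With separate networks per task, the backbone would run once per head for every image. Here it runs once no matter how many heads are attached, and adding a task means adding a head rather than a whole network. In PyTorch, the same structure would typically be an `nn.Module` holding the backbone and an `nn.ModuleDict` of heads.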
There are more than 40 such neural network “heads”, each essentially a different task to be performed. Each task may require a different set of cameras and sensors. Some of these tasks are:
- Road markings
- Traffic signals
- Overhead signs
- Zebra crossings
- Other vehicles
- Static objects such as cones and road barricades
- Environment tags
There are 8 HydraNets in total, matching the 8 cameras mentioned earlier that carry out these tasks.
Training using PyTorch
Once the neural networks have been designed, they still need to be trained. Tesla relies on a myriad of libraries and tools behind the scenes to enable its computer vision capabilities. One such framework is PyTorch, originally developed by Facebook’s AI Research lab (FAIR), which Tesla uses to train its deep learning models.
As mentioned earlier, Tesla doesn’t use LIDAR or HD maps to achieve full autonomy. Everything is done in real time and depends entirely on the cameras and computer vision. Beyond training itself, Tesla uses PyTorch for supporting tasks such as automated workflow scheduling, model threshold calibration, extensive evaluation, passive testing, and simulation testing.
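The article doesn’t say how Tesla performs model threshold calibration. One common approach, sketched below as an assumption, is to choose a detector’s confidence threshold on a held-out validation set. The goal is to meet a target precision while keeping as many true detections as possible. The scores and labels are made-up examples.

```python
from typing import List, Optional


def calibrate_threshold(scores: List[float], labels: List[bool],
                        target_precision: float) -> Optional[float]:
    """Pick the lowest confidence threshold whose precision meets the target.

    Recall can only fall as the threshold rises, so the lowest threshold that
    still meets the precision target keeps the most true detections.
    """
    for threshold in sorted(set(scores)):
        predicted = [score >= threshold for score in scores]
        true_pos = sum(pred and label for pred, label in zip(predicted, labels))
        false_pos = sum(pred and not label for pred, label in zip(predicted, labels))
        if true_pos + false_pos and true_pos / (true_pos + false_pos) >= target_precision:
            return threshold
    return None  # no threshold reaches the target precision


# Made-up validation data: detector confidences and whether each detection was real.
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.3]
labels = [True, True, False, True, False, False]
print(calibrate_threshold(scores, labels, target_precision=0.75))  # → 0.7
```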
For Autopilot, Tesla trains around 48 networks that make 1,000 different predictions, and training them takes about 70,000 GPU hours. This training is not a one-time setup! AI systems improve over time, so training is an iterative process. The 1,000 predictions are kept up to date, and the goal is that none of them ever regresses.
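Keeping 1,000 predictions from regressing implies a gate that compares every task’s metric for a retrained model against the model currently deployed. The sketch below is a hypothetical version of such a check, not Tesla’s pipeline. The task names, scores, and `tolerance` parameter are illustrative assumptions, and higher scores mean better.

```python
from typing import Dict, List


def find_regressions(current: Dict[str, float], candidate: Dict[str, float],
                     tolerance: float = 0.0) -> List[str]:
    """List tasks whose score got worse (higher score = better).

    A task missing from the candidate's results also counts as a regression.
    """
    regressed = []
    for task, old_score in current.items():
        new_score = candidate.get(task)
        if new_score is None or new_score < old_score - tolerance:
            regressed.append(task)
    return sorted(regressed)


def can_ship(current: Dict[str, float], candidate: Dict[str, float],
             tolerance: float = 0.0) -> bool:
    # Promote the retrained model only if no task regressed.
    return not find_regressions(current, candidate, tolerance)


# Hypothetical per-task evaluation scores.
current = {"stop_signs": 0.91, "traffic_lights": 0.88, "pedestrians": 0.95}
candidate = {"stop_signs": 0.93, "traffic_lights": 0.86, "pedestrians": 0.95}
print(find_regressions(current, candidate))  # → ['traffic_lights']
print(can_ship(current, candidate))          # → False
```

A tolerance of zero blocks any drop at all. A small positive tolerance ignores noise in the evaluation metric, at the cost of letting tiny regressions through.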
You can find out more about the features of PyTorch in this article by Fireblaze AI School. <INSERT GUIDE TO PYTORCH ARTICLE>
Some other features
If you think that’s all Tesla’s software has to offer, you are in for a surprise. Tesla’s newest models are said to be aiming for Level 5 autonomy. Earlier Autopilot hardware ran on Nvidia GPUs, but Tesla has since moved to its own AI chips, which Samsung manufactures and which are installed in all new Tesla vehicles. Let’s discuss some of the features Tesla is set to offer (several of which were slated for release by the end of 2019).
1. Bird’s Eye View
Often, the flat images seen by each camera are not enough on their own and need an additional dimension of interpretation. The Bird’s Eye View functionality helps estimate longer distances and provides a better interpretation of the real world. It is a visual monitoring system that “renders” a top-down view of the vehicle and its surroundings, making it easy to park and navigate tight spaces. You can take the wheel with confidence now, with no more sorry excuses about your parking skills!
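One textbook way to turn a camera image into a top-down view is inverse perspective mapping. You assume the road is flat and project each pixel below the horizon onto the ground plane. The sketch below implements that geometry for a single level pinhole camera. It is an illustrative assumption, not necessarily how Tesla builds its Bird’s Eye View, and the camera parameters are hypothetical.

```python
from typing import Optional, Tuple


def pixel_to_ground(u: float, v: float, focal_px: float, cx: float, cy: float,
                    cam_height_m: float) -> Optional[Tuple[float, float]]:
    """Project image pixel (u, v) onto a flat road plane.

    Assumes a pinhole camera mounted level (zero pitch and roll) at cam_height_m
    above perfectly flat ground. Returns (lateral offset X, forward distance Z)
    in meters, or None for pixels at or above the horizon row cy, whose rays
    never hit the ground.
    """
    rows_below_horizon = v - cy
    if rows_below_horizon <= 0:
        return None
    # Pinhole model: v - cy = f * h / Z  =>  Z = f * h / (v - cy)
    z = focal_px * cam_height_m / rows_below_horizon
    # Likewise u - cx = f * X / Z  =>  X = (u - cx) * Z / f
    x = (u - cx) * z / focal_px
    return x, z


# Hypothetical camera: 1000 px focal length, principal point (640, 360), mounted 1.5 m high.
print(pixel_to_ground(640, 460, 1000, 640, 360, 1.5))  # → (0.0, 15.0)
print(pixel_to_ground(640, 370, 1000, 640, 360, 1.5))  # → (0.0, 150.0)
```

Near the horizon, a one-pixel shift changes the estimated distance by many meters. Row 370 maps to 150 m and row 371 to about 136 m, which is one reason estimating longer distances from a single image is hard.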
2. Smart Summon
Smart Summon is a feature that lets the car find its driver in a parking lot. Yes, this literally means your Tesla will drive itself to your spot when you summon it from the Tesla app. Whether that seems like a page right out of a sci-fi novel or completely absurd, this feature is no longer a thing of the future (personally, I find it a little creepy :P). Most of these features were rolled out in 2019.
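To reach the driver, Smart Summon has to plan a route through the parking lot. The article doesn’t describe Tesla’s planner, so the sketch below swaps in a textbook technique, breadth-first search on a grid map of the lot, purely for illustration. The lot layout is hypothetical. A real planner must also handle vehicle kinematics and moving pedestrians.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]


def shortest_path(grid: List[str], start: Cell, goal: Cell) -> Optional[List[Cell]]:
    """Breadth-first search on a 4-connected occupancy grid ('#' = blocked).

    BFS explores cells in order of distance from start, so the first time it
    reaches goal it has found a shortest path (fewest moves).
    """
    rows, cols = len(grid), len(grid[0])
    came_from: Dict[Cell, Optional[Cell]] = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk the back-pointers from goal to start, then reverse.
            path = []
            node: Optional[Cell] = cell
            while node is not None:
                path.append(node)
                node = came_from[node]
            return path[::-1]
        r, c = cell
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != "#" and (nr, nc) not in came_from):
                came_from[(nr, nc)] = cell
                queue.append((nr, nc))
    return None  # goal unreachable


# Hypothetical parking lot: '.' = free space, '#' = parked car or pillar.
LOT = [
    "....#",
    ".##.#",
    ".#...",
    "...#.",
]
print(shortest_path(LOT, start=(0, 0), goal=(3, 4)))
# → [(0, 0), (0, 1), (0, 2), (0, 3), (1, 3), (2, 3), (2, 4), (3, 4)]
```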
Tesla’s technology has accumulated 1,000,000,000 miles and 200,000 automated lane changes, and it has been used in more than 50 countries! Within a month of its launch in October 2019, Smart Summon had already completed 500,000 sessions.
3. AI Chip – Dual Chip System
Tesla’s system has two AI chips for better safety and performance on the road, because it aims to be foolproof. It also has backup power and redundant data input feeds, so even if a single unit fails, the car can keep working on the spare units. These redundant features are Tesla’s way of ensuring that its vehicles stay equipped to prevent accidents in the face of an unexpected failure.
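The failover idea can be sketched as a small arbitration function that decides which chip’s output to act on. This is a hypothetical illustration, not Tesla’s actual logic. The `ChipOutput` type, the steering-angle output, the averaging of agreeing outputs, and the 1-degree disagreement tolerance are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ChipOutput:
    """What one (hypothetical) AI chip reports for the current time step."""
    healthy: bool
    steering_deg: Optional[float] = None  # proposed steering angle if healthy


def arbitrate(a: ChipOutput, b: ChipOutput,
              max_disagreement_deg: float = 1.0) -> Tuple[str, Optional[float]]:
    """Choose a steering command from two redundant chips.

    Returns (source, steering_deg). "fallback" means neither chip can be trusted
    and the car should switch to a safe behavior (left abstract here).
    """
    if a.healthy and b.healthy:
        if abs(a.steering_deg - b.steering_deg) <= max_disagreement_deg:
            return "both", (a.steering_deg + b.steering_deg) / 2
        return "fallback", None  # chips disagree: trust neither blindly
    if a.healthy:
        return "chip_a", a.steering_deg  # b failed: keep driving on a
    if b.healthy:
        return "chip_b", b.steering_deg  # a failed: keep driving on b
    return "fallback", None  # both failed


print(arbitrate(ChipOutput(True, 2.0), ChipOutput(True, 2.5)))  # → ('both', 2.25)
print(arbitrate(ChipOutput(False), ChipOutput(True, 3.0)))      # → ('chip_b', 3.0)
```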
Admittedly a leader in the self-driving car market, Tesla is still far from delivering a fully autonomous vehicle. Still, Tesla has developed its own revolutionary AI chips and neural network architecture, and the features we discussed in this article should pave the way for such a car in the future.
A few years down the line, Elon Musk’s vision will truly become a success. For now, all we can do is discuss the pioneering tech Tesla aims to deliver on its way to Level 5 autonomy!