Tesla unveils its new Dojo supercomputer, reportedly so powerful that it caused a power outage during testing. Dojo uses chips and infrastructure designed entirely by Tesla

Tesla showed off its new Dojo supercomputer at its annual “AI Day” event on September 30. Dojo is expected to significantly expand Tesla’s capacity to train neural networks. The company said this added capacity will enable it to make significant progress in the development of its autonomous driving system, Autopilot. According to Tesla, Dojo’s power draw was so high that it caused a failure in the power grid of Palo Alto, California, during a test earlier this year.

Dojo is Tesla’s custom supercomputer platform, designed from the ground up for machine learning (ML). More specifically, Dojo will be used to train models on video data from Tesla’s fleet of vehicles. As a reminder, the car manufacturer already has a large supercomputer based on NVIDIA GPUs, which is one of the most powerful in the world, and a data center containing 30 PB (i.e. 30,000,000 GB) of stored images. The company’s new Dojo supercomputer, by contrast, uses chips and infrastructure designed entirely by Tesla.
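As a quick check on the storage figure above, here is a minimal sketch of the conversion from 30 petabytes to gigabytes. It assumes decimal (SI) units, where 1 PB is 10^15 bytes and 1 GB is 10^9 bytes; the function name is hypothetical and only serves the illustration.

```python
# Unit conversion for the 30 PB figure quoted in the article.
# Assumption: decimal (SI) prefixes, not binary (PiB/GiB) ones.
BYTES_PER_GB = 10**9
BYTES_PER_PB = 10**15


def petabytes_to_gigabytes(petabytes: int) -> int:
    """Convert a whole number of petabytes to gigabytes (SI units)."""
    return petabytes * BYTES_PER_PB // BYTES_PER_GB


if __name__ == "__main__":
    print(f"30 PB = {petabytes_to_gigabytes(30):,} GB")
```

Under that assumption the result matches the 30,000,000 GB given in the text.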

With this supercomputer, the company claims it can replace 72 GPU racks, comprising 4,000 GPUs, with just four Dojo cabinets. At last year’s “Tesla AI Day” event, project leaders unveiled the first Dojo chip and training tiles, the building blocks of what will become a full Dojo cluster, or “ExaPod”. On Friday, Tesla said the first ExaPod should be completed in the first quarter of 2023. The company plans to build seven in total in Palo Alto. With 10 cabinets, Tesla said, a Dojo ExaPod will break the exaflop barrier in compute.
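To put the figures in this paragraph in perspective, the following rough sketch works out the ratios they imply. The inputs are Tesla’s claims as quoted here, not independent measurements; the exaflop figure is taken as a lower bound of 10^18 operations per second, and the numeric precision behind that figure is not specified in this article. The function and constant names are hypothetical.

```python
# Back-of-the-envelope ratios from the figures quoted in the article.
# All inputs are Tesla's claims; none of them is verified here.
GPU_RACKS_REPLACED = 72      # GPU racks said to be replaced
GPUS_REPLACED = 4_000        # GPUs in those racks
DOJO_CABINETS = 4            # Dojo cabinets said to replace them
EXAPOD_CABINETS = 10         # cabinets in one ExaPod
EXAPOD_OPS_PER_SEC = 10**18  # "exaflop barrier", treated as a lower bound


def dojo_ratios() -> dict[str, float]:
    """Return the ratios implied by the quoted figures."""
    return {
        "gpus_per_rack": GPUS_REPLACED / GPU_RACKS_REPLACED,
        "gpus_per_dojo_cabinet": GPUS_REPLACED / DOJO_CABINETS,
        "racks_per_dojo_cabinet": GPU_RACKS_REPLACED / DOJO_CABINETS,
        "petaflops_per_cabinet": EXAPOD_OPS_PER_SEC / EXAPOD_CABINETS / 10**15,
    }


if __name__ == "__main__":
    for name, value in dojo_ratios().items():
        print(f"{name}: {value:.1f}")
```

On these numbers, one Dojo cabinet would stand in for about 1,000 GPUs (18 racks), and each cabinet in a 10-cabinet ExaPod would need to deliver at least 100 petaflops.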

The system contains 1.3 TB of high-speed SRAM and 13 TB of high-bandwidth DRAM. Since 2021, the development of Dojo has reached several important milestones, including the installation of the first Dojo cabinet and load tests at 2.2 MW, and the company announced that it is now building tiles at a rate of one per day. Bill Chang, Tesla’s Principal Systems Engineer for Dojo, said, “We knew we had to re-examine every aspect of the data center infrastructure to support our unprecedented cooling and power density.”

The team had to develop its own high-power cooling and power delivery systems for the Dojo cabinets. That is roughly where development of the Dojo platform stands today. Chang said Friday that Tesla knocked out its local power grid substation when it tested the infrastructure earlier this year. “At the start of the year, we started testing the load on our power and cooling infrastructure and were able to push it beyond 2 MW before we caused our substation to fail and got a call from the city,” he said.

On Friday, Tesla also ran the “Stable Diffusion” AI model on 25 Dojo dies, creating an AI-generated image from the prompt “Cybertruck on Mars”. Tesla further used the event to try to recruit more talent, and said the team was on schedule to complete the first ExaPod. But why is a car manufacturer developing one of the most powerful supercomputers in the world? Tesla would tell you that it is not just an automaker, but a technology company that develops products to accelerate the transition to sustainable energy.

Tesla CEO Elon Musk said it made sense to offer Dojo as a service, perhaps to take on Amazon Web Services (AWS) or other cloud computing and high-performance computing (HPC) players. He described it as a “service you can use that’s available online, where you can train your models much faster and for less money.” More immediately, though, Tesla needs Dojo to automatically label training videos from its fleet of electric vehicles and to train the neural networks behind its autonomous driving system, Autopilot.

Dojo could be a game changer, as it was developed entirely in-house, from the ground up. This in-house approach contrasts with that of competing OEMs, which rely on multiple Tier 1 suppliers to meet their various autonomous driving needs. Tech giants like Google, with its Tensor Processing Unit (TPU), are taking a similar approach by developing custom AI chips. Tesla is going down the same route, having previously sourced processors from Nvidia for its vehicles.

Tesla then decided to develop its own chips. According to the company, Nvidia’s chips are more expensive and offer significantly lower performance than its own custom silicon, which is focused on making vehicles fully autonomous using cameras and computer vision. Dojo is therefore likely to be much more effective at training the neural networks and deep learning models needed to push Tesla towards fully automated vehicles. In addition, it is expected to improve the safety of the Autopilot and FSD (Full Self-Driving) programs.

Tesla uses cameras to read road conditions and provide greater comfort to passengers. The camera sensors can also support other features that will enhance the in-vehicle experience. Tesla’s fleet of nearly three million vehicles is expected to travel more than 100 million miles by the end of the year, which means the volume of data captured by vehicles will grow rapidly, and so will the amount of computation required to process it.

For all these reasons, Dojo could turn out to be an essential asset. With Dojo, Tesla also underlines the benefits of its vertical integration strategy, which allows it to control key aspects of the value chain, from manufacturing to sales, hardware to software, and products to services. The Dojo supercomputer could enable Tesla to move towards fully autonomous vehicles and could also bolster the capabilities of the company’s AI-powered humanoid robot Optimus, which was also shown on Friday.
