Google’s Gemini AI models can already answer our questions, help us get organized, write documents, and even code applications. But in the not-so-distant future, Gemini could also… drive vehicles. At least, that is the new avenue being explored by Waymo, the subsidiary of Alphabet (Google’s parent company) specializing in autonomous vehicles and robotaxis.
Today, Waymo is the leader in its field. The Alphabet subsidiary already runs an Uber-style ride-hailing service with autonomous cars in a few American cities, completing more than 150,000 trips per week. And while Waymo is satisfied with the technologies it currently uses, it is now exploring whether Gemini’s intelligence could improve its autonomous vehicles.
In a recent publication, Waymo presents a research paper describing a new approach called End-to-End Multimodal Model for Autonomous Driving (EMMA). “Powered by Gemini, a multimodal large language model developed by Google, EMMA uses a unified, end-to-end trained model to generate future autonomous vehicle trajectories directly from sensor data. Trained and optimized specifically for autonomous driving, EMMA leverages Gemini’s extensive world knowledge to better understand complex scenarios on the road,” Waymo’s statement reads.
Why use Gemini?
Waymo’s current approach relies on multiple independent modules, each handling one of the various tasks of autonomous driving. The advantage of this design is that each module is easier to debug and optimize on its own. However, it is difficult to scale, and because it is optimized for specific, targeted scenarios, it can struggle to adapt to new environments.
The use of multimodal large language models (MLLMs), which understand both text and images, could solve this scalability problem. “Indeed, MLLMs, as general-purpose foundation models, excel in two key areas: (1) they are trained on large, internet-scale datasets that provide rich ‘world knowledge’ beyond what is contained in common driving logs, and (2) they demonstrate superior reasoning capabilities through techniques such as chain-of-thought reasoning,” Waymo’s paper reads.
Challenges Ahead
For now, though, while the potential of generative AI for self-driving cars is huge, Waymo believes significant challenges remain. For example, EMMA is still limited in its ability to process video. It also works only with camera images and does not use data from more complex sensors such as LiDAR.
“While EMMA is showing promising results, it is still in its early stages with challenges and limitations in onboard deployment, spatial reasoning capability, interpretability, and closed-loop simulation. Despite this, we believe our EMMA findings will inspire further research and advancements in this field,” the Waymo paper says.
- Gemini can already summarize emails, answer questions, and generate computer code
- But the same technology could one day power the driving systems of autonomous cars
- Waymo, the robotaxi specialist, has designed a new Gemini-based system for autonomous driving
- For now, however, the work is only in its early stages: while the system has enormous potential, it also has significant limitations that will first have to be overcome
📍 So you don’t miss any Presse-citron news, follow us on Google News and WhatsApp.