Google DeepMind Unveils Two Gemini-Driven Models to Apply AI in Real-World Scenarios

Photo credit: www.therobotreport.com

Google’s robotics team applies expertise in machine learning, engineering, and physics simulation to address challenges facing the development of AI-powered robots. | Source: DeepMind

Google DeepMind has unveiled two innovative artificial intelligence models: Gemini Robotics, which builds on its Gemini 2.0 framework, and Gemini Robotics-ER, a model focused on enhancing spatial comprehension.

DeepMind has made strides in equipping its Gemini models to tackle intricate problems using multimodal reasoning across text, images, audio, and video. The new models extend those abilities from the digital realm into the physical world of robotics.

Gemini Robotics is an advanced vision-language-action (VLA) model that adds physical actions as an output modality, allowing it to control robots directly.

Gemini Robotics-ER, by contrast, offers advanced spatial reasoning and lets developers run their own robotics programs on top of Gemini’s embodied reasoning (ER) capabilities.

DeepMind asserts that both models will empower a diverse range of robots to execute an unprecedented array of real-life tasks. To further this initiative, the company has formed a partnership with Apptronik to develop humanoid robots that leverage Gemini 2.0 technology.

DeepMind is also working with a select group of testers, including Agile Robots, Agility Robotics, Boston Dynamics, and Enchanted Tools, to guide the future development of Gemini Robotics-ER.

Transforming AI into Tangible Solutions

In a recent blog post, DeepMind highlighted that to be truly beneficial, AI models in robotics must embody three essential characteristics:

  • Generality: The ability to adapt to various scenarios.
  • Interactivity: The capacity to quickly comprehend and react to commands or environmental changes.
  • Dexterity: The skill to perform tasks that require fine motor control, similar to human manipulation of objects.

Though prior initiatives demonstrated progress in these areas, the introduction of Gemini Robotics marks a significant advancement across all three dimensions.

Focusing on Generality and Interactivity

Gemini Robotics leverages the understanding embedded within Gemini to generalize and approach new challenges, seamlessly handling tasks that it has not encountered during its training phase. It is specifically designed to recognize new objects, follow varied instructions, and operate effectively within unfamiliar environments, according to Google.

In comparative assessments, Gemini Robotics reportedly more than doubled the performance of other VLA models on a comprehensive generalization benchmark.

Interactivity is also critical in dynamic, real-world settings. Robots must engage with people and their surroundings and adapt smoothly to change. DeepMind claims that, because it is built on Gemini 2.0, Gemini Robotics can interact intuitively with users, understanding commands phrased in everyday conversational language and in multiple languages.

The model’s capacity for understanding a wider spectrum of natural language instructions enables it to adjust its actions based on user feedback. It continually surveys its environment to detect shifts and alter its response accordingly, facilitating improved collaboration with human partners across various settings.

Emphasizing Dexterity in Robotics

DeepMind identifies dexterity as the third critical element for effective robotics. Many tasks that humans accomplish with ease require fine motor skills that remain difficult for robots. Gemini Robotics, DeepMind says, can carry out intricate multi-step tasks that demand precise manipulation, such as folding origami or packing a snack into a bag.

The model has also been designed to work across different robotic platforms. Although it was trained mainly on data from the bi-arm ALOHA 2 platform, DeepMind has also shown it controlling a bi-arm setup based on the Franka arms common in research labs.

Moreover, Gemini Robotics can be specialized for more complex embodiments, such as the humanoid Apollo robot from Apptronik, with the goal of completing real-world tasks.

Enhancing Spatial Reasoning with Gemini Robotics-ER

Gemini Robotics-ER builds on Gemini’s capabilities by refining its ability to understand spatial relationships, a critical aspect for operational robotics. This model also allows developers to link it with pre-existing low-level controllers, enhancing functionality. DeepMind asserts that this model brings significant improvements over Gemini 2.0, particularly in tasks involving pointing and three-dimensional detection.

By combining spatial reasoning with coding capabilities, Gemini Robotics-ER can generate entirely new capabilities on the fly. For instance, when shown a coffee mug, it can work out an appropriate grasp for picking it up and a safe trajectory for approaching it.

In an end-to-end setting, Gemini Robotics-ER reportedly achieves two to three times the success rate of Gemini 2.0 across the steps needed to control a robot: perception, state estimation, spatial understanding, planning, and code generation.

Where code generation alone is not enough, the model can fall back on in-context learning, following the patterns in a handful of human demonstrations to arrive at a solution.

Prioritizing Robotics Safety

As DeepMind explores AI and robotics, it emphasizes a comprehensive, multi-layered approach to safety that spans everything from low-level motor control to high-level semantic understanding.

Gemini Robotics-ER is built to interface with safety-critical low-level controllers, which handle functions such as avoiding collisions, limiting contact forces, and keeping mobile robots stable. On top of these core safety mechanisms, DeepMind equips Gemini Robotics-ER to judge whether a proposed action is safe in context and to generate an appropriate response.
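The layering described here, in which a high-level model proposes actions and a low-level controller enforces hard limits, can be illustrated with a minimal Python sketch. This is not DeepMind's implementation: the names MotionCommand, SafetyLimits, and gate, the box-shaped keep-out zones, and the numeric limits are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]
Box = Tuple[Vec3, Vec3]  # (min corner, max corner), axis-aligned


@dataclass(frozen=True)
class MotionCommand:
    """Hypothetical command proposed by a high-level reasoning model."""
    target_xyz: Vec3         # desired end-effector position, in metres
    contact_force_n: float   # requested contact force, in newtons


@dataclass(frozen=True)
class SafetyLimits:
    """Hypothetical hard limits owned by the low-level controller."""
    max_contact_force_n: float
    keep_out_boxes: Tuple[Box, ...]


def inside(point: Vec3, box: Box) -> bool:
    """True if the point lies within the axis-aligned box (inclusive)."""
    lo, hi = box
    return all(l <= p <= h for p, l, h in zip(point, lo, hi))


def gate(cmd: MotionCommand, limits: SafetyLimits) -> Optional[MotionCommand]:
    """Reject targets in a keep-out zone; otherwise clamp the contact force."""
    if any(inside(cmd.target_xyz, box) for box in limits.keep_out_boxes):
        return None  # collision risk: refuse the command outright
    clamped = min(max(cmd.contact_force_n, 0.0), limits.max_contact_force_n)
    return MotionCommand(cmd.target_xyz, clamped)
```

In this sketch the gate never trusts the requested force: a command asking for 50 N under a 20 N limit is passed on with 20 N, and a command whose target falls inside a keep-out box is dropped regardless of what the high-level model intended.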

Advancing Safety Research with New Datasets

To support robotics safety research in academia and industry, DeepMind has introduced a new dataset for evaluating and improving semantic safety in embodied AI and robotics systems. Earlier work had shown that a “Robot Constitution,” inspired by Isaac Asimov’s Three Laws of Robotics, could help guide machine learning models toward safer choices of task.

DeepMind has since devised a framework for automatically generating data-driven constitutions—rules articulated in natural language that can refine robotic behavior. This system would empower developers to create, adjust, and implement constitutions, promoting the development of safer robots that align closely with human ethics and values.
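One way to picture a constitution of this kind is as a list of plain-language rules that every proposed action is checked against. The Python sketch below is purely illustrative and assumes its own details: the two rules, the violated_rules helper, and the keyword_judge function are hypothetical, and the keyword matching is a toy stand-in for the language-model judgment a real system would need.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical constitution: plain-language rules invented for this example.
CONSTITUTION: List[str] = [
    "Do not hand a knife to a person blade-first.",
    "Do not place liquids above electronics.",
]

# A judge takes (rule, proposed_action) and returns True if the rule is violated.
Judge = Callable[[str, str], bool]

# Toy trigger phrases per rule; a real judge would be a language model.
TRIGGERS: Dict[str, Tuple[str, ...]] = {
    CONSTITUTION[0]: ("blade-first",),
    CONSTITUTION[1]: ("above the laptop", "above electronics"),
}


def keyword_judge(rule: str, action: str) -> bool:
    """Flag a violation when the action text contains a trigger phrase."""
    return any(t in action.lower() for t in TRIGGERS.get(rule, ()))


def violated_rules(action: str, rules: List[str], judge: Judge) -> List[str]:
    """Return every rule the judge considers violated by the action."""
    return [rule for rule in rules if judge(rule, action)]
```

Because the rules are ordinary strings and the judge is passed in as a function, a developer could edit the rule list or swap in a stronger judge without touching the checking logic, which is the kind of adjustability the constitution idea is meant to offer.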

Finally, the newly released ASIMOV dataset is intended to help researchers rigorously measure the safety implications of robotic actions in real-world scenarios.

Source
www.therobotreport.com
