
Are Natural Language capable Personal Robot Assistants the Future of Google's Capabilities?

Updated: Jan 7

“The Jetsons” by Hanna-Barbera was always a fascinating cartoon for how it imagined daily life in the far future. It had everything from flying cars, food synthesizers and holograms to spaceships and jetpacks. However, one of the most endearing pieces of technology imagined in the show was Rosie the robotic maid. Name any household chore and she could do it with flair, precision and care.


"The Jetsons" Image Source: Comic-Mint


Imagine if just like Rosie, we could all have our personal helper robots at home to whom we could simply say “Please set the dinner table”, “Please clean my desk” or even “Please make me a Cheeseburger”. These tasks, although straightforward for humans, require a high-level understanding of the world for robots.



Google DeepMind robot. Source: Google

 

On 4th January 2024, the Google DeepMind Robotics Team announced a suite of advances in robotics research that bring us a step closer to this future. AutoRT, SARA-RT, and RT-Trajectory build on their historic Robotics Transformers work to help robots make decisions faster, and better understand and navigate their environments.

 

AutoRT: Harnessing large models to better train robots


AutoRT is a system that utilizes large foundation models to train robots for real-world tasks and practical human goals. It combines a Visual Language Model (VLM) and a Large Language Model (LLM) with a robot control model to direct multiple robots in diverse environments. The VLM interprets the environment, while the LLM suggests candidate tasks and decides which ones the robot should carry out. Over seven months, AutoRT safely managed up to 20 robots simultaneously in various office settings, collecting 77,000 trials across 6,650 unique tasks.


Google Deepmind robot flowchart. Exploration, Describe scene using VLM, Generate and filter task using LLM, Action by actuators

Source : Google

(1) An autonomous wheeled robot finds a location with multiple objects.

(2) A VLM describes the scene and objects to an LLM.

(3) An LLM suggests diverse manipulation tasks for the robot and decides which tasks the robot could do unassisted, which would require remote control by a human, and which are impossible, before making a choice.

(4) The chosen task is attempted, the experiential data collected, and the data scored for its diversity/novelty. Repeat.
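The four-step loop above can be sketched in a few lines. This is a toy illustration, not Google's code: the VLM and LLM are replaced with canned stand-ins, and every function name here is invented for the example.

```python
# Toy sketch of the AutoRT collect loop described above. The VLM and LLM
# are replaced with hard-coded stand-ins; all names are illustrative.

def describe_scene(image):
    # Stand-in for the VLM: turns camera pixels into a text description. (Step 2)
    return "a table with a sponge, a cup, and a knife"

def propose_and_filter(scene):
    # Stand-in for the LLM: propose tasks, then classify each one. (Step 3)
    proposals = ["pick up the sponge", "pick up the knife", "hand the cup to a person"]

    def classify(task):
        if "knife" in task:
            return "impossible"    # sharp object: blocked by the guardrails
        if "person" in task:
            return "needs_teleop"  # would require a human operator
        return "autonomous"

    return [t for t in proposals if classify(t) == "autonomous"]

def collect_step(dataset):
    scene = describe_scene(image=None)   # (2) describe the scene
    tasks = propose_and_filter(scene)    # (3) suggest and filter tasks
    for task in tasks:                   # (4) attempt, record, repeat
        dataset.append({"scene": scene, "task": task})
    return dataset

print(collect_step([]))
```

Of the three proposed tasks, only "pick up the sponge" survives the filter, so a single episode is recorded; in the real system the recorded episode would also be scored for diversity before the loop repeats.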


AutoRT has safety guardrails, including a Robot Constitution inspired by Asimov’s Three Laws of Robotics. It prohibits tasks involving humans, animals, sharp objects, or electrical appliances. Since LLM self-critique alone cannot guarantee safety, classical robotics safeguards are also in place, such as an automatic stop when the force on a robot's joints exceeds a threshold, and human supervisors equipped with a physical deactivation switch.

 

SARA-RT: Making Robotics Transformers leaner and faster


The new Self-Adaptive Robust Attention for Robotics Transformers (SARA-RT) converts Robotics Transformer (RT) models into more efficient versions. Transformers are powerful but slow due to their high computational demands: each time the input to these models doubles, for example when a robot gains an additional or higher-resolution sensor, the computation required roughly quadruples.

 

SARA-RT makes models more efficient using a novel fine-tuning method called “up-training”, which converts this quadratic complexity into linear complexity, sharply reducing the computational requirements while maintaining quality.
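The quadratic-versus-linear distinction can be shown in miniature. Standard attention materialises an n-by-n score matrix, so doubling n quadruples the work; linear ("kernelised") attention of the general kind SARA-RT builds on reorders the same matrix product so cost grows only linearly in n. This is a generic sketch of the idea, not Google's implementation, and the feature map `phi` here is an arbitrary choice for illustration.

```python
# Generic sketch: why linear attention scales better than softmax attention.
import numpy as np

def softmax_attention(Q, K, V):
    # O(n^2): the full n x n score matrix is built explicitly.
    scores = Q @ K.T                                        # (n, n)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0) + 1e-6):
    # O(n): apply a positive feature map phi, then use associativity:
    # (phi(Q) @ phi(K).T) @ V  ==  phi(Q) @ (phi(K).T @ V)
    KV = phi(K).T @ V                     # (d, d) summary, independent of n
    Z = phi(Q) @ phi(K).sum(axis=0)       # per-row normaliser, shape (n,)
    return (phi(Q) @ KV) / Z[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 4)) for _ in range(3))
out = linear_attention(Q, K, V)
print(out.shape)  # (8, 4)
```

The key design point is the reordering in `linear_attention`: the (d, d) summary `KV` never depends on the sequence length, so adding sensors (longer inputs) grows the cost linearly rather than quadratically.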

 

A sample of manipulation tasks by a robot using SARA-RT-2.

Source: Google


 

RT-Trajectory: Helping robots generalize


Traditionally, training a robotic arm relies on mapping abstract natural language (“wipe the table”) to specific movements (close gripper, move left, move right), which makes it hard for models to generalize to novel tasks. In contrast, RT-Trajectory enables RT models to understand "how to do" tasks by interpreting specific robot motions, such as those contained in videos or sketches.

 

RT-Trajectory can also create trajectories by watching human demonstrations of desired tasks, and even accept hand-drawn sketches. And it can be readily adapted to different robot platforms.
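The conditioning idea can be illustrated with a toy example: a coarse 2D path, whether it comes from a human demonstration or a hand-drawn sketch, is rasterised into an extra image channel alongside the camera frame, giving the policy a visual hint of "how" to move. This is an illustrative sketch only; the function and shapes below are invented for the example, not Google's code.

```python
# Illustrative sketch of trajectory conditioning: rasterise a 2D path
# into an extra image channel the policy can attend to.
import numpy as np

def rasterise_trajectory(waypoints, height, width):
    """Draw (row, col) waypoints into a single-channel image, with
    intensity encoding time order (later points are brighter)."""
    canvas = np.zeros((height, width), dtype=np.float32)
    n = len(waypoints)
    for i, (r, c) in enumerate(waypoints):
        canvas[r, c] = (i + 1) / n        # temporal progress in (0, 1]
    return canvas

# A hand-drawn "wipe the table" stroke, as pixel coordinates.
stroke = [(10, 5), (10, 10), (10, 15), (10, 20)]
hint = rasterise_trajectory(stroke, height=32, width=32)

camera = np.zeros((32, 32, 3), dtype=np.float32)   # RGB camera frame
conditioned = np.dstack([camera, hint])            # RGB + trajectory channel
print(conditioned.shape)  # (32, 32, 4)
```

Encoding time as intensity is one simple way to tell the policy not just where the path goes but in what order, which is the kind of "how to do it" signal the paragraph above describes.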


Source: Google

Left: A robot, controlled by an RT model trained with a natural-language-only dataset, is stymied when given the novel task “clean the table”. A robot controlled by RT-Trajectory, trained on the same dataset augmented with 2D trajectories, successfully plans and executes a wiping trajectory.

Right: A trained RT-Trajectory model given a novel task (“clean the table”) can create 2D trajectories in a variety of ways, assisted by humans or on its own using a vision-language model.

  

These advancements indicate an exciting future where we could all have our very own Rosie. Until then, we remain at the mercy of humans who understand our requests perfectly but choose to sidestep or ignore them.


Wrap-up (Powered by AI)


  1. Google's DeepMind Robotics Team announced significant advancements in robotics on January 4, 2024, bringing us closer to a future depicted in "The Jetsons."

  2. AutoRT

    1. Utilizes large foundation models for training robots in real-world tasks.

    2. Combines Visual Language Model (VLM) and Large Language Model (LLM) to interpret the environment and suggest tasks.

    3. Adheres to safety guardrails inspired by Asimov's Three Laws of Robotics.

  3. SARA-RT

    1. Converts Robotics Transformer (RT) models into more efficient versions.

    2. Addresses computational demands by using "up-training," reducing complexity while maintaining quality.

  4. RT-Trajectory

    1. Enables RT models to understand tasks by interpreting specific robot motions.

    2. Creates trajectories from human demonstrations or hand-drawn sketches.

  5. These advancements signal a promising future where personal robot assistants could understand and perform diverse household tasks.

  6. While challenges remain, such as ensuring safety and generalizing to novel tasks, this marks a significant step toward integrating robots into daily life.

  7. Until then, humans remain crucial for understanding requests that robots may eventually fulfill.



 


Like, share and follow to support us in curating the most fascinating news for you from the world of science and technology.


Stay updated | Stay fascinated | Stay ahead
