This seems very promising! I haven't had time to look into it further.
"Google’s DeepMind has announced Robotics Transformer 2 (RT-2), a first-of-its-kind vision-language-action (VLA) model that can enable robots to perform novel tasks without specific training. ...
Back in 2022, [Google] debuted RT-1, a multi-task model that trained on 130,000 demonstrations and enabled Everyday Robots to perform 700-plus tasks with a 97% success rate. Now, using the robotic demonstration data from RT-1 with web datasets, the company has trained the successor of the model: RT-2. ...
The biggest highlight of RT-2 is that, unlike RT-1 and other models, it does not require hundreds of thousands of data points to get a robot to work. ... Organizations have long found specific robot training (covering every single object, environment and situation) critical to handling complex, abstract tasks in highly variable environments.
However, in this case, RT-2 learns from a small amount of robotic data to perform the complex reasoning seen in foundation models and transfer the knowledge acquired to direct robotic actions – even for tasks it’s never seen or been trained to do before.
“RT-2 shows improved generalization capabilities and semantic and visual understanding beyond the robotic data it was exposed to,” Google explains. “This includes interpreting new commands and responding to user commands by performing rudimentary reasoning, such as reasoning about object categories or high-level descriptions.” ..."
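To make the idea concrete for myself, here is a minimal toy sketch of the vision-language-action pattern described above: a single transformer reads an image plus a text instruction and emits discrete "action tokens," which are then decoded back into continuous robot commands. Everything below (the architecture, sizes, action dimensions, and bin encoding) is my own illustrative assumption, not DeepMind's actual RT-2 design.

```python
# Toy VLA sketch: (image, instruction) -> discrete action tokens -> robot command.
# Hypothetical architecture and parameters; only the overall pattern follows RT-2.
import torch
import torch.nn as nn

NUM_BINS = 256     # each action dimension discretized into 256 bins (assumed)
ACTION_DIMS = 7    # e.g. (dx, dy, dz, droll, dpitch, dyaw, gripper) -- assumed

class ToyVLA(nn.Module):
    def __init__(self, vocab_size=1000, d_model=128):
        super().__init__()
        self.patch_embed = nn.Linear(16 * 16 * 3, d_model)  # flatten 16x16 RGB patches
        self.text_embed = nn.Embedding(vocab_size, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        # one classifier per action dimension over the shared bin vocabulary
        self.action_head = nn.Linear(d_model, ACTION_DIMS * NUM_BINS)

    def forward(self, image, instruction_ids):
        # image: (B, 3, 64, 64) -> 16 patches of 16x16, treated as tokens
        B = image.shape[0]
        patches = image.unfold(2, 16, 16).unfold(3, 16, 16)   # (B, 3, 4, 4, 16, 16)
        patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(B, 16, -1)
        # image patches and instruction tokens share one sequence
        tokens = torch.cat([self.patch_embed(patches),
                            self.text_embed(instruction_ids)], dim=1)
        h = self.encoder(tokens).mean(dim=1)                  # pooled representation
        logits = self.action_head(h).view(B, ACTION_DIMS, NUM_BINS)
        return logits

def detokenize(action_tokens, low=-1.0, high=1.0):
    """Map integer bins back to continuous commands in [low, high]."""
    return low + (action_tokens.float() / (NUM_BINS - 1)) * (high - low)

model = ToyVLA()
image = torch.rand(1, 3, 64, 64)
instruction = torch.randint(0, 1000, (1, 8))     # token ids, e.g. "pick up the apple"
bins = model(image, instruction).argmax(dim=-1)  # (1, 7) discrete action tokens
print(detokenize(bins))                           # continuous robot command
```

The point of representing actions as tokens is that the same next-token machinery a language model learns from web-scale data can, after fine-tuning on robot demonstrations, also emit motor commands, which is how knowledge from the web transfers to tasks the robot was never explicitly trained on.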