Monday, November 07, 2022

On DALL-E-Bot: Introducing Web-Scale Diffusion Models to Robotics

Just finished this interesting, short new paper! Robotics still faces plenty of enormous challenges, even for fairly simple tasks like neatly arranging ordinary cutlery around a plate on a tabletop. The abstract glosses over the varied and difficult challenges explained in the paper.

All in all: Progress is very slow! At this pace, and unless new and different approaches are found, it may take another 3-5 years before we have capable robots (e.g. kitchen robots) that can handle a wide range of common tasks.

From the abstract:
"We introduce the first work to explore web-scale diffusion models for robotics. DALL-E-Bot enables a robot to rearrange objects in a scene, by first inferring a text description of those objects, then generating an image representing a natural, human-like arrangement of those objects, and finally physically arranging the objects according to that image. The significance is that we achieve this zero-shot using DALL-E, without needing any further data collection or training. Encouraging real-world results with human studies show that this is an exciting direction for the future of web-scale robot learning algorithms. We also propose a list of recommendations to the text-to-image community, to align further developments of these models with applications to robotics. ..."
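
To make the three stages from the abstract concrete (describe the objects in text, generate a goal image, rearrange to match it), here is a minimal Python sketch. It is only an illustration of the data flow, not the authors' implementation: every name in it (`DetectedObject`, `build_prompt`, `plan_moves`, `rearrange`, `fake_generator`) is hypothetical, and the image model plus the step that extracts object poses from the generated image are replaced by a stand-in function that returns hard-coded goal positions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Pose = Tuple[float, float]  # (x, y) position on the table, arbitrary units


@dataclass
class DetectedObject:
    label: str  # e.g. "fork"
    pose: Pose  # where the object currently lies


def build_prompt(objects: List[DetectedObject]) -> str:
    """Stage 1: turn the detected objects into a text description."""
    labels = sorted(obj.label for obj in objects)
    return "top-down view of a table with: " + ", ".join(labels)


def plan_moves(
    objects: List[DetectedObject], goal_poses: Dict[str, Pose]
) -> List[Tuple[str, Pose, Pose]]:
    """Stage 3: pair each object with its goal pose as (label, from, to)."""
    return [
        (obj.label, obj.pose, goal_poses[obj.label])
        for obj in objects
        if obj.label in goal_poses  # skip objects missing from the goal
    ]


def rearrange(
    objects: List[DetectedObject],
    generate_goal_poses: Callable[[str], Dict[str, Pose]],
) -> List[Tuple[str, Pose, Pose]]:
    """Run the three stages in order and return the planned moves."""
    prompt = build_prompt(objects)
    # Stage 2: hypothetical stand-in for "generate an image from the prompt
    # and read the object positions out of it".
    goal_poses = generate_goal_poses(prompt)
    return plan_moves(objects, goal_poses)


def fake_generator(prompt: str) -> Dict[str, Pose]:
    """Assumed placeholder: returns a fixed place setting, ignores the prompt."""
    return {"plate": (0.0, 0.0), "fork": (-0.2, 0.0), "knife": (0.2, 0.0)}


scene = [
    DetectedObject("knife", (0.5, 0.3)),
    DetectedObject("plate", (0.1, 0.1)),
    DetectedObject("fork", (-0.4, 0.2)),
]
moves = rearrange(scene, fake_generator)
```

Each entry in `moves` is a (label, current pose, goal pose) triple. In a real system the hard parts are exactly the ones stubbed out here: perception, matching generated objects to real ones, and the actual pick-and-place.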

[2210.02438] DALL-E-Bot: Introducing Web-Scale Diffusion Models to Robotics
