I just finished this short, interesting new paper! The challenges for robotics remain numerous and enormous, even for fairly simple tasks such as neatly arranging ordinary cutlery around a plate on a table top. The abstract glosses over the varied and difficult challenges that the paper itself explains.
Summa summarum: progress is very slow! At this pace, and unless new and different approaches are found, it may take another 3-5 years before we have capable robots, e.g. kitchen robots, that can handle a wide range of common tasks.
From the abstract:
"We introduce the first work to explore web-scale diffusion models for robotics. DALL-E-Bot enables a robot to rearrange objects in a scene, by first inferring a text description of those objects, then generating an image representing a natural, human-like arrangement of those objects, and finally physically arranging the objects according to that image. The significance is that we achieve this zero-shot using DALL-E, without needing any further data collection or training. Encouraging real-world results with human studies show that this is an exciting direction for the future of web-scale robot learning algorithms. We also propose a list of recommendations to the text-to-image community, to align further developments of these models with applications to robotics. ..."
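To make the three steps from the abstract a bit more concrete (describe the objects, generate a goal arrangement, move the objects), here is a minimal sketch of how such a pipeline could be wired together. This is only my own illustration, not the paper's code: `RearrangementPipeline`, `describe_scene`, `fake_generate_goal`, the prompt wording and the coordinates are all hypothetical placeholders, and the image model is replaced by a stub that returns fixed goal positions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Pose = Tuple[float, float]  # (x, y) on the table top, in made-up units


@dataclass
class RearrangementPipeline:
    # All three callables are hypothetical stand-ins, not the paper's components.
    describe_scene: Callable[[List[str]], str]       # object labels -> text prompt
    generate_goal: Callable[[str], Dict[str, Pose]]  # prompt -> goal pose per object
    move_object: Callable[[str, Pose], None]         # robot pick-and-place

    def run(self, detected_objects: List[str]) -> Dict[str, Pose]:
        # Step 1: infer a text description of the objects in the scene.
        prompt = self.describe_scene(detected_objects)
        # Step 2: obtain a goal arrangement (in the paper, via a generated image).
        goal = self.generate_goal(prompt)
        # Step 3: physically move each object that has a goal pose.
        for name in detected_objects:
            if name in goal:
                self.move_object(name, goal[name])
        return goal


def describe_scene(objects: List[str]) -> str:
    # Assumed prompt format, purely for illustration.
    return "A " + ", ".join(objects) + " on a table, top-down view"


def fake_generate_goal(prompt: str) -> Dict[str, Pose]:
    # Stub standing in for "generate an image, then read object poses from it".
    return {"plate": (0.0, 0.0), "fork": (-0.15, 0.0), "knife": (0.15, 0.0)}


moves: List[Tuple[str, Pose]] = []  # records what the (pretend) robot was asked to do
pipeline = RearrangementPipeline(
    describe_scene, fake_generate_goal, lambda name, pose: moves.append((name, pose))
)
goal = pipeline.run(["plate", "fork", "knife"])
```

After running this, `moves` holds one pick-and-place request per object. The hard parts the paper discusses sit inside the stubs, which is exactly where a real system would need perception, image generation and grasping.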