Just read this blog post from UC Berkeley, written by two students of Pieter Abbeel! I have read a number of these blog posts in the past. They are usually high quality and give good insight into the latest research topics.
I wonder whether it wouldn't be easier to simply teach a robot not to knock things over when going from point A to point B, or, when the robot walks on two legs/hands, that its head should not touch the ground.
This one, I found a little silly! The paper behind this blog post was recently presented at the highly regarded International Conference on Learning Representations (ICLR) 2021. Here is one example:
"... When the robot is deployed, Alice asks it to navigate to the purple door. If we were to encode this as a reward function that only rewards the robot while it is at the purple door, the robot would take the shortest path to the purple door, knocking over and breaking the vase – since no one said it shouldn’t do that. The robot is perfectly aware that its plan causes it to break the vase, but by default it doesn’t realize that it shouldn’t break the vase.
RLSP can instead infer that the vase should not be broken. At a high level, it effectively considers all the ways that the past could have been, checks which ones are consistent with the observed state, and infers a reward function based on the result. If Alice didn’t care about whether the vase was broken, she would have probably broken it some time in the past. If she wanted the vase broken, she definitely would have broken it some time in the past. ..."
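To make the quoted intuition concrete, here is a minimal sketch of the idea in Python. This is not the authors' RLSP implementation, just a toy Bayesian version of the same reasoning: the robot enumerates hypothetical pasts (where Alice started, and whether her reward penalizes breaking the vase), simulates them, keeps only those consistent with the state it actually observes (vase intact, Alice by the door), and updates its belief about her reward. The gridworld, the assumed starting positions, the two reward hypotheses, and Alice's greedy walking policy are all assumptions made for illustration.

```python
import random

random.seed(0)

GRID = 5                                                   # a tiny 5x5 gridworld
VASE = (2, 2)                                              # the vase sits in the middle of the room
OBSERVED = {"alice_pos": (4, 4), "vase_broken": False}     # the state the robot observes
# Assumption: Alice entered the room near the top-left doorway.
START_CANDIDATES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def simulate_past(start, cares_about_vase, steps=8):
    """Roll out one hypothetical past trajectory for Alice.

    Alice walks greedily toward her observed position; if she cares about
    the vase she detours around its cell, otherwise she ignores it and may
    walk straight through it (breaking it).
    """
    pos, goal, broken = start, OBSERVED["alice_pos"], False
    for _ in range(steps):
        if pos == goal:
            break
        moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]

        def score(m):
            nxt = (pos[0] + m[0], pos[1] + m[1])
            d = abs(nxt[0] - goal[0]) + abs(nxt[1] - goal[1])
            if cares_about_vase and nxt == VASE:
                d += 100                                   # a careful Alice avoids the vase cell
            return d + random.random()                     # noise only breaks ties

        m = min(moves, key=score)
        pos = (min(max(pos[0] + m[0], 0), GRID - 1),
               min(max(pos[1] + m[1], 0), GRID - 1))
        if pos == VASE:
            broken = True                                  # stepping on the cell breaks the vase
    return {"alice_pos": pos, "vase_broken": broken}


def likelihood(cares_about_vase, n_samples=2000):
    """Estimate P(observed state | reward hypothesis) by simulating many pasts."""
    hits = sum(
        simulate_past(random.choice(START_CANDIDATES), cares_about_vase) == OBSERVED
        for _ in range(n_samples)
    )
    return hits / n_samples


# Uniform prior over the two hypotheses about Alice's reward.
l_cares, l_doesnt = likelihood(True), likelihood(False)
total = l_cares + l_doesnt
print(f"P(Alice cares about the vase | vase intact) ~ {l_cares / total:.2f}")
print(f"P(Alice doesn't care         | vase intact) ~ {l_doesnt / total:.2f}")
```

As I understand it, the real algorithm infers a reward over learned features with gradient updates rather than comparing two hand-written hypotheses, but the shape of the argument is the one in the quote: a hypothetical past in which Alice didn't care about the vase would only rarely end with the vase still intact.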