Recommended! Google (DeepMind) has made several great contributions to AI, and to reinforcement learning in particular, in recent years. Now it is pushing the envelope again!
"... We find the agent exhibits general, heuristic behaviours such as experimentation, behaviours that are widely applicable to many tasks rather than specialised to an individual task. This new approach marks an important step toward creating more general agents with the flexibility to adapt rapidly within constantly changing environments. ...
We then use population based training (PBT) to adjust the parameters of the dynamic task generation based on a fitness that aims to improve agents’ general capability. And finally we chain together multiple training runs so each generation of agents can bootstrap off the previous generation. ..."
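To make the population based training step a bit more concrete, here is a minimal toy sketch of the generic exploit-and-explore loop that PBT is built on. It is not DeepMind's implementation: the names `Member`, `task_param`, `toy_fitness`, `pbt_step` and `run_pbt` are hypothetical, the single `task_param` stands in for the parameters of the dynamic task generation, and the fitness is a made-up function chosen only so the example runs. Chaining generations of agents, as described in the quote, is not modelled here.

```python
import random
from dataclasses import dataclass


@dataclass
class Member:
    # Hypothetical knob of the task generator (an assumption, not from the paper).
    task_param: float
    fitness: float = 0.0


def toy_fitness(task_param: float) -> float:
    # Made-up stand-in for a real fitness; by construction it peaks at 0.7.
    return -(task_param - 0.7) ** 2


def pbt_step(population, rng, perturb=0.1, frac=0.25):
    # Evaluate every member of the population.
    for member in population:
        member.fitness = toy_fitness(member.task_param)

    # Rank from best to worst and pick the top and bottom slices.
    ranked = sorted(population, key=lambda m: m.fitness, reverse=True)
    k = max(1, int(len(ranked) * frac))
    top, bottom = ranked[:k], ranked[-k:]

    # Exploit: each weak member copies the parameter of a strong one.
    # Explore: the copied parameter is perturbed by a small factor.
    for loser in bottom:
        winner = rng.choice(top)
        loser.task_param = winner.task_param * rng.choice([1 - perturb, 1 + perturb])

    # The best member of this step is left untouched.
    return ranked[0]


def run_pbt(generations=30, size=8, seed=0):
    rng = random.Random(seed)
    population = [Member(rng.uniform(0.0, 1.0)) for _ in range(size)]
    best = None
    for _ in range(generations):
        best = pbt_step(population, rng)
    return best
```

Calling `run_pbt()` returns the best member found after the final step. Because the top members are never overwritten, the best fitness in the population cannot get worse from one step to the next, which is the property that makes this simple scheme useful for tuning parameters during training.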
"In this work we create agents that can perform well beyond a single, individual task, that exhibit much wider generalisation of behaviour to a massive, rich space of challenges. We define a universe of tasks within an environment domain and demonstrate the ability to train agents that are generally capable across this vast space and beyond. The environment is natively multi-agent, spanning the continuum of competitive, cooperative, and independent games, which are situated within procedurally generated physical 3D worlds. ...
We show that through constructing an open-ended learning process, which dynamically changes the training task distributions and training objectives such that the agent never stops learning, we achieve consistent learning of new behaviours. The resulting agent is able to score reward in every one of our humanly solvable evaluation levels, with behaviour generalising to many held-out points in the universe of tasks. ..."
Also published as a preprint: