Sunday, March 05, 2023

On "Mastering Diverse Domains through World Models": collecting diamonds in Minecraft

Recommended reading! Soon, reinforcement learning algorithms will beat the best human players at Minecraft!

First it was backgammon in the early 1990s, then chess in 1997 ...

What is particularly impressive is that the researchers used the same model with fixed hyperparameters across a range of challenging tasks, beating most of the previous state-of-the-art models.

One of the simplest and most effective new features introduced in DreamerV3 is symlog, applied to inputs and to predictions of returns, to deal with rare or extremely large values:
"... symlog(x) = sign(x) ln(|x| + 1) ... The symlog function compresses the magnitudes of both large positive and negative values. Unlike the logarithm, it is symmetric around the origin while preserving the input sign. ..."
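
To make the quoted formula concrete, here is a minimal Python sketch of symlog together with its inverse. The inverse, named symexp here, is obtained by solving the formula for x; the function names and the sample values are chosen for illustration only and are not taken from the authors' code.

```python
import math


def symlog(x: float) -> float:
    """Symmetric log: sign(x) * ln(|x| + 1)."""
    # log1p(a) computes ln(1 + a); copysign restores the sign of x.
    return math.copysign(math.log1p(abs(x)), x)


def symexp(x: float) -> float:
    """Inverse of symlog: sign(x) * (exp(|x|) - 1)."""
    # expm1(a) computes exp(a) - 1.
    return math.copysign(math.expm1(abs(x)), x)


if __name__ == "__main__":
    # Illustrative values only: small inputs stay almost unchanged,
    # large inputs are compressed, and the sign is preserved.
    for value in (-1000.0, -1.0, 0.0, 0.1, 1.0, 1000.0):
        squashed = symlog(value)
        restored = symexp(squashed)
        print(f"{value:>8} -> symlog {squashed:+.4f} -> symexp {restored:.4f}")
```

Near zero the function behaves almost like the identity, while an input of 1000 is squashed to ln(1001), roughly 6.9. That is why rare, very large values no longer dominate the scale of the targets.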

From the abstract:
"General intelligence requires solving tasks across many domains. Current reinforcement learning algorithms carry this potential but are held back by the resources and knowledge required to tune them for new tasks. We present DreamerV3, a general and scalable algorithm based on world models that outperforms previous approaches across a wide range of domains with fixed hyperparameters. These domains include continuous and discrete actions, visual and low-dimensional inputs, 2D and 3D worlds, different data budgets, reward frequencies, and reward scales. We observe favorable scaling properties of DreamerV3, with larger models directly translating to higher data-efficiency and final performance. Applied out of the box, DreamerV3 is the first algorithm to collect diamonds in Minecraft from scratch without human data or curricula, a long-standing challenge in artificial intelligence. Our general algorithm makes reinforcement learning broadly applicable and allows scaling to hard decision-making problems."

[2301.04104] Mastering Diverse Domains through World Models
