DreamerV3 Overview

DreamerV3 learns a model of the world, then trains a policy entirely on trajectories imagined by that model. Its architecture differs from that of the model-free papers (AlphaStar, JueWu), but its sample efficiency could be transformative for fast simulations.

The DreamerV3 training loop

repeat:
    1. Collect experience in the real environment
    2. Store in sequence replay buffer
    3. Sample sequences, train the world model (RSSM)
    4. Imagine trajectories from the world model
    5. Train actor-critic on imagined data

Steps 4-5 are “free” in terms of environment samples: they need no environment interaction, only compute.
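The loop above can be sketched as a toy program that counts real versus imagined steps. This is a minimal illustration under stated assumptions, not the rl4burn API: `ToyEnv`, `ToyWorldModel`, and `run` are hypothetical names, the "world model" is a single learned scalar standing in for the RSSM, the imagined reward reuses the known toy reward instead of a learned reward head, and the actor-critic update in step 5 is left as a comment.

```python
import random


class ToyEnv:
    """Hypothetical 1-D environment that counts real interactions."""

    def __init__(self):
        self.state = 0.0
        self.real_steps = 0

    def step(self, action):
        self.real_steps += 1
        self.state += action
        return self.state, -abs(self.state)  # next state, reward


class ToyWorldModel:
    """Stand-in for the RSSM: learns `gain` so that s' ~ s + gain * a."""

    def __init__(self):
        self.gain = 0.0

    def train(self, sequences, lr=0.1):
        for seq in sequences:
            for s, a, _r, s_next in seq:
                error = s_next - (s + self.gain * a)
                self.gain += lr * error * a  # gradient step on squared error

    def imagine(self, start_state, policy, horizon):
        s, trajectory = start_state, []
        for _ in range(horizon):
            s = s + self.gain * policy(s)   # model prediction, no env call
            trajectory.append((s, -abs(s)))  # toy reward, assumed known
        return trajectory


def run(iterations=5, collect_steps=10, seq_len=5, batch=4, horizon=15, seed=0):
    rng = random.Random(seed)
    env, model = ToyEnv(), ToyWorldModel()
    replay = []  # list of fixed-length sequences
    imagined_steps = 0

    def policy(s):
        return -1.0 if s > 0 else 1.0

    for _ in range(iterations):
        seq = []
        for _ in range(collect_steps):
            s = env.state
            a = rng.choice([-1.0, 1.0])
            s_next, r = env.step(a)            # 1. collect real experience
            seq.append((s, a, r, s_next))
            if len(seq) == seq_len:
                replay.append(seq)             # 2. store the sequence
                seq = []
        sampled = [rng.choice(replay) for _ in range(batch)]
        model.train(sampled)                   # 3. train the world model
        for sampled_seq in sampled:
            start = sampled_seq[-1][3]
            rollout = model.imagine(start, policy, horizon)  # 4. imagine
            imagined_steps += len(rollout)
            # 5. an actor-critic update on `rollout` would go here (omitted)
    return env.real_steps, imagined_steps, model.gain


if __name__ == "__main__":
    real, imagined, gain = run()
    print(f"real env steps: {real}, imagined steps: {imagined}, gain: {gain:.4f}")
```

With the default arguments, each iteration takes 10 real steps but produces 4 rollouts of 15 imagined steps, so the policy sees several times more imagined transitions than real ones.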

rl4burn modules for DreamerV3

| Component | Module | Page |
|---|---|---|
| World model | `Rssm` | RSSM |
| Imagination | `imagine_rollout` | Imagination |
| Value targets | `lambda_returns` | Imagination |
| Replay | `SequenceReplayBuffer` | Sequence Replay |
| Transforms | `symlog`, `TwohotEncoder` | Symlog |
| KL training | `kl_balanced_loss` | KL Balance |
| Normalization | `PercentileNormalizer` | Percentile |
| Block GRU | `BlockGruCell` | RNN |
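Two of the components listed above are small enough to sketch directly. The following is a standalone Python illustration of the symlog transform and of λ-returns on plain lists. It is not the rl4burn implementation: the function signatures, the indexing convention (`values` carries one extra bootstrap entry), and the default `gamma` and `lam` values are assumptions made for this example.

```python
import math


def symlog(x):
    """Symmetric log: sign(x) * ln(|x| + 1). Compresses large magnitudes."""
    return math.copysign(math.log1p(abs(x)), x)


def symexp(x):
    """Inverse of symlog: sign(x) * (exp(|x|) - 1)."""
    return math.copysign(math.expm1(abs(x)), x)


def lambda_returns(rewards, values, continues, gamma=0.997, lam=0.95):
    """Compute lambda-returns by a backward recursion.

    Assumed convention for this sketch:
      rewards[t], continues[t] for t = 0..T-1
      values[t] for t = 0..T, where values[T] bootstraps the tail
    Recursion:
      R_t = r_t + gamma * c_t * ((1 - lam) * v_{t+1} + lam * R_{t+1}),
      with R_T = v_T.
    """
    horizon = len(rewards)
    returns = [0.0] * horizon
    next_return = values[horizon]
    for t in reversed(range(horizon)):
        blended = (1.0 - lam) * values[t + 1] + lam * next_return
        next_return = rewards[t] + gamma * continues[t] * blended
        returns[t] = next_return
    return returns


if __name__ == "__main__":
    print(symlog(1000.0), symexp(symlog(1000.0)))
    print(lambda_returns([1, 1, 1], [0, 0, 0, 5], [1, 1, 1], gamma=1.0, lam=1.0))
```

Setting `lam=1.0` reduces the recursion to a discounted sum of rewards bootstrapped only at the final step, while `lam=0.0` gives one-step targets `r_t + gamma * c_t * v_{t+1}`; intermediate values blend the two.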