DreamerV3 Overview
DreamerV3 learns a model of the world from real experience, then trains its policy entirely on trajectories imagined by that model. It is architecturally different from the model-free papers (AlphaStar, JueWu), but its sample efficiency could be transformative for fast simulations.
The DreamerV3 training loop
repeat:
1. Collect experience in the real environment
2. Store in sequence replay buffer
3. Sample sequences, train the world model (RSSM)
4. Imagine trajectories from the world model
5. Train actor-critic on imagined data
Steps 4-5 are “free” in terms of samples: they need no environment interaction, only compute.
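The loop above can be sketched in plain Rust to make the real-versus-imagined split concrete. This is a toy under stated assumptions, not rl4burn code: `ToyEnv`, `ToyWorldModel`, and `run_loop` are hypothetical names, the "world model" is a single learned scalar rather than an RSSM, and the actor-critic update in step 5 is left as a comment.

```rust
/// One stored transition: (state, action, next_state).
type Transition = (f32, f32, f32);

/// Hypothetical stand-in for the real environment; counts how often it is stepped.
struct ToyEnv {
    state: f32,
    real_steps: usize,
}

impl ToyEnv {
    /// Advance the toy dynamics by one step and return the observed transition.
    fn step(&mut self, action: f32) -> Transition {
        let prev = self.state;
        self.state = 0.9 * prev + action;
        self.real_steps += 1;
        (prev, action, self.state)
    }
}

/// Hypothetical stand-in for the learned world model (an RSSM in DreamerV3).
struct ToyWorldModel {
    decay: f32,
    imagined_steps: usize,
}

impl ToyWorldModel {
    /// Step 3: one gradient step per transition on the squared prediction error.
    fn train(&mut self, batch: &[Transition]) {
        for &(s, a, next) in batch {
            let err = (self.decay * s + a) - next;
            self.decay -= 0.1 * err * s;
        }
    }

    /// Step 4: roll the model forward for `horizon` steps.
    /// The environment is never called here.
    fn imagine(&mut self, start: f32, horizon: usize) -> Vec<f32> {
        let mut s = start;
        let mut trajectory = Vec::with_capacity(horizon);
        for _ in 0..horizon {
            s = self.decay * s; // action fixed at 0 for this sketch
            self.imagined_steps += 1;
            trajectory.push(s);
        }
        trajectory
    }
}

/// Runs `iterations` rounds of the loop and returns (real_steps, imagined_steps).
fn run_loop(iterations: usize, horizon: usize) -> (usize, usize) {
    let mut env = ToyEnv { state: 1.0, real_steps: 0 };
    let mut model = ToyWorldModel { decay: 0.0, imagined_steps: 0 };
    let mut replay: Vec<Transition> = Vec::new();

    for _ in 0..iterations {
        // 1. Collect one step of real experience.
        let transition = env.step(0.0);
        // 2. Store it in the replay buffer.
        replay.push(transition);
        // 3. Train the world model on replayed data.
        model.train(&replay);
        // 4. Imagine a trajectory starting from the latest real state.
        let imagined = model.imagine(transition.2, horizon);
        // 5. An actor-critic update would consume `imagined` here (omitted).
        let _ = imagined;
    }
    (env.real_steps, model.imagined_steps)
}
```

Calling `run_loop(10, 15)` steps the environment 10 times while the model produces 150 imagined steps, which is the sense in which the imagined data costs no additional environment samples.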
rl4burn modules for DreamerV3
| Component | Module | Page |
|---|---|---|
| World model | Rssm | RSSM |
| Imagination | imagine_rollout | Imagination |
| Value targets | lambda_returns | Imagination |
| Replay | SequenceReplayBuffer | Sequence Replay |
| Transforms | symlog, TwohotEncoder | Symlog |
| KL training | kl_balanced_loss | KL Balance |
| Normalization | PercentileNormalizer | Percentile |
| Block GRU | BlockGruCell | RNN |
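
As a rough illustration of what two of these components compute, here is a scalar sketch of the symlog transform and of λ-return value targets, following the formulas commonly given for DreamerV3. The function names `symlog_scalar`, `symexp_scalar`, and `lambda_returns_scalar` are hypothetical, and the signatures are assumptions for this example; they are not the rl4burn `symlog` or `lambda_returns` API, which may differ (for example by operating on tensors).

```rust
/// symlog(x) = sign(x) * ln(|x| + 1): compresses large magnitudes, keeps the sign.
fn symlog_scalar(x: f32) -> f32 {
    x.signum() * x.abs().ln_1p()
}

/// symexp(x) = sign(x) * (exp(|x|) - 1): the inverse of symlog.
fn symexp_scalar(x: f32) -> f32 {
    x.signum() * x.abs().exp_m1()
}

/// λ-returns over an imagined trajectory of length T:
///   R_t = r_t + gamma * c_t * ((1 - lambda) * v_{t+1} + lambda * R_{t+1}),
///   with R_T = v_T as the bootstrap.
/// `rewards` and `continues` have length T; `values` has length T + 1.
fn lambda_returns_scalar(
    rewards: &[f32],
    values: &[f32],
    continues: &[f32],
    gamma: f32,
    lambda: f32,
) -> Vec<f32> {
    let horizon = rewards.len();
    assert_eq!(values.len(), horizon + 1, "values needs one bootstrap entry");
    assert_eq!(continues.len(), horizon, "one continue flag per step");

    let mut returns = vec![0.0; horizon];
    let mut next_return = values[horizon];
    // Work backwards so each step can reuse the return of the step after it.
    for t in (0..horizon).rev() {
        let bootstrap = (1.0 - lambda) * values[t + 1] + lambda * next_return;
        next_return = rewards[t] + gamma * continues[t] * bootstrap;
        returns[t] = next_return;
    }
    returns
}
```

With `lambda = 0.0` the targets reduce to one-step TD targets `r_t + gamma * c_t * v_{t+1}`; with `lambda = 1.0` they become discounted Monte Carlo returns bootstrapped only from the final value.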