DreamerV3 Overview

DreamerV3 learns a world model from real experience and trains its policy entirely on trajectories imagined by that model. This makes it architecturally different from the model-free papers (AlphaStar, JueWu), which learn directly from environment interaction, but its sample efficiency could be transformative for fast simulations.

The DreamerV3 training loop

repeat:
    1. Collect experience in the real environment
    2. Store in sequence replay buffer
    3. Sample sequences, train the world model (RSSM)
    4. Imagine trajectories from the world model
    5. Train actor-critic on imagined data

Steps 4-5 are “free” in terms of environment samples: they run on the world model's predictions and need no environment interaction, only compute.
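
The control flow of the loop above can be sketched in plain Python. This is a minimal illustration of the step ordering only, not rl4burn code: `ToyEnv`, `train`, and all parameter names are hypothetical, the model and actor-critic updates are left as comments, and starting one imagined rollout per sampled sequence is a simplifying assumption. The sketch counts real and imagined steps separately to show that steps 4-5 never touch the environment.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for a real environment; it only counts real steps."""

    def __init__(self):
        self.steps = 0

    def step(self, action):
        self.steps += 1
        reward = 1.0 if action == 1 else 0.0
        return float(self.steps), reward  # (observation, reward)


def train(env, iterations, collect_steps, batch_size, seq_len, horizon, seed=0):
    """Run the five-step loop with placeholder updates; return step counts."""
    rng = random.Random(seed)
    replay = []  # flat list standing in for a sequence replay buffer
    imagined_steps = 0

    for _ in range(iterations):
        # Steps 1-2: collect real experience and store it.
        for _ in range(collect_steps):
            action = rng.randint(0, 1)  # random policy as a placeholder
            obs, reward = env.step(action)
            replay.append((obs, action, reward))

        if len(replay) < seq_len:
            continue  # not enough data to sample a full sequence yet

        # Step 3: sample contiguous sequences for the world model update.
        starts = [rng.randrange(len(replay) - seq_len + 1) for _ in range(batch_size)]
        batch = [replay[s:s + seq_len] for s in starts]
        # (world model update on `batch` would go here)

        # Steps 4-5: imagine `horizon` steps per sampled sequence.
        # Note that env.step is never called in this part of the loop.
        for _sequence in batch:
            imagined_steps += horizon
        # (actor-critic update on the imagined trajectories would go here)

    return env.steps, imagined_steps
```

With `iterations=3`, `collect_steps=4`, `batch_size=2`, `seq_len=4`, and `horizon=5`, the sketch takes 12 real environment steps and 30 imagined ones. In practice the ratio between the two is a tuning choice.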

rl4burn modules for DreamerV3

Component	Module	Page
World model	`Rssm`	RSSM
Imagination	`imagine_rollout`	Imagination
Value targets	`lambda_returns`	Imagination
Replay	`SequenceReplayBuffer`	Sequence Replay
Transforms	`symlog`, `TwohotEncoder`	Symlog
KL training	`kl_balanced_loss`	KL Balance
Normalization	`PercentileNormalizer`	Percentile
Block GRU	`BlockGruCell`	RNN
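
To make two of the components in the table concrete, here is a minimal scalar sketch of the symlog transform and of λ-returns in plain Python. It illustrates the math only and is not the rl4burn API: the signatures of `symlog` and `lambda_returns` in the crate may differ, and the argument layout below (a `values` list with one extra bootstrap entry at the end) is an assumption made for this example.

```python
import math


def symlog(x):
    """Symmetric log: sign(x) * ln(|x| + 1). Compresses large magnitudes."""
    return math.copysign(math.log1p(abs(x)), x)


def symexp(x):
    """Inverse of symlog: sign(x) * (exp(|x|) - 1)."""
    return math.copysign(math.expm1(abs(x)), x)


def lambda_returns(rewards, values, continues, gamma, lam):
    """Compute lambda-returns backwards over an imagined trajectory.

    Assumed layout (illustrative, not taken from rl4burn):
      rewards[t], continues[t] for t = 0..T-1
      values[t] for t = 0..T, where values[T] is the bootstrap value.

    Recursion: R_t = r_t + gamma * c_t * ((1 - lam) * v_{t+1} + lam * R_{t+1}),
    with R_T = v_T.
    """
    horizon = len(rewards)
    assert len(values) == horizon + 1
    assert len(continues) == horizon

    returns = [0.0] * horizon
    next_return = values[-1]  # R_T = v_T
    for t in reversed(range(horizon)):
        blended = (1.0 - lam) * values[t + 1] + lam * next_return
        next_return = rewards[t] + gamma * continues[t] * blended
        returns[t] = next_return
    return returns
```

Setting `lam=0` reduces the recursion to one-step targets `r_t + gamma * c_t * v_{t+1}`, while `lam=1` gives the discounted reward sum plus the bootstrap value. A `continues[t]` of 0 cuts off everything after step `t`, which is how episode ends are handled in this sketch.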
