Imagination Rollouts
Generate trajectories entirely within the RSSM latent space for actor-critic training.
API
use rl4burn::algo::planning::imagination::{imagine_rollout, lambda_returns};
let trajectory = imagine_rollout(
&rssm,
initial_states,
|h, z| actor_network.forward(h, z), // actor closure
15, // horizon (DreamerV3 default)
);
// trajectory.states: [16] states (initial + 15 imagined)
// trajectory.reward_logits: [15] reward predictions
// trajectory.continue_logits: [15] continue predictions
Lambda-returns
Compute value targets from imagined rewards:
let returns = lambda_returns(
&rewards, // decoded from reward_logits
&values, // critic predictions at each state
&continues, // sigmoid(continue_logits)
0.997, // gamma
0.95, // lambda
);
Stop-gradient rules
During imagination training:
- World model: frozen (no gradients). The actor learns to generate actions that lead to high-value states.
- Value targets: stop-gradiented. The critic trains on fixed targets.
- Rewards: gradients flow through the dynamics model to the actor (the actor is indirectly optimizing for states that the world model predicts will be rewarding).