rl4burn ships with 15 runnable examples in the examples/ directory, organized into five tiers of increasing complexity. Each example is a standalone Cargo package that you can run with cargo run -p <name> --release.
Use this decision guide to pick the right starting point:
| Scenario | Recommended algorithm | Start from example |
|---|---|---|
| Discrete actions (e.g., CartPole, Atari) | PPO or DQN | quickstart |
| Continuous actions (e.g., Pendulum, MuJoCo) | PPO with Gaussian policy | ppo-continuous |
| Multi-discrete actions (e.g., RTS games) | PPO with multi-head | ppo-multi-discrete |
| Invalid actions vary per step | Masked PPO | action-masking |
| Competitive game (1v1 or teams) | Self-play PPO | self-play |
| Partial observability | LSTM policy + PPO | lstm-policy |
| Multiple cooperating agents | Shared-weight PPO | multi-agent |
| Large observation space / model-based | DreamerV3 (future) | none yet |
When in doubt, start with PPO (the quickstart example). It is the most versatile algorithm of the set and works well across a wide range of problems. Switch to DQN only if you need off-policy learning (for example, reusing past transitions from a replay buffer) or have a small discrete action space where sample efficiency matters.
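The decision guide and the PPO-versus-DQN rule of thumb can be written down as a small lookup, which makes the mapping explicit. This is a minimal sketch only: the `Scenario` enum and the `recommend` and `ppo_or_dqn` functions are hypothetical illustrations and are not part of the rl4burn API. The example package names are taken from the table above.

```rust
// Hypothetical encoding of the decision guide. These types and functions
// are illustrative only; they are NOT part of the rl4burn API.

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Scenario {
    DiscreteActions,
    ContinuousActions,
    MultiDiscreteActions,
    PerStepInvalidActions,
    CompetitiveGame,
    PartialObservability,
    CooperatingAgents,
    LargeObservationOrModelBased,
}

/// Returns (recommended algorithm, example package to start from).
/// `None` means no example exists yet for that scenario.
fn recommend(s: Scenario) -> (&'static str, Option<&'static str>) {
    match s {
        Scenario::DiscreteActions => ("PPO or DQN", Some("quickstart")),
        Scenario::ContinuousActions => ("PPO with Gaussian policy", Some("ppo-continuous")),
        Scenario::MultiDiscreteActions => ("PPO with multi-head", Some("ppo-multi-discrete")),
        Scenario::PerStepInvalidActions => ("Masked PPO", Some("action-masking")),
        Scenario::CompetitiveGame => ("Self-play PPO", Some("self-play")),
        Scenario::PartialObservability => ("LSTM policy + PPO", Some("lstm-policy")),
        Scenario::CooperatingAgents => ("Shared-weight PPO", Some("multi-agent")),
        Scenario::LargeObservationOrModelBased => ("DreamerV3 (future)", None),
    }
}

/// Tie-breaker for discrete-action problems: default to PPO and pick DQN
/// only when off-policy learning is needed, or when the action space is
/// small and sample efficiency matters.
fn ppo_or_dqn(needs_off_policy: bool, small_action_space: bool, sample_efficiency_matters: bool) -> &'static str {
    if needs_off_policy || (small_action_space && sample_efficiency_matters) {
        "DQN"
    } else {
        "PPO"
    }
}

fn main() {
    // Look up a scenario and print the command that runs its example.
    let (algo, example) = recommend(Scenario::ContinuousActions);
    match example {
        Some(pkg) => println!("{algo}: cargo run -p {pkg} --release"),
        None => println!("{algo}: no example yet"),
    }
    // With no special requirements, the guide defaults to PPO.
    println!("{}", ppo_or_dqn(false, false, false));
}
```

Running this prints the cargo command for the ppo-continuous example followed by `PPO`. Keeping the fallback rule in its own function mirrors the guide's structure: the table chooses a starting example, and the rule of thumb only matters for plain discrete-action problems, where both PPO and DQN apply.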