CSPL (Curriculum Self-Play Learning)

JueWu’s 3-phase training pipeline for scaling to many heroes/unit types.

The problem

Training a single policy directly on 40+ heroes in all team combinations doesn’t converge (480+ hours of training without success).

The solution: three phases

| Phase | What | Duration |
|---|---|---|
| 1. Specialists | Train small models on fixed team compositions | ~72h |
| 2. Distillation | Merge all specialists into one big model | ~48h |
| 3. Generalization | Continue RL with random compositions | ~144h |
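To make phase 2 more concrete, here is a minimal sketch of a per-state distillation loss. It assumes the student is trained to match a specialist's action distribution by minimizing KL(teacher || student); the source does not say which loss is actually used, and `softmax` and `distill_loss` are hypothetical helpers, not part of rl4burn.

```rust
/// Numerically stable softmax over a slice of action logits.
fn softmax(logits: &[f64]) -> Vec<f64> {
    let max = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = logits.iter().map(|&l| (l - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// KL(teacher || student) for a single state.
/// Assumption: this is one common choice of distillation loss,
/// not necessarily the one used by JueWu or rl4burn.
fn distill_loss(teacher_logits: &[f64], student_logits: &[f64]) -> f64 {
    assert_eq!(teacher_logits.len(), student_logits.len());
    let p = softmax(teacher_logits);
    let q = softmax(student_logits);
    p.iter()
        .zip(q.iter())
        .map(|(&pi, &qi)| if pi > 0.0 { pi * (pi / qi).ln() } else { 0.0 })
        .sum()
}

fn main() {
    let teacher = [2.0, 0.5, -1.0];
    let matching = [2.0, 0.5, -1.0];
    let mismatched = [-1.0, 0.5, 2.0];
    // A student that already agrees with the teacher incurs zero loss;
    // a student that prefers a different action incurs a positive loss.
    println!("matching:   {:.4}", distill_loss(&teacher, &matching));
    println!("mismatched: {:.4}", distill_loss(&teacher, &mismatched));
}
```

In a full pipeline this quantity would be averaged over states sampled from each specialist's team composition, so that the single student absorbs the behavior of every specialist.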

API

use rl4burn::{CsplPipeline, CsplConfig, CsplPhase};

let mut pipeline = CsplPipeline::new(CsplConfig {
    phase1_steps: 100_000,
    phase2_steps: 50_000,
    phase3_steps: 200_000,
    n_specialists: 10,
});

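loop {
    // Advance the schedule by one step; the returned flag reports whether
    // this step moved the pipeline into a new phase.
    let phase_changed = pipeline.step();
    if phase_changed {
        // e.g. swap models or reset optimizer state for the new phase
    }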

    match pipeline.current_phase() {
        CsplPhase::SpecialistTraining => { /* train specialists via self-play */ }
        CsplPhase::Distillation => { /* distill into student */ }
        CsplPhase::Generalization => { /* continue RL with random compositions */ }
    }

    if pipeline.is_complete() { break; }
}
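The loop above relies on the pipeline moving through its phases as steps accumulate. One way such a schedule could work is sketched below as a standalone step counter. This is an assumption for illustration only: `Schedule` and `Phase` are hypothetical stand-ins, and rl4burn's `CsplPipeline` may track phases differently.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Phase {
    SpecialistTraining,
    Distillation,
    Generalization,
}

/// Hypothetical step-budget schedule (not the rl4burn implementation).
struct Schedule {
    budgets: [u64; 3], // step budget for phases 1, 2 and 3
    steps_taken: u64,
}

impl Schedule {
    fn new(phase1: u64, phase2: u64, phase3: u64) -> Self {
        Schedule { budgets: [phase1, phase2, phase3], steps_taken: 0 }
    }

    fn total(&self) -> u64 {
        self.budgets.iter().sum()
    }

    /// The phase is derived from how many steps have been taken so far.
    fn current_phase(&self) -> Phase {
        if self.steps_taken < self.budgets[0] {
            Phase::SpecialistTraining
        } else if self.steps_taken < self.budgets[0] + self.budgets[1] {
            Phase::Distillation
        } else {
            Phase::Generalization
        }
    }

    fn is_complete(&self) -> bool {
        self.steps_taken >= self.total()
    }

    /// Advance by one step; returns true if the phase changed.
    fn step(&mut self) -> bool {
        let before = self.current_phase();
        if !self.is_complete() {
            self.steps_taken += 1;
        }
        before != self.current_phase()
    }
}

/// Run a schedule to completion and record (step, new phase) at each transition.
fn transitions(mut s: Schedule) -> Vec<(u64, Phase)> {
    let mut out = Vec::new();
    while !s.is_complete() {
        if s.step() {
            out.push((s.steps_taken, s.current_phase()));
        }
    }
    out
}

fn main() {
    // Tiny budgets so the transitions are easy to follow.
    println!("{:?}", transitions(Schedule::new(3, 2, 4)));
}
```

Deriving the phase from a single step counter keeps the schedule stateless apart from `steps_taken`, which makes it easy to checkpoint and resume a long run.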