CSPL (Curriculum Self-Play Learning)

CSPL is JueWu’s three-phase training pipeline for scaling self-play RL to many heroes (or, more generally, unit types).

The problem

Training a single policy directly on 40+ heroes in all combinations does not converge: 480+ hours of training went by without success.
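A quick count shows how large "all combinations" is. The sketch below assumes 5v5 matches with no hero shared between the two teams; the team size is not stated above, and `binomial` and `lineups` are illustrative helpers, not part of rl4burn.

```rust
/// Binomial coefficient C(n, k). After iteration i the accumulator holds
/// C(n, i + 1), so every division is exact.
fn binomial(n: u64, k: u64) -> u64 {
    if k > n {
        return 0;
    }
    let k = k.min(n - k);
    let mut result = 1u64;
    for i in 0..k {
        result = result * (n - i) / (i + 1);
    }
    result
}

/// Number of ways to pick two disjoint teams (first team, then second team)
/// from a hero pool. ASSUMPTION: fixed team size, no hero on both teams.
fn lineups(heroes: u64, team_size: u64) -> u64 {
    binomial(heroes, team_size) * binomial(heroes.saturating_sub(team_size), team_size)
}

fn main() {
    // 40 heroes, 5 per team.
    println!("{}", lineups(40, 5)); // prints 213610453056
}
```

Under these assumptions a 40-hero pool already yields roughly 2.1e11 distinct matchups, which is why a single policy trained from scratch sees each one far too rarely.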

The solution: three phases

Phase	What	Duration
1. Specialists	Train small models on fixed team compositions	~72h
2. Distillation	Merge all specialists into one big model	~48h
3. Generalization	Continue RL with random compositions	~144h
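The distillation phase in the table above can be pictured as supervised learning: the student is trained to match each specialist's action distribution on states from that specialist's lineup. The exact loss is not given here, so the sketch below uses a common choice, the mean KL divergence from teacher to student. Every name in it is hypothetical and none of it is rl4burn API.

```rust
/// KL(teacher || student) for one state's action distribution.
/// Terms with zero teacher probability contribute nothing.
fn kl_divergence(teacher: &[f64], student: &[f64]) -> f64 {
    assert_eq!(teacher.len(), student.len());
    teacher
        .iter()
        .zip(student)
        .filter(|(p, _)| **p > 0.0)
        // Clamp the student probability to avoid dividing by zero.
        .map(|(p, q)| p * (p / q.max(1e-12)).ln())
        .sum()
}

/// Mean KL over a batch of (teacher_probs, student_probs) pairs, where each
/// pair comes from a state sampled under some specialist's fixed lineup.
/// ASSUMPTION: this objective is illustrative, not the documented loss.
fn distillation_loss(samples: &[(Vec<f64>, Vec<f64>)]) -> f64 {
    if samples.is_empty() {
        return 0.0;
    }
    let total: f64 = samples.iter().map(|(t, s)| kl_divergence(t, s)).sum();
    total / samples.len() as f64
}

fn main() {
    let batch = vec![
        (vec![0.7, 0.2, 0.1], vec![0.6, 0.3, 0.1]), // state from specialist A
        (vec![0.1, 0.1, 0.8], vec![0.3, 0.3, 0.4]), // state from specialist B
    ];
    println!("distillation loss = {:.4}", distillation_loss(&batch));
}
```

The loss is zero only when the student reproduces every teacher distribution exactly, so driving it down pulls the knowledge of all specialists into the one student model.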

API

use rl4burn::{CsplPipeline, CsplConfig, CsplPhase};

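// Step budgets for the three phases, plus the number of phase-1 specialists.
let mut pipeline = CsplPipeline::new(CsplConfig {
    phase1_steps: 100_000,
    phase2_steps: 50_000,
    phase3_steps: 200_000,
    n_specialists: 10,
});

loop {
    // Advance the pipeline by one step; the return value indicates
    // whether this step moved the pipeline into a new phase.
    let phase_changed = pipeline.step();

    // Run the work that belongs to the phase the pipeline is in now.
    match pipeline.current_phase() {
        CsplPhase::SpecialistTraining => { /* train specialists via self-play */ }
        CsplPhase::Distillation => { /* distill into student */ }
        CsplPhase::Generalization => { /* continue RL with random compositions */ }
    }

    // Stop once all three phase budgets are exhausted.
    if pipeline.is_complete() { break; }
}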
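To make the loop's control flow concrete, here is a self-contained stand-in for the scheduler. It assumes the pipeline simply counts steps against the three budgets and reports a change whenever a budget boundary is crossed. `PhaseSchedule` and `Phase` are hypothetical names for this sketch; the actual `CsplPipeline` internals may differ.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Phase {
    SpecialistTraining,
    Distillation,
    Generalization,
    Done,
}

/// Hypothetical step-counting scheduler (NOT the rl4burn implementation).
struct PhaseSchedule {
    budgets: [u64; 3], // steps allotted to phases 1, 2 and 3
    step: u64,         // steps taken so far
}

impl PhaseSchedule {
    fn new(budgets: [u64; 3]) -> Self {
        Self { budgets, step: 0 }
    }

    /// Phase implied by the current step count.
    fn phase(&self) -> Phase {
        let [p1, p2, p3] = self.budgets;
        if self.step < p1 {
            Phase::SpecialistTraining
        } else if self.step < p1 + p2 {
            Phase::Distillation
        } else if self.step < p1 + p2 + p3 {
            Phase::Generalization
        } else {
            Phase::Done
        }
    }

    /// Advance one step; returns true if the phase changed.
    fn step(&mut self) -> bool {
        let before = self.phase();
        self.step += 1;
        self.phase() != before
    }
}

fn main() {
    // Same budgets as the config above.
    let mut schedule = PhaseSchedule::new([100_000, 50_000, 200_000]);
    while schedule.phase() != Phase::Done {
        if schedule.step() {
            println!("step {}: entered {:?}", schedule.step, schedule.phase());
        }
    }
}
```

With the budgets from the config above, this sketch reports transitions at steps 100,000, 150,000 and 350,000, the last one marking completion.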
