Goal-Conditioned RL (z-Conditioning)

Conditions a policy on a strategy descriptor z, so that a single network can be steered toward different play styles and specialized quickly. ROA-Star and SCC use this technique to train exploiters.

API

use rl4burn::algo::z_conditioning::{ZConditioning, ZConditioningConfig, z_reward};

let z_mod = ZConditioningConfig::new(16, obs_dim).init(&device);
// z_dim=16 (strategy embedding), obs_dim from environment

let conditioned_obs = z_mod.forward(obs, z);
// conditioned_obs: [batch, obs_dim + 64] — ready for policy network

// Pseudo-reward for following target strategy
let reward = z_reward(&observed_stats, &target_z);
// negative L2 distance: closer to target = higher reward
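The pseudo-reward comment above describes a negative L2 distance between the observed strategy statistics and the target z. The plain-Rust sketch below computes exactly that on slices, without Burn tensors, to make the formula concrete. `z_reward_sketch` is a hypothetical name rather than the rl4burn function, and the library may differ in details such as batching or whether it uses the squared distance.

```rust
/// Hypothetical sketch (not the rl4burn implementation): the reward is the
/// negative Euclidean (L2) distance between observed stats and the target z.
fn z_reward_sketch(observed_stats: &[f32], target_z: &[f32]) -> f32 {
    assert_eq!(
        observed_stats.len(),
        target_z.len(),
        "stats and z must have the same dimension"
    );
    // Sum of squared per-feature differences.
    let sq_dist: f32 = observed_stats
        .iter()
        .zip(target_z)
        .map(|(o, t)| (o - t) * (o - t))
        .sum();
    // Negate so that a smaller distance gives a larger reward.
    -sq_dist.sqrt()
}

fn main() {
    let target_z = [1.0_f32, 0.0, 0.0];
    let close = [0.9_f32, 0.1, 0.0];
    let far = [0.0_f32, 1.0, 1.0];
    let r_close = z_reward_sketch(&close, &target_z);
    let r_far = z_reward_sketch(&far, &target_z);
    // Stats closer to the target earn a higher (less negative) reward.
    assert!(r_close > r_far);
    println!("close: {r_close:.4}, far: {r_far:.4}");
}
```

The reward peaks at zero when the observed statistics match z exactly and falls off smoothly with distance. That gives the policy a dense signal for following the requested strategy.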

What is z?

A low-dimensional vector describing a play style, computed from human replay statistics. Examples:

  • Aggressive: high damage, low farming
  • Defensive: low damage, high survival
  • Rush: high early-game activity
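The text does not say how replay statistics become z. One simple possibility, sketched below purely as an assumption, is to average per-replay statistics and divide each feature by a reference scale so every entry lands roughly in [0, 1]. The `ReplayStats` fields, the scales, and `strategy_z` are all hypothetical and were chosen only to mirror the example styles listed above.

```rust
/// Hypothetical per-replay statistics; this feature set is illustrative only.
#[derive(Clone, Copy, Debug)]
struct ReplayStats {
    damage: f32,
    farming: f32,
    survival: f32,
    early_activity: f32,
}

impl ReplayStats {
    fn to_array(&self) -> [f32; 4] {
        [self.damage, self.farming, self.survival, self.early_activity]
    }
}

/// Hypothetical descriptor: average the stats over replays, then divide each
/// feature by a reference scale so entries are roughly in [0, 1].
fn strategy_z(replays: &[ReplayStats], scale: &[f32; 4]) -> [f32; 4] {
    let mut z = [0.0_f32; 4];
    for r in replays {
        for (acc, v) in z.iter_mut().zip(r.to_array()) {
            *acc += v;
        }
    }
    // Guard against an empty replay list to avoid dividing by zero.
    let n = replays.len().max(1) as f32;
    for (zi, s) in z.iter_mut().zip(scale) {
        *zi = *zi / n / *s;
    }
    z
}

fn main() {
    // Hypothetical reference scales for [damage, farming, survival, early_activity].
    let scale = [1000.0_f32, 100.0, 600.0, 50.0];
    // Two replays in an "aggressive" style: high damage, low farming.
    let aggressive = [
        ReplayStats { damage: 900.0, farming: 20.0, survival: 300.0, early_activity: 30.0 },
        ReplayStats { damage: 800.0, farming: 30.0, survival: 200.0, early_activity: 40.0 },
    ];
    let z = strategy_z(&aggressive, &scale);
    println!("aggressive z: {z:?}");
}
```

A real pipeline would likely use richer, game-specific statistics. What carries over is the shape of the result: a short fixed-length vector in which each entry summarizes one aspect of play style.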

By conditioning on different z vectors, the same policy can exhibit different strategies.
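To see how one set of weights yields different behavior, the sketch below conditions by plain concatenation of the observation and z, then scores two actions with a toy linear policy. The API comment reports a width of obs_dim + 64 rather than obs_dim + 16, which suggests the library first maps z to a wider embedding. This sketch skips that step and concatenates z as-is. `condition`, `action_scores`, and the weights are hypothetical.

```rust
/// Hypothetical conditioning: append z to the observation.
fn condition(obs: &[f32], z: &[f32]) -> Vec<f32> {
    let mut input = Vec::with_capacity(obs.len() + z.len());
    input.extend_from_slice(obs);
    input.extend_from_slice(z);
    input
}

/// Toy linear "policy": one dot-product score per action row of `weights`.
fn action_scores(weights: &[Vec<f32>], input: &[f32]) -> Vec<f32> {
    weights
        .iter()
        .map(|w| w.iter().zip(input).map(|(a, b)| a * b).sum::<f32>())
        .collect()
}

fn main() {
    let obs = [0.5_f32, 0.2];
    let aggressive_z = [1.0_f32, 0.0];
    let defensive_z = [0.0_f32, 1.0];
    // Rows are actions [attack, retreat]; columns are obs features, then z features.
    let weights: Vec<Vec<f32>> = vec![
        vec![0.1, 0.1, 2.0, -1.0],
        vec![0.1, 0.1, -1.0, 2.0],
    ];
    // Same observation and same weights; only z changes.
    let agg = action_scores(&weights, &condition(&obs, &aggressive_z));
    let def = action_scores(&weights, &condition(&obs, &defensive_z));
    println!("aggressive: {agg:?}, defensive: {def:?}");
}
```

With the aggressive z the attack score is higher, and with the defensive z the retreat score is higher, even though the observation and weights are identical. A trained policy learns such z-dependent weights instead of having them set by hand.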