Goal-Conditioned RL (z-Conditioning)

Conditioning a policy on a strategy descriptor z lets a single network specialize rapidly toward a chosen play style. ROA-Star and SCC use this technique for exploiter training.

API

use rl4burn::algo::z_conditioning::{ZConditioning, ZConditioningConfig, z_reward};

let z_mod = ZConditioningConfig::new(16, obs_dim).init(&device);
// z_dim=16 (strategy embedding), obs_dim from environment

let conditioned_obs = z_mod.forward(obs, z);
// conditioned_obs: [batch, obs_dim + 64] — ready for policy network

// Pseudo-reward for following target strategy
let reward = z_reward(&observed_stats, &target_z);
// negative L2 distance: closer to target = higher reward
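
To make the "negative L2 distance" pseudo-reward concrete, here is a minimal sketch on plain slices. It is not rl4burn's `z_reward` implementation, which may operate on tensors and normalize differently; the function name `neg_l2_reward` is hypothetical and used only for illustration.

```rust
/// Hypothetical helper (not part of rl4burn): negative Euclidean (L2)
/// distance between observed play statistics and a target strategy vector.
/// The result is 0.0 when the two match exactly and grows more negative
/// as the observed statistics drift away from the target.
fn neg_l2_reward(observed: &[f32], target: &[f32]) -> f32 {
    assert_eq!(
        observed.len(),
        target.len(),
        "observed and target must have the same length"
    );
    // Sum of squared per-dimension differences.
    let sq_sum: f32 = observed
        .iter()
        .zip(target.iter())
        .map(|(o, t)| (o - t) * (o - t))
        .sum();
    // Negate the distance so that closer to target means higher reward.
    -sq_sum.sqrt()
}
```

Under this sketch, an agent whose statistics equal the target receives the maximum reward of 0, and every other outcome is penalized in proportion to its distance from the target.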

What is z?

z is a low-dimensional vector that describes a play style and is computed from human replay statistics. Examples:

Aggressive: high damage, low farming
Defensive: low damage, high survival
Rush: high early-game activity

By conditioning on different z vectors, the same policy can exhibit different strategies.
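
The simplest way to see how one policy can serve several strategies is plain concatenation: the policy input is the observation with z appended, so changing z changes the input while the weights stay the same. The sketch below assumes raw concatenation; the `obs_dim + 64` output shape in the API section suggests rl4burn may first project z to a wider embedding, which this sketch does not do. The function name `condition_obs` is hypothetical.

```rust
/// Hypothetical helper (not part of rl4burn): append a strategy vector z
/// to an observation so a single policy network sees both.
fn condition_obs(obs: &[f32], z: &[f32]) -> Vec<f32> {
    let mut out = Vec::with_capacity(obs.len() + z.len());
    out.extend_from_slice(obs); // original observation features first
    out.extend_from_slice(z);   // strategy descriptor appended at the end
    out
}
```

Calling `condition_obs` with the same observation and two different z vectors yields two different policy inputs, which is what allows the shared network to produce different behavior for each strategy.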
