Loss Functions

Backend-generic loss functions for RL training. All return Tensor<B, 1> with shape [1] (scalar), compatible with .backward().

Value loss (Huber)

pub fn value_loss<B: Backend>(pred: Tensor<B, 1>, target: Tensor<B, 1>) -> Tensor<B, 1>

Smooth L1 (Huber) loss with δ=1.0. Quadratic for small errors, linear for large errors. Prevents outlier targets from dominating the value head update.

Discrete policy loss (REINFORCE)

pub fn policy_loss_discrete<B: Backend>(
    logits: Tensor<B, 2>,       // [batch, n_actions]
    actions: Tensor<B, 2, Int>, // [batch, 1] action indices
    mask: Tensor<B, 2>,         // [batch, n_actions] valid=1.0
    advantage: Tensor<B, 1>,    // [batch]
) -> Tensor<B, 1>

Standard REINFORCE: -mean(advantage * log_prob(action)). Supports action masking for environments with invalid actions.

Continuous policy loss

pub fn policy_loss_continuous<B: Backend>(
    pred: Tensor<B, 2>,      // [batch, action_dim]
    target: Tensor<B, 2>,    // [batch, action_dim]
    advantage: Tensor<B, 1>, // [batch]
) -> Tensor<B, 1>

Advantage-weighted regression for deterministic continuous actions. Only positive advantages contribute gradient (negative advantage + MSE is degenerate).

These loss functions are standalone building blocks. PPO and DQN implement their own loss computation internally (clipped surrogate for PPO, Bellman MSE for DQN). Use these when building custom algorithms.

rl4burn

Loss Functions

Value loss (Huber)

Discrete policy loss (REINFORCE)

Continuous policy loss

Note

Keyboard shortcuts

rl4burn

Loss Functions

Value loss (Huber)

Discrete policy loss (REINFORCE)

Continuous policy loss

Note