Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Loss Functions

Backend-generic loss functions for RL training. All return Tensor<B, 1> with shape [1] (scalar), compatible with .backward().

Value loss (Huber)

pub fn value_loss<B: Backend>(pred: Tensor<B, 1>, target: Tensor<B, 1>) -> Tensor<B, 1>

Smooth L1 (Huber) loss with δ=1.0. Quadratic for small errors, linear for large errors. Prevents outlier targets from dominating the value head update.

Discrete policy loss (REINFORCE)

pub fn policy_loss_discrete<B: Backend>(
    logits: Tensor<B, 2>,       // [batch, n_actions]
    actions: Tensor<B, 2, Int>, // [batch, 1] action indices
    mask: Tensor<B, 2>,         // [batch, n_actions] valid=1.0
    advantage: Tensor<B, 1>,    // [batch]
) -> Tensor<B, 1>

Standard REINFORCE: -mean(advantage * log_prob(action)). Supports action masking for environments with invalid actions.

Continuous policy loss

pub fn policy_loss_continuous<B: Backend>(
    pred: Tensor<B, 2>,      // [batch, action_dim]
    target: Tensor<B, 2>,    // [batch, action_dim]
    advantage: Tensor<B, 1>, // [batch]
) -> Tensor<B, 1>

Advantage-weighted regression for deterministic continuous actions. Only positive advantages contribute gradient (negative advantage + MSE is degenerate).

Note

These loss functions are standalone building blocks. PPO and DQN implement their own loss computation internally (clipped surrogate for PPO, Bellman MSE for DQN). Use these when building custom algorithms.