Transformer Encoder
Reusable multi-head self-attention blocks for entity processing. Used by ROA-Star and SCC to encode sets of game units.
Multi-Head Attention
use rl4burn::{MultiHeadAttention, MultiHeadAttentionConfig};
let attn = MultiHeadAttentionConfig::new(128, 4).init(&device);
// d_model=128, 4 heads (d_k = 32 per head)
let output = attn.forward(query, key, value, None);
// All inputs: [batch, seq_len, 128]
// Optional mask: [batch, seq_len] (true = attend, false = ignore)
Transformer Block
Pre-norm residual block: self-attention + feedforward.
use rl4burn::{TransformerBlock, TransformerBlockConfig};
let block = TransformerBlockConfig::new(128, 4, 512).init(&device);
// d_model=128, 4 heads, d_ff=512
let output = block.forward(input, None); // residual: output ≈ input + attention + ffn
Stacked Encoder
use rl4burn::{TransformerEncoder, TransformerEncoderConfig};
let encoder = TransformerEncoderConfig::new(128, 4, 2, 512).init(&device);
// 2 layers of transformer blocks
let encoded = encoder.forward(entities, None);
Properties
- Permutation equivariant: reordering input tokens reorders output tokens identically (no positional encoding).
- Variable-length: use masking for padded sequences.
- For 30 entities with 128-dim embeddings, a 2-layer encoder runs in microseconds on CPU.