Intrinsic Rewards

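Intrinsic rewards are exploration bonuses computed from the agent's internal state, such as visit counts, rather than supplied by the environment. They are useful when extrinsic rewards are sparse.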

API

use rl4burn::collect::intrinsic::{IntrinsicReward, CountBasedReward, combine_rewards};

let mut explorer = CountBasedReward::new(0.1); // discretization resolution
// Record the transition first, then query the bonus for it.
explorer.update(&obs, action, &next_obs);
let bonus = explorer.reward(&obs, action, &next_obs);
// bonus = 1 / sqrt(visit_count)

let combined = combine_rewards(&extrinsic, &intrinsic, 0.01);
// combined[i] = extrinsic[i] + 0.01 * intrinsic[i]
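
The element-wise formula in the comment above can be sketched as a standalone function. This is an illustration only, not the library's code: the name `combine_rewards_sketch`, the slice arguments, and the `Vec<f32>` return type are assumptions made so the example compiles without the crate.

```rust
/// Hypothetical stand-in for `combine_rewards`; the name, the slice arguments,
/// and the `Vec<f32>` return type are assumptions made for this sketch.
pub fn combine_rewards_sketch(extrinsic: &[f32], intrinsic: &[f32], scale: f32) -> Vec<f32> {
    assert_eq!(
        extrinsic.len(),
        intrinsic.len(),
        "reward slices must have equal length"
    );
    // combined[i] = extrinsic[i] + scale * intrinsic[i]
    extrinsic
        .iter()
        .zip(intrinsic)
        .map(|(e, i)| e + scale * i)
        .collect()
}
```

A small scale such as the 0.01 used above keeps the bonus from dominating the extrinsic signal once the environment starts paying out.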

Count-Based Exploration

Reward = 1 / sqrt(N(s)), where N(s) is the number of times the agent has visited the discretized version of state s. Novel states earn a high reward and the bonus decays as a state becomes familiar: the first visit yields 1.0, the hundredth yields 0.1.
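
One way to realize the counting scheme is to bucket each state dimension into grid cells of a fixed width and keep a hash map of visit counts. The sketch below is not `CountBasedReward` itself: the `VisitCounter` type, its method names, the floor-based discretization, and the choice to treat an unvisited cell as a count of 1 (to avoid dividing by zero) are all assumptions.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a count-based bonus; not rl4burn's implementation.
pub struct VisitCounter {
    resolution: f32,
    counts: HashMap<Vec<i64>, u32>,
}

impl VisitCounter {
    pub fn new(resolution: f32) -> Self {
        assert!(resolution > 0.0, "resolution must be positive");
        Self { resolution, counts: HashMap::new() }
    }

    /// Map a continuous state to the index of the grid cell that contains it.
    fn key(&self, state: &[f32]) -> Vec<i64> {
        state
            .iter()
            .map(|x| (x / self.resolution).floor() as i64)
            .collect()
    }

    /// Increment N(s) for the cell containing `state`.
    pub fn visit(&mut self, state: &[f32]) {
        let key = self.key(state);
        *self.counts.entry(key).or_insert(0) += 1;
    }

    /// 1 / sqrt(N(s)); an unvisited cell is treated as N(s) = 1 (assumption).
    pub fn bonus(&self, state: &[f32]) -> f32 {
        let n = self.counts.get(&self.key(state)).copied().unwrap_or(0).max(1);
        1.0 / (n as f32).sqrt()
    }
}
```

With a resolution of 0.5, the states `[0.1, 0.2]` and `[0.3, 0.4]` fall into the same cell and share a count, which is what lets the bonus generalize across nearby continuous states.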

Entropy-Reduction Reward

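ROA-Star’s scouting reward is max(H_{prev} - H_{current}, 0), where H_{prev} and H_{current} are the entropy before and after the current step. It rewards the agent for reducing uncertainty about the opponent’s strategy; because the difference is clamped at zero, a step that increases entropy earns no reward rather than a penalty.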

use rl4burn::collect::intrinsic::EntropyReductionReward;
let mut scouting = EntropyReductionReward::new();
let reward = scouting.reward_from_entropy(current_entropy);
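
The clamped difference can be sketched with a small tracker that remembers the previous entropy. This is not `EntropyReductionReward` itself: the `EntropyTracker` type, the `shannon_entropy_bits` helper, the zero reward on the first call, and the choice to overwrite the stored entropy on every step are assumptions for illustration.

```rust
/// Shannon entropy in bits of a discrete belief, e.g. over opponent strategies.
/// Zero-probability entries are skipped so that 0 * log(0) does not produce NaN.
pub fn shannon_entropy_bits(probs: &[f32]) -> f32 {
    probs
        .iter()
        .filter(|&&p| p > 0.0)
        .map(|&p| -p * p.log2())
        .sum()
}

/// Hypothetical tracker; not rl4burn's `EntropyReductionReward`.
pub struct EntropyTracker {
    prev: Option<f32>,
}

impl EntropyTracker {
    pub fn new() -> Self {
        Self { prev: None }
    }

    /// Returns max(H_prev - H_current, 0) and stores H_current for the next call.
    /// The first call has no previous entropy and returns 0.0 (assumption).
    pub fn step(&mut self, current: f32) -> f32 {
        let reward = match self.prev {
            Some(prev) => (prev - current).max(0.0),
            None => 0.0,
        };
        self.prev = Some(current);
        reward
    }
}
```

Feeding the entropies 2.0, 1.5, 1.75, 1.0 in sequence yields rewards 0.0, 0.5, 0.0, 0.75 under these assumptions: only the steps that lower entropy are paid.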

The IntrinsicReward trait

Implement this trait to define a custom exploration strategy:

pub trait IntrinsicReward {
    type Observation;
    fn reward(&self, obs: &Self::Observation, action: usize, next_obs: &Self::Observation) -> f32;
    fn update(&mut self, obs: &Self::Observation, action: usize, next_obs: &Self::Observation);
}
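
As an illustration of a custom strategy, the sketch below implements the trait for a tabular bonus keyed on (state, action) pairs. The trait is reproduced from the definition above so the example compiles without the crate; `StateActionBonus`, its `usize` observation type, and the handling of unseen pairs are hypothetical choices, not part of rl4burn.

```rust
use std::collections::HashMap;

// Trait reproduced from the definition above so this sketch compiles on its own.
pub trait IntrinsicReward {
    type Observation;
    fn reward(&self, obs: &Self::Observation, action: usize, next_obs: &Self::Observation) -> f32;
    fn update(&mut self, obs: &Self::Observation, action: usize, next_obs: &Self::Observation);
}

/// Hypothetical tabular bonus: 1 / sqrt(N(s, a)) over discrete state ids.
#[derive(Default)]
pub struct StateActionBonus {
    counts: HashMap<(usize, usize), u32>,
}

impl IntrinsicReward for StateActionBonus {
    type Observation = usize;

    fn reward(&self, obs: &Self::Observation, action: usize, _next_obs: &Self::Observation) -> f32 {
        // An unseen pair is treated as a count of 1 to avoid dividing by zero (assumption).
        let n = self.counts.get(&(*obs, action)).copied().unwrap_or(0).max(1);
        1.0 / (n as f32).sqrt()
    }

    fn update(&mut self, obs: &Self::Observation, action: usize, _next_obs: &Self::Observation) {
        // Increment the visit count for this (state, action) pair.
        *self.counts.entry((*obs, action)).or_insert(0) += 1;
    }
}
```

The split between `&self` in `reward` and `&mut self` in `update` means the bonus can be queried without changing the counts, so a caller decides when a transition is actually recorded.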
