BTR options (IQN + paper extras)
This page describes how BTR is implemented in this project.
The most important point: BTR is not a separate architecture here. It is IQN (the same `IQN_Network` and trainer path) plus a set of optional enhancements configured in the `btr` section of the config.
Baseline and composition
- Baseline: IQN architecture
- BTR in this repo: IQN + {Munchausen, IMPALA-CNN, AdaptiveMaxPool, SpectralNorm, LayerNorm, NoisyLinear}
```dot
digraph btr_stack {
  rankdir=LR;
  node [shape=box, fontname="Helvetica", fontsize=10];
  iqn [label="IQN baseline\n(image head + float head + IQN + dueling)", style="filled", fillcolor=lightyellow];
  m [label="Munchausen target"];
  i [label="IMPALA-CNN image head"];
  p [label="Adaptive MaxPool"];
  s [label="SpectralNorm (conv)"];
  l [label="LayerNorm (MLP/heads)"];
  n [label="NoisyLinear exploration"];
  out [label="BTR-configured IQN", style="filled", fillcolor=lightgreen];
  iqn -> out;
  m -> out;
  i -> out;
  p -> out;
  s -> out;
  l -> out;
  n -> out;
}
```
Where each BTR option is applied
The table below maps each BTR feature to the implementation location and effect.
| BTR feature | Where in code | What changes |
|---|---|---|
| Munchausen IQN | trainer target computation | Replaces the hard-max/DDQN bootstrapped target with a soft-policy value and a log-policy reward bonus. |
| IMPALA-CNN | image head builder in `vis_cnn_head.py` | Swaps the default 4-conv image encoder for IMPALA residual blocks. |
| Adaptive MaxPool | image head builder in `vis_cnn_head.py` | Produces a fixed-size spatial output before flatten. |
| SpectralNorm | image head builder in `vis_cnn_head.py` | Wraps convolution layers with spectral normalization. |
| LayerNorm | float extractor and dueling heads in `IQN_Network` | Adds layer normalization in MLP/heads. |
| NoisyLinear | dueling heads and rollout action selection | Uses trainable parameter noise; when enabled, rollout action logic does not use epsilon/Boltzmann branches. |
Config resolution in code
- Vision CNN (IMPALA, adaptive pool, spectral norm): the canonical YAML block is `nn.vis.cnn`; omitted keys can be filled from `btr:` at load time (`config_loader._merge_btr_cnn_into_vis`; sketched after this list). All call sites of `_build_img_head` (classic IQN, PPO CNN, multimodal CNN branch, BC, pretrain) resolve kwargs through `trackmania_rl/nn_build/vis_cnn_head.py`.
- LayerNorm / NoisyNet / `noisy_sigma0` on IQN MLP heads (classic and shared-backbone IQN): read from the flat loaded config in `trackmania_rl/nn_build/iqn_btr_from_config.py` as `iqn_btr_mlp_head_kw_from_config`.
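A minimal sketch of that load-time merge, assuming a plain nested-dict config; this is illustrative, not the code of `config_loader._merge_btr_cnn_into_vis`:

```python
# CNN keys listed in the Configuration section below
CNN_KEYS = ("use_impala_cnn", "impala_model_size", "use_adaptive_maxpool",
            "adaptive_maxpool_size", "use_spectral_norm")

def merge_btr_cnn_into_vis(cfg: dict) -> dict:
    """Fill omitted nn.vis.cnn keys from the btr: block (sketch)."""
    vis_cnn = cfg.setdefault("nn", {}).setdefault("vis", {}).setdefault("cnn", {})
    btr = cfg.get("btr", {})
    for key in CNN_KEYS:
        # nn.vis.cnn stays canonical: btr only fills keys it omitted
        if key not in vis_cnn and key in btr:
            vis_cnn[key] = btr[key]
    return cfg
```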
BTR data flow vs baseline IQN
At a high level, BTR uses the same collector/learner/replay pipeline as IQN. The differences are in the model blocks and the target computation:
```dot
digraph btr_flow {
  rankdir=TB;
  node [shape=box, fontname="Helvetica", fontsize=10];
  s [label="State: image + float", style="filled", fillcolor=lightblue];
  enc [label="Encoder\n(default CNN or IMPALA + SN + AdaptivePool)"];
  body [label="IQN quantile fusion\n(+ optional LayerNorm in MLP/heads)"];
  heads [label="Dueling heads\n(+ optional NoisyLinear)"];
  q [label="Q quantiles"];
  tgt [label="Target computation\n(Standard IQN/DDQN OR Munchausen IQN)", style="filled", fillcolor=mistyrose];
  loss [label="Quantile Huber loss"];
  s -> enc -> body -> heads -> q -> tgt -> loss;
}
```
Detailed behavior by component
1) Munchausen IQN target
With `btr.use_munchausen: true`, training uses the soft-policy target path:

- compute `log π(a|s)` from quantile-mean Q and temperature `munchausen_entropy_tau`;
- add reward bonus `alpha * tau * clamp(log π(a_t|s_t), lo, 0)`;
- bootstrap with soft value `V(s') = Σ_a π(a|s') [Q(s',a) - tau * log π(a|s')]`.
This path is implemented for both single-action and multi-action modes. If Munchausen is off, the code falls back to the standard DDQN/max target logic.
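A minimal PyTorch sketch of this target, written on quantile-mean Q-values of shape `[batch, n_actions]` for readability (the actual trainer applies the same quantities to quantile tensors; function and argument names here are illustrative, not the repo's API):

```python
import torch
import torch.nn.functional as F

def munchausen_target(q_t: torch.Tensor, q_tp1_target: torch.Tensor,
                      a_t: torch.Tensor, r_t: torch.Tensor, done: torch.Tensor,
                      gamma: float, alpha: float, tau: float, lo: float):
    """Sketch of the Munchausen soft-policy target on quantile-mean Q.

    q_t, q_tp1_target: [batch, n_actions] quantile-mean Q for s_t / s_{t+1}
    a_t: [batch] int64 actions; r_t, done: [batch] floats.
    """
    # log pi(a|s) = log_softmax(Q / tau): the soft policy at temperature tau
    log_pi_t = F.log_softmax(q_t / tau, dim=1)
    log_pi_tp1 = F.log_softmax(q_tp1_target / tau, dim=1)

    # Munchausen bonus: alpha * tau * clamp(log pi(a_t|s_t), lo, 0)
    log_pi_at = log_pi_t.gather(1, a_t.unsqueeze(1)).squeeze(1)
    bonus = alpha * tau * log_pi_at.clamp(min=lo, max=0.0)

    # soft value: V(s') = sum_a pi(a|s') * (Q(s',a) - tau * log pi(a|s'))
    v_tp1 = (log_pi_tp1.exp() * (q_tp1_target - tau * log_pi_tp1)).sum(dim=1)

    return r_t + bonus + gamma * (1.0 - done) * v_tp1
```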
Why this is useful: it replaces brittle max(Q)-style targets with a soft-policy target, which often reduces optimistic spikes and makes updates less jumpy. The bounded log-policy bonus also keeps the learning signal informative when many actions have similar values.
2) IMPALA-CNN + Adaptive MaxPool + SpectralNorm
These three options modify only the image branch:
- `nn.vis.cnn.use_impala_cnn` selects the residual IMPALA-style encoder.
- `nn.vis.cnn.use_adaptive_maxpool` changes the spatial reduction to a fixed size.
- `nn.vis.cnn.use_spectral_norm` wraps conv layers with spectral normalization.
The rest of the IQN pipeline (float branch, quantile fusion, dueling heads, replay/training loop) remains unchanged.
Why this is useful: IMPALA usually gives a stronger visual encoder than a small plain conv stack. Adaptive max-pool fixes the spatial output size, making it less sensitive to raw resolution details. Spectral norm limits sudden activation amplification, which often improves stability when targets are noisy.
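A rough sketch of how the three options compose in an image head, using standard PyTorch building blocks; this is illustrative and much smaller than the real builder in `trackmania_rl/nn_build/vis_cnn_head.py` (a full IMPALA encoder stacks several down-sampling stages):

```python
import torch
import torch.nn as nn

class ImpalaResidual(nn.Module):
    """Minimal IMPALA-style residual block (illustrative, not the repo's exact block)."""
    def __init__(self, ch: int):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.conv1(torch.relu(x))
        y = self.conv2(torch.relu(y))
        return x + y  # residual connection keeps gradients well-conditioned

def apply_spectral_norm(module: nn.Module) -> nn.Module:
    """Wrap every Conv2d in the module tree with spectral normalization."""
    for name, child in module.named_children():
        if isinstance(child, nn.Conv2d):
            setattr(module, name, nn.utils.spectral_norm(child))
        else:
            apply_spectral_norm(child)
    return module

def build_img_head_sketch(in_ch: int, use_impala_cnn: bool, use_spectral_norm: bool,
                          use_adaptive_maxpool: bool, pool_size=(6, 6)) -> nn.Module:
    if use_impala_cnn:
        layers = [nn.Conv2d(in_ch, 32, kernel_size=3, stride=2), ImpalaResidual(32)]
    else:
        layers = [nn.Conv2d(in_ch, 32, kernel_size=8, stride=4), nn.ReLU()]
    # adaptive max-pool gives a fixed spatial output regardless of input resolution
    layers.append(nn.AdaptiveMaxPool2d(pool_size) if use_adaptive_maxpool
                  else nn.MaxPool2d(2))
    layers.append(nn.Flatten())
    head = nn.Sequential(*layers)
    return apply_spectral_norm(head) if use_spectral_norm else head
```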
3) LayerNorm in MLP and heads
With `use_layer_norm`, layer normalization is inserted in:

- the float feature extractor;
- the advantage/value head MLP blocks.
This is a stabilization feature and does not change tensor contracts.
Why this is useful: LayerNorm reduces hidden-scale drift across training and usually makes optimization smoother, especially when image and float branches have different feature scales.
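A minimal sketch of where the normalization sits, assuming a plain Linear/ReLU block (`mlp_block` is illustrative, not a repo function):

```python
import torch.nn as nn

def mlp_block(in_dim: int, out_dim: int, use_layer_norm: bool) -> nn.Sequential:
    layers = [nn.Linear(in_dim, out_dim)]
    if use_layer_norm:
        # normalize over the feature dimension before the nonlinearity
        layers.append(nn.LayerNorm(out_dim))
    layers.append(nn.ReLU())
    return nn.Sequential(*layers)

# e.g. a float extractor could stack two such blocks:
# nn.Sequential(mlp_block(float_dim, 256, True), mlp_block(256, 256, True))
```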
4) NoisyLinear and exploration semantics
When `use_noisy_linear` is enabled:

- linear layers in the dueling heads become factorized noisy layers (sketched below);
- the rollout policy calls `reset_noise()` in exploration mode and `disable_noise()` in eval mode;
- epsilon/Boltzmann branches in action selection are bypassed.
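For reference, a self-contained sketch of a factorized noisy layer in the style of Fortunato et al.; this is a generic textbook implementation, not the repo's class, with `sigma0` matching the `noisy_sigma0` knob from the config section:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLinear(nn.Module):
    """Factorized-Gaussian noisy linear layer (sketch)."""
    def __init__(self, in_f: int, out_f: int, sigma0: float = 0.5):
        super().__init__()
        self.in_f, self.out_f = in_f, out_f
        self.mu_w = nn.Parameter(torch.empty(out_f, in_f))
        self.sigma_w = nn.Parameter(torch.full((out_f, in_f), sigma0 / math.sqrt(in_f)))
        self.mu_b = nn.Parameter(torch.empty(out_f))
        self.sigma_b = nn.Parameter(torch.full((out_f,), sigma0 / math.sqrt(in_f)))
        self.register_buffer("eps_w", torch.zeros(out_f, in_f))
        self.register_buffer("eps_b", torch.zeros(out_f))
        bound = 1.0 / math.sqrt(in_f)
        nn.init.uniform_(self.mu_w, -bound, bound)
        nn.init.uniform_(self.mu_b, -bound, bound)
        self.reset_noise()

    @staticmethod
    def _f(x: torch.Tensor) -> torch.Tensor:
        return x.sign() * x.abs().sqrt()

    def reset_noise(self) -> None:
        # factorized noise: outer product of two small noise vectors
        eps_in = self._f(torch.randn(self.in_f, device=self.mu_w.device))
        eps_out = self._f(torch.randn(self.out_f, device=self.mu_w.device))
        self.eps_w.copy_(eps_out.outer(eps_in))
        self.eps_b.copy_(eps_out)

    def disable_noise(self) -> None:
        # zeroed noise makes the layer a plain Linear with the mean weights
        self.eps_w.zero_()
        self.eps_b.zero_()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, self.mu_w + self.sigma_w * self.eps_w,
                        self.mu_b + self.sigma_b * self.eps_b)
```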
So in this mode, exploration is driven by parameter noise in the Q-values. Epsilon schedules can still be present in config/logging, but they are not used to choose actions in the noisy branch.
Why this is useful: exploration becomes state-dependent (through noisy parameters) instead of uniform random action perturbations. In long runs this often preserves useful exploration better than a fixed epsilon schedule.
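A sketch of the resulting action-selection branch, assuming the model exposes `reset_noise()`/`disable_noise()` (as above) and an `n_actions` attribute; all names are illustrative:

```python
import random
import torch

def choose_action(model, state, use_noisy_linear: bool, epsilon: float,
                  explore: bool = True) -> torch.Tensor:
    if use_noisy_linear:
        # parameter noise drives exploration; epsilon is ignored in this branch
        model.reset_noise() if explore else model.disable_noise()
        with torch.no_grad():
            return model(state).argmax(dim=1)
    # noisy off: classic epsilon-greedy fallback
    if explore and random.random() < epsilon:
        return torch.randint(model.n_actions, (state.shape[0],))
    with torch.no_grad():
        return model(state).argmax(dim=1)
```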
```dot
digraph noisy_action {
  rankdir=LR;
  node [shape=box, fontname="Helvetica", fontsize=10];
  cfg [label="use_noisy_linear?"];
  noisy [label="reset_noise / disable_noise\nargmax(noisy Q)"];
  eps [label="epsilon-greedy or Boltzmann\n(only when noisy off)"];
  act [label="chosen action/block", style="filled", fillcolor=lightgreen];
  cfg -> noisy [label="yes"];
  cfg -> eps [label="no"];
  noisy -> act;
  eps -> act;
}
```
Configuration section
Vision CNN (canonical): `nn.vis.cnn` — `use_impala_cnn`, `impala_model_size`, `use_adaptive_maxpool`, `adaptive_maxpool_size`, `use_spectral_norm`.
The loader can copy missing CNN keys from `btr:` into `nn.vis.cnn` for backward-compatible minimal YAML; IQN and PPO Variant A read the merged `nn.vis.cnn`. Multimodal fusion `post_concat` still uses its own fixed CNN in `multimodal_torch_fusion.py`.
BTR-only flags (under `btr:` in YAML, `BTRConfig` in code): `use_munchausen`, `munchausen_alpha`, `munchausen_entropy_tau`, `munchausen_lo`, `use_layer_norm`, `use_noisy_linear`, `noisy_sigma0`.
`BTRConfig` still lists the CNN fields for schema/merge; prefer setting them on `nn.vis.cnn` in new configs to avoid duplication.
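An illustrative combined snippet; the values are placeholders chosen for the example, not tuned or repo defaults:

```yaml
nn:
  vis:
    cnn:
      use_impala_cnn: true
      use_adaptive_maxpool: true
      use_spectral_norm: true
btr:
  use_munchausen: true
  munchausen_alpha: 0.9        # placeholder value
  munchausen_entropy_tau: 0.03 # placeholder value
  munchausen_lo: -1.0          # clamp lower bound for the log-policy bonus
  use_layer_norm: true
  use_noisy_linear: true
  noisy_sigma0: 0.5            # placeholder value
```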
Practical recommendations
Start from IQN defaults and enable BTR features incrementally if you need isolated ablations.
For full BTR-style runs, enable all six features together.
Keep in mind that some “paper defaults” are environment-specific; for TrackMania, schedule timing, gamma strategy, and batch size may need retuning.
See also
- IQN architecture — the baseline model that BTR augments.
- IQN model experiments — IQN experiment pages.
- Configuration Guide — full config reference and YAML trees.
- Neural network YAML (`nn`) — full reference, including the BTR block (`btr:`).