IQN architecture
IQN (implicit quantile networks) is the distributional off-policy baseline: quantile Q-values, a replay buffer, and a target network. The implementation centers on `trackmania_rl/agents/iqn.py`; which module graph is built depends on `nn.fusion_mode` and `nn.vis` (see below).
For optional BTR paper features, see BTR options (IQN + paper extras). BTR is not a separate training.algorithm; it toggles behavior on top of IQN.
Configuration lives under the YAML `nn` block (Neural network YAML (nn) — full reference in the Configuration Guide). RL freeze flags: RL parameter freeze.
Which network is built? (routing)
`build_iqn_network_uncompiled()` (used by training and BC when `algorithm: iqn`) picks one topology:
- **Multimodal fusion** — `nn.fusion_mode` is `vision_transformer`, `post_concat`, or `unified`. Builds the same `TorchMultimodalActorCritic` body as PPO with `include_policy_heads=False`, wrapped by `IQNSharedBackboneNetwork` (`trackmania_rl/nn_build/iqn_multimodal.py`). Submodule `fusion` exposes `forward_fusion_hidden`; IQN adds `iqn_fc` and dueling heads. Shared quantile + dueling math: `trackmania_rl/nn_build/iqn_quantile_forward.py`. When the multimodal vision branch is CNN (default `nn.vis.cnn`), the conv stem uses `nn.vis.cnn` via `trackmania_rl/nn_build/vis_cnn_head.py` — same flags as classic `IQN_Network`. BTR-style head options (LayerNorm, NoisyNet) on `iqn_fc` / A / V use the flat config via `trackmania_rl/nn_build/iqn_btr_from_config.py`.
- **HF vision, fusion off** — `nn.fusion_mode: none` and `nn.vis.transformer.use_hf_backbone: true`. `HfActorCritic` without policy heads + `IQNSharedBackboneNetwork` (same file as above).
- **Classic CNN or float-only** — `nn.fusion_mode: none`, `nn.vis.cnn` or `nn.vis.no_image: true`. Class `IQN_Network` — CNN image head (or no image), float MLP, concat, then τ-embedding and dueling heads.
Not supported: `nn.vis.transformer` with native ViT (`use_hf_backbone: false`) while `fusion_mode: none` — `build_iqn_network_uncompiled` raises; use a multimodal `fusion_mode` or switch to the HF ViT.
Hub warm-start: for multimodal fusion, `nn.init_from_pretrained` is applied automatically on the PPO path; IQN does not run the same hook yet (see Neural network YAML (nn) — full reference).
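As a quick orientation, the three routing cases map to `nn` fragments roughly like these (key names from this page; other keys omitted, values illustrative — see the Configuration Guide for the full schema):

```yaml
# 1) Multimodal fusion -> TorchMultimodalActorCritic body + IQNSharedBackboneNetwork
nn:
  fusion_mode: post_concat   # or vision_transformer / unified
---
# 2) HF vision, fusion off -> HfActorCritic (no policy heads) + IQNSharedBackboneNetwork
nn:
  fusion_mode: none
  vis:
    transformer:
      use_hf_backbone: true
---
# 3) Classic CNN or float-only -> IQN_Network
nn:
  fusion_mode: none          # with nn.vis.cnn (default) or nn.vis.no_image: true
```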
Vision and tensors by topology
Classic `IQN_Network`
Inputs: image `(B, 1, H, W)`, floats `(B, float_input_dim)`.
Outputs: `Q` with IQN quantiles — single-action `(B*K, n_actions)` or multi-action `(B*K, N, n_actions)` for `n_actions_per_block`.
![digraph iqn_overview {
rankdir=LR;
node [shape=box, fontname="Helvetica", fontsize=10];
img [label="img\n(B,1,H,W)", style="filled", fillcolor=lightblue];
flt [label="float_inputs\n(B,F)", style="filled", fillcolor=lightblue];
cnn [label="Image head\nCNN or IMPALA"];
mlp [label="Float head\nMLP"];
cat [label="Concat\n(B,D)"];
iqn [label="IQN block\nτ-embedding × state"];
duel [label="Dueling heads\nA + V"];
out [label="Q-values\n(B*K, A) or (B*K,N,A)", style="filled", fillcolor=lightgreen];
img -> cnn -> cat;
flt -> mlp -> cat;
cat -> iqn -> duel -> out;
}](../_images/graphviz-a50a935ea41a3f55950ee0bbcc121410ef32257f.png)
Shared backbone IQN (multimodal or HF vision)
Inputs are the same; `fusion.forward_fusion_hidden(img, float)` produces a state vector `(B, D)` (width `nn.decoder.dense_hidden_dimension` after the bridge, or `HfActorCritic.pre_trunk_feature_dim` on the HF path). Then the same τ cosine embedding, `iqn_fc`, Hadamard product, and dueling readout as classic IQN (shared implementation).
Core blocks (classic path)
Image branch
By default IQN uses a 4-layer CNN image head. The BTR option can replace this head with IMPALA-CNN, but the interface is unchanged: image branch outputs a flat embedding per sample.
Float branch
A two-layer MLP transforms normalized scalar features to `float_hidden_dim`.
Fusion
Image and float embeddings are concatenated into a vector of width `dense_input_dimension`.
IQN quantile module
For each sample, IQN draws/supplies K quantiles τ and computes:
- cosine embedding of τ (dimension `iqn_embedding_dimension`),
- projection to the state feature width,
- element-wise multiplication with the repeated fused state embedding.

This yields a quantile-conditioned latent representation `(B*K, D)`.
![digraph iqn_tau {
rankdir=TB;
node [shape=box, fontname="Helvetica", fontsize=10];
tau [label="τ\n(B*K,1)", style="filled", fillcolor=lightblue];
cos [label="cos(pi*i*τ)"];
fc [label="Linear + activation\n-> (B*K,D)"];
st [label="state embed\n(B,D)", style="filled", fillcolor=lightblue];
rep [label="repeat K\n(B*K,D)"];
mul [label="Hadamard product"];
out [label="quantile latent\n(B*K,D)", style="filled", fillcolor=lightgreen];
tau -> cos -> fc -> mul;
st -> rep -> mul -> out;
}](../_images/graphviz-3fb6406723db13ac5cd1ed0b29c754df3746aeee.png)
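The τ-embedding pipeline in the figure can be sketched numerically. This is a minimal numpy stand-in — the dimensions, random weights, and the ReLU after the `iqn_fc` projection are illustrative, not read from the code:

```python
import numpy as np

rng = np.random.default_rng(0)
B, K, D = 4, 8, 32           # batch, quantiles per sample, fused state width
emb_dim = 64                 # stand-in for iqn_embedding_dimension

tau = rng.uniform(size=(B * K, 1))                    # tau ~ U(0,1), shape (B*K, 1)
i = np.arange(1, emb_dim + 1)                         # cosine basis indices 1..emb_dim
cos_emb = np.cos(np.pi * i * tau)                     # cos(pi*i*tau) -> (B*K, emb_dim)

W = rng.normal(size=(emb_dim, D)) / np.sqrt(emb_dim)  # stand-in for the iqn_fc Linear
phi = np.maximum(cos_emb @ W, 0.0)                    # Linear + activation -> (B*K, D)

state = rng.normal(size=(B, D))                       # fused state embedding (B, D)
state_rep = np.repeat(state, K, axis=0)               # repeat each sample K times -> (B*K, D)

quantile_latent = phi * state_rep                     # Hadamard product, (B*K, D)
assert quantile_latent.shape == (B * K, D)
```

Each row of `quantile_latent` is the state embedding modulated by one sampled τ; the dueling heads then read A and V from these rows.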
Decoder (nn.decoder): MLP vs transformer slots
After fusing image + float into a flat vector (or after fusion hidden state), advantage and value each use a slot that is either mlp or transformer (mutually exclusive per slot):
- **MLP**: hidden width defaults to `decoder.dense_hidden_dimension // 2` if `mlp.hidden_dim` is omitted; `n_hidden_layers` stacks of Linear (+ optional LayerNorm / NoisyLinear from `btr`).
- **Transformer**: `IQNTransformerTrunk` reshapes the flat vector into tokens `(B, D/d_model, d_model)` (so the input width must be divisible by `d_model`), runs `torch.nn.TransformerEncoder`, mean-pools to `d_model`, then a small Linear stack to actions or value.
`decoder.shared_input`: if either slot uses transformer, validation requires `shared_input: post_tau`. Classic `IQN_Network.forward` fuses image + float first, then applies quantile mixing (post-τ). `pre_tau` is schema-only for transformer slots today.
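The transformer slot's tokenization step can be sketched in a few lines of numpy (the encoder itself is elided; `D` and `d_model` are illustrative values):

```python
import numpy as np

B, D, d_model = 2, 128, 32                        # flat width D must divide by d_model
assert D % d_model == 0

flat = np.random.default_rng(1).normal(size=(B, D))
tokens = flat.reshape(B, D // d_model, d_model)   # (B, 4, 32): token sequence per sample
# ... the real trunk runs torch.nn.TransformerEncoder over these tokens ...
pooled = tokens.mean(axis=1)                      # mean-pool over tokens -> (B, d_model)
assert pooled.shape == (B, d_model)
```

The divisibility check is why validation rejects widths that do not factor into `d_model`-sized tokens.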
Dueling heads
Q(s,a,τ) = V(s,τ) + A(s,a,τ) - mean_a A(s,a,τ).
Multi-action mode factorizes by offset; output (B*K, N, n_actions).
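The dueling combination above, with the multi-action `(B*K, N, n_actions)` layout, in a short numpy sketch (shapes illustrative):

```python
import numpy as np

BK, N, A = 6, 3, 12                # B*K quantile samples, N action blocks, A actions each
rng = np.random.default_rng(2)
V = rng.normal(size=(BK, N, 1))    # state value per quantile sample and block
Adv = rng.normal(size=(BK, N, A))  # action advantages

# Q(s,a,tau) = V(s,tau) + A(s,a,tau) - mean_a A(s,a,tau)
Q = V + Adv - Adv.mean(axis=-1, keepdims=True)

# Subtracting the mean advantage makes V identifiable:
# the per-block mean of (Q - V) over actions is exactly zero.
assert np.allclose((Q - V).mean(axis=-1), 0.0)
```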
Training flow (high level)
Collectors run an inference copy of the network.
Learner samples replay; target branch computes quantile targets.
Quantile Huber loss; target network soft/hard updates.
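The quantile Huber loss is the standard IQN-paper objective; below is a minimal numpy sketch for one state-action pair (κ and the averaging convention are illustrative, not read from the trainer):

```python
import numpy as np

def quantile_huber_loss(pred, target, tau, kappa=1.0):
    """Pairwise quantile Huber loss.

    pred:   (K, 1) predicted quantiles for one (s, a)
    target: (1, K') target quantiles (treated as fixed)
    tau:    (K, 1) quantile fractions for the predictions
    """
    u = target - pred                              # pairwise TD errors, (K, K')
    huber = np.where(np.abs(u) <= kappa,
                     0.5 * u ** 2,
                     kappa * (np.abs(u) - 0.5 * kappa))
    # Asymmetric weighting: under/over-estimation penalized by |tau - 1{u < 0}|.
    weight = np.abs(tau - (u < 0).astype(float))
    return (weight * huber / kappa).mean()

K = 8
tau = (np.arange(K).reshape(-1, 1) + 0.5) / K      # midpoint quantile fractions
pred = np.zeros((K, 1))
target = np.zeros((1, K))
assert quantile_huber_loss(pred, target, tau) == 0.0   # identical quantiles -> zero loss
```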
Key design notes
Distributional: return quantiles, not only mean Q.
Dueling: state value + action advantage.
DDQN, NoisyNet, multi-action: see config and BTR options (IQN + paper extras).
Implementation references
- `trackmania_rl/agents/iqn.py` — `IQN_Network`, `build_iqn_network_uncompiled`, trainer, inferer.
- `trackmania_rl/nn_build/iqn_multimodal.py` — `IQNSharedBackboneNetwork`, fusion/HF factories.
- `trackmania_rl/nn_build/iqn_quantile_forward.py` — shared τ + dueling forward.
- `trackmania_rl/nn_build/vis_cnn_head.py` / `iqn_btr_from_config.py` — config → kwargs for the shared CNN / BTR head wiring (see PPO actor-critic architecture for the PPO side).
- `trackmania_rl/multiprocess/collector_process.py`, `learner_process.py`.
See also
- NN topology catalog (supported stacks) — full matrix of supported `nn` topologies.
- PPO actor-critic architecture — same multimodal / HF bodies with policy heads.
- GRPO: network and training — same bodies under `ppo_wiring`; GRPO-specific training.
- BTR options (IQN + paper extras) — IQN extras under `btr:`.
- IQN model experiments — experiments.
- Configuration Guide — full `nn` reference.