IQN architecture

IQN (implicit quantile networks) is the distributional off-policy baseline: quantile Q-values, experience replay, and a target network. The implementation centers on trackmania_rl/agents/iqn.py; which module graph is built depends on nn.fusion_mode and nn.vis (see below).

For optional BTR paper features, see BTR options (IQN + paper extras). BTR is not a separate training.algorithm; it toggles behavior on top of IQN.

Configuration lives under the YAML nn block (see Neural network YAML (nn) — full reference in the Configuration Guide). For RL freeze flags, see RL parameter freeze.

Which network is built? (routing)

build_iqn_network_uncompiled() (used by training and BC when algorithm: iqn) picks one topology:

  1. Multimodal fusion — nn.fusion_mode is vision_transformer, post_concat, or unified. Builds the same TorchMultimodalActorCritic body as PPO with include_policy_heads=False, wrapped by IQNSharedBackboneNetwork (trackmania_rl/nn_build/iqn_multimodal.py). Submodule fusion exposes forward_fusion_hidden; IQN adds iqn_fc and dueling heads. Shared quantile + dueling math: trackmania_rl/nn_build/iqn_quantile_forward.py. When the multimodal vision branch is a CNN (default nn.vis.cnn), the conv stem uses nn.vis.cnn via trackmania_rl/nn_build/vis_cnn_head.py — the same flags as the classic IQN_Network. BTR-style head options (LayerNorm, NoisyNet) on iqn_fc / A / V use the flat config via trackmania_rl/nn_build/iqn_btr_from_config.py.

  2. HF vision, fusion off — nn.fusion_mode: none and nn.vis.transformer.use_hf_backbone: true. Builds HfActorCritic without policy heads, wrapped by IQNSharedBackboneNetwork (same file as above).

  3. Classic CNN or float-only — nn.fusion_mode: none with nn.vis.cnn or nn.vis.no_image: true. Builds class IQN_Network — CNN image head (or no image), float MLP, concat, then τ-embedding and dueling heads.

Not supported: nn.vis.transformer with the native ViT (use_hf_backbone: false) while fusion_mode: none — build_iqn_network_uncompiled raises; use a multimodal fusion_mode or switch to the HF ViT backbone.
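The routing rules above can be sketched as a small decision function. This is an illustrative stand-in, not the real logic in build_iqn_network_uncompiled; the function and topology names are hypothetical, only the config keys come from the docs:

```python
def pick_iqn_topology(fusion_mode: str, vis: dict) -> str:
    """Illustrative sketch: map (nn.fusion_mode, nn.vis) to an IQN topology name."""
    if fusion_mode in ("vision_transformer", "post_concat", "unified"):
        # case 1: multimodal fusion body wrapped by IQNSharedBackboneNetwork
        return "multimodal_shared_backbone"
    # fusion_mode == "none" from here on
    if vis.get("transformer", {}).get("use_hf_backbone"):
        # case 2: HfActorCritic (no policy heads) + IQNSharedBackboneNetwork
        return "hf_shared_backbone"
    if vis.get("cnn") is not None or vis.get("no_image"):
        # case 3: classic IQN_Network (CNN image head or float-only)
        return "classic_iqn_network"
    # native ViT with fusion off is rejected
    raise ValueError("nn.vis.transformer with use_hf_backbone: false "
                     "and fusion_mode: none is not supported")
```

The real builder constructs modules rather than returning a name, but the branch order mirrors the list above.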

Hub warm-start: for multimodal fusion, nn.init_from_pretrained is applied automatically on the PPO path; IQN does not run the same hook yet (see Neural network YAML (nn) — full reference).

Vision and tensors by topology

Classic ``IQN_Network``

  • Inputs: image (B, 1, H, W), floats (B, float_input_dim).

  • Outputs: Q with IQN quantiles — single-action (B*K, n_actions) or multi-action (B*K, N, n_actions) for n_actions_per_block.

digraph iqn_overview {
   rankdir=LR;
   node [shape=box, fontname="Helvetica", fontsize=10];
   img [label="img\n(B,1,H,W)", style="filled", fillcolor=lightblue];
   flt [label="float_inputs\n(B,F)", style="filled", fillcolor=lightblue];
   cnn [label="Image head\nCNN or IMPALA"];
   mlp [label="Float head\nMLP"];
   cat [label="Concat\n(B,D)"];
   iqn [label="IQN block\nτ-embedding × state"];
   duel [label="Dueling heads\nA + V"];
   out [label="Q-values\n(B*K, A) or (B*K,N,A)", style="filled", fillcolor=lightgreen];

   img -> cnn -> cat;
   flt -> mlp -> cat;
   cat -> iqn -> duel -> out;
}

Shared backbone IQN (multimodal or HF vision)

  • Inputs are the same; fusion.forward_fusion_hidden(img, float) produces a state vector (B, D) (width nn.decoder.dense_hidden_dimension after the bridge, or HfActorCritic.pre_trunk_feature_dim on the HF path).

  • Then the same τ cosine embedding, iqn_fc, Hadamard product, and dueling readout as classic IQN (shared implementation).

Core blocks (classic path)

Image branch

By default IQN uses a 4-layer CNN image head. The BTR option can replace this head with an IMPALA-CNN, but the interface is unchanged: the image branch outputs a flat embedding per sample.

Float branch

A two-layer MLP transforms normalized scalar features to float_hidden_dim.

Fusion

Image and float embeddings are concatenated into dense_input_dimension.

IQN quantile module

For each sample, IQN draws/supplies K quantiles τ and computes:

  1. cosine embedding of τ (dimension iqn_embedding_dimension),

  2. projection to state feature width,

  3. element-wise multiplication with repeated fused state embedding.

This yields a quantile-conditioned latent representation (B*K, D).

digraph iqn_tau {
   rankdir=TB;
   node [shape=box, fontname="Helvetica", fontsize=10];
   tau [label="τ\n(B*K,1)", style="filled", fillcolor=lightblue];
   cos [label="cos(pi*i*τ)"];
   fc [label="Linear + activation\n-> (B*K,D)"];
   st [label="state embed\n(B,D)", style="filled", fillcolor=lightblue];
   rep [label="repeat K\n(B*K,D)"];
   mul [label="Hadamard product"];
   out [label="quantile latent\n(B*K,D)", style="filled", fillcolor=lightgreen];
   tau -> cos -> fc -> mul;
   st -> rep -> mul -> out;
}
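The three steps above (and the diagram) can be sketched numerically. This is a numpy stand-in with illustrative dimensions and a ReLU chosen for the example, not the project's defaults; the real shared implementation lives in trackmania_rl/nn_build/iqn_quantile_forward.py:

```python
import numpy as np

rng = np.random.default_rng(0)
B, K, D, E = 2, 8, 32, 64  # batch, quantiles K, state width, iqn_embedding_dimension

state = rng.standard_normal((B, D))   # fused image+float embedding, (B, D)
tau = rng.random((B * K, 1))          # K sampled quantiles per sample

# 1. cosine embedding of tau: cos(pi * i * tau), (B*K, E)
i = np.arange(E)
phi = np.cos(np.pi * i * tau)

# 2. projection to the state feature width, with an activation
W = rng.standard_normal((E, D)) / np.sqrt(E)
phi = np.maximum(phi @ W, 0.0)        # (B*K, D)

# 3. Hadamard product with the state embedding repeated K times per sample
latent = np.repeat(state, K, axis=0) * phi
assert latent.shape == (B * K, D)     # quantile-conditioned latent
```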

Decoder (nn.decoder): MLP vs transformer slots

After fusing image + float into a flat vector (or after the fusion hidden state), the advantage and value heads each use a slot that is either mlp or transformer (mutually exclusive per slot):

  • MLP: hidden width defaults to decoder.dense_hidden_dimension // 2 if mlp.hidden_dim is omitted; n_hidden_layers stacks of Linear (+ optional LayerNorm / NoisyLinear from btr).

  • Transformer: IQNTransformerTrunk reshapes the flat vector into tokens (B, D/d_model, d_model) (so the input width must be divisible by d_model), runs torch.nn.TransformerEncoder, mean-pools to d_model, then a small Linear stack to actions or value.

``decoder.shared_input``: if either slot uses transformer, validation requires shared_input: post_tau. Classic IQN_Network.forward fuses image+float first, then applies quantile mixing (post-τ). pre_tau is schema-only for transformer slots today.
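The token reshape the transformer slot performs can be sketched as follows; a numpy stand-in with illustrative sizes, where an identity replaces the real torch.nn.TransformerEncoder:

```python
import numpy as np

BK, D, d_model = 16, 128, 32          # flat input width D must be divisible by d_model
assert D % d_model == 0

x = np.random.default_rng(0).standard_normal((BK, D))  # post-tau quantile latent

tokens = x.reshape(BK, D // d_model, d_model)  # (B*K, D/d_model, d_model) tokens
encoded = tokens                               # stand-in for the encoder stack
pooled = encoded.mean(axis=1)                  # mean-pool over tokens -> (B*K, d_model)
assert pooled.shape == (BK, d_model)
# a small Linear stack then maps pooled -> actions (A slot) or a scalar (V slot)
```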

Dueling heads

Q(s,a,τ) = V(s,τ) + A(s,a,τ) - mean_a A(s,a,τ).

Multi-action mode factorizes by offset; output (B*K, N, n_actions).
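The dueling combination above can be checked with a numpy sketch (single-action case, illustrative shapes). Subtracting the mean advantage makes the per-quantile mean of Q equal V:

```python
import numpy as np

rng = np.random.default_rng(0)
BK, A = 16, 12                        # B*K quantile rows, A actions

V = rng.standard_normal((BK, 1))      # state value per quantile, V(s, tau)
Adv = rng.standard_normal((BK, A))    # per-action advantage, A(s, a, tau)

# Q(s,a,tau) = V(s,tau) + A(s,a,tau) - mean_a A(s,a,tau)
Q = V + Adv - Adv.mean(axis=1, keepdims=True)
assert Q.shape == (BK, A)
# the mean advantage cancels, so mean_a Q(s,a,tau) == V(s,tau)
assert np.allclose(Q.mean(axis=1, keepdims=True), V)
```

Multi-action mode applies the same formula per action block, giving (B*K, N, n_actions).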

Training flow (high level)

  1. Collectors run an inference copy of the network.

  2. Learner samples replay; target branch computes quantile targets.

  3. Quantile Huber loss; target network soft/hard updates.
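Step 3 uses the standard quantile Huber loss; a numpy sketch of that textbook form (κ = 1, mean reduction), not the project's exact implementation or reduction:

```python
import numpy as np

def quantile_huber_loss(pred, target, tau, kappa=1.0):
    """pred: (B, K) predicted quantiles; target: (B, K') target quantiles;
    tau: (B, K) quantile fractions matching pred. Returns a scalar loss."""
    # pairwise TD errors u_ij = target_j - pred_i, shape (B, K, K')
    u = target[:, None, :] - pred[:, :, None]
    # Huber kernel: quadratic near zero, linear beyond kappa
    huber = np.where(np.abs(u) <= kappa,
                     0.5 * u ** 2,
                     kappa * (np.abs(u) - 0.5 * kappa))
    # asymmetric quantile weighting |tau_i - 1{u_ij < 0}|
    weight = np.abs(tau[:, :, None] - (u < 0).astype(u.dtype))
    return (weight * huber).mean()
```

With pred == target the pairwise diagonal vanishes and a constant distribution gives zero loss; otherwise the asymmetric weight pushes each predicted quantile toward its fraction τ of the target distribution.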

Key design notes

  • Distributional: return quantiles, not only mean Q.

  • Dueling: state value + action advantage.

  • DDQN, NoisyNet, multi-action: see config and BTR options (IQN + paper extras).

Implementation references

  • trackmania_rl/agents/iqn.py — IQN_Network, build_iqn_network_uncompiled, trainer, inferer.

  • trackmania_rl/nn_build/iqn_multimodal.py — IQNSharedBackboneNetwork, fusion/HF factories.

  • trackmania_rl/nn_build/iqn_quantile_forward.py — shared τ + dueling forward.

  • trackmania_rl/nn_build/vis_cnn_head.py / iqn_btr_from_config.py — config → kwargs for shared CNN / BTR head wiring (see PPO actor-critic architecture for PPO side).

  • trackmania_rl/multiprocess/collector_process.py, learner_process.py.

See also