.. _iqn_architecture:

IQN architecture
================

IQN (implicit quantile networks) is the **distributional off-policy** baseline: quantile Q-values, replay, target network. Implementation centers on ``trackmania_rl/agents/iqn.py``; **which module graph is built** depends on ``nn.fusion_mode`` and ``nn.vis`` (see below).

For optional BTR paper features, see :doc:`btr_architecture`. BTR is **not** a separate ``training.algorithm``; it toggles behavior on top of IQN.

Configuration lives under YAML ``nn`` (:ref:`nn-yaml-reference` in :doc:`../configuration_guide`). RL freeze flags: :ref:`nn-rl-parameter-freeze`.

Which network is built? (routing)
---------------------------------

``build_iqn_network_uncompiled()`` (used by training and BC when ``algorithm: iqn``) picks **one** topology:

1. **Multimodal fusion** — ``nn.fusion_mode`` is ``vision_transformer``, ``post_concat``, or ``unified``. Builds the same ``TorchMultimodalActorCritic`` body as PPO with ``include_policy_heads=False``, wrapped by ``IQNSharedBackboneNetwork`` (``trackmania_rl/nn_build/iqn_multimodal.py``). Submodule ``fusion`` exposes ``forward_fusion_hidden``; IQN adds ``iqn_fc`` and dueling heads. Shared quantile + dueling math: ``trackmania_rl/nn_build/iqn_quantile_forward.py``.

   When the multimodal **vision branch** is CNN (default ``nn.vis.cnn``), the conv stem uses ``nn.vis.cnn`` via ``trackmania_rl/nn_build/vis_cnn_head.py`` — same flags as classic ``IQN_Network``. BTR-style **head** options (LayerNorm, NoisyNet) on ``iqn_fc`` / A / V use the flat config via ``trackmania_rl/nn_build/iqn_btr_from_config.py``.

2. **HF vision, fusion off** — ``nn.fusion_mode: none`` and ``nn.vis.transformer.use_hf_backbone: true``. ``HfActorCritic`` without policy heads + ``IQNSharedBackboneNetwork`` (same file as above).

3. **Classic CNN or float-only** — ``nn.fusion_mode: none``, ``nn.vis.cnn`` or ``nn.vis.no_image: true``.
   Class ``IQN_Network`` — CNN image head (or no image), float MLP, concat, then τ-embedding and dueling heads.

**Not supported:** ``nn.vis.transformer`` with **native** ViT (``use_hf_backbone: false``) while ``fusion_mode: none`` — ``build_iqn_network_uncompiled`` raises; use ``fusion_mode`` multimodal or switch to HF ViT.

**Hub warm-start:** for multimodal fusion, ``nn.init_from_pretrained`` is applied automatically on the **PPO** path; IQN does not run the same hook yet (see :ref:`nn-yaml-reference`).

Vision and tensors by topology
------------------------------

**Classic IQN_Network**

- Inputs: image ``(B, 1, H, W)``, floats ``(B, float_input_dim)``.
- Outputs: ``Q`` with IQN quantiles — single-action ``(B*K, n_actions)`` or multi-action ``(B*K, N, n_actions)`` for ``n_actions_per_block``.

.. graphviz::

   digraph iqn_overview {
     rankdir=LR;
     node [shape=box, fontname="Helvetica", fontsize=10];
     img [label="img\n(B,1,H,W)", style="filled", fillcolor=lightblue];
     flt [label="float_inputs\n(B,F)", style="filled", fillcolor=lightblue];
     cnn [label="Image head\nCNN or IMPALA"];
     mlp [label="Float head\nMLP"];
     cat [label="Concat\n(B,D)"];
     iqn [label="IQN block\nτ-embedding × state"];
     duel [label="Dueling heads\nA + V"];
     out [label="Q-values\n(B*K, A) or (B*K,N,A)", style="filled", fillcolor=lightgreen];
     img -> cnn -> cat;
     flt -> mlp -> cat;
     cat -> iqn -> duel -> out;
   }

**Shared backbone IQN** (multimodal or HF vision)

- Inputs are the same; ``fusion.forward_fusion_hidden(img, float)`` produces a state vector ``(B, D)`` (width ``nn.decoder.dense_hidden_dimension`` after the bridge, or ``HfActorCritic.pre_trunk_feature_dim`` on the HF path).
- Then the **same** τ cosine embedding, ``iqn_fc``, Hadamard product, and dueling readout as classic IQN (shared implementation).

Core blocks (classic path)
--------------------------

Image branch
~~~~~~~~~~~~

By default IQN uses a 4-layer CNN image head.
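As a minimal sketch of this interface — channel counts, kernel sizes, and the class name here are illustrative, not the project's actual ``nn.vis.cnn`` values — a 4-layer conv stem that turns ``(B, 1, H, W)`` into one flat embedding per sample:

```python
import torch
import torch.nn as nn


class TinyCNNHead(nn.Module):
    """Illustrative 4-layer conv stem: (B, 1, H, W) -> flat (B, E)."""

    def __init__(self) -> None:
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=4, stride=2), nn.LeakyReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.LeakyReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.LeakyReLU(),
            nn.Conv2d(64, 32, kernel_size=3, stride=1), nn.LeakyReLU(),
        )

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        # Collapse all conv feature maps into one embedding per sample.
        return torch.flatten(self.conv(img), start_dim=1)


head = TinyCNNHead()
emb = head(torch.zeros(8, 1, 64, 64))  # -> (8, E), a flat per-sample embedding
```

Any replacement head (IMPALA-CNN included) only has to preserve this contract: image batch in, flat embedding out.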
The BTR option can replace this head with IMPALA-CNN, but the interface is unchanged: the image branch outputs a flat embedding per sample.

Float branch
~~~~~~~~~~~~

A two-layer MLP transforms normalized scalar features to ``float_hidden_dim``.

Fusion
~~~~~~

Image and float embeddings are concatenated into ``dense_input_dimension``.

IQN quantile module
~~~~~~~~~~~~~~~~~~~

For each sample, IQN draws/supplies ``K`` quantiles ``τ`` and computes:

1. cosine embedding of ``τ`` (dimension ``iqn_embedding_dimension``),
2. projection to state feature width,
3. element-wise multiplication with the repeated fused state embedding.

This yields a quantile-conditioned latent representation ``(B*K, D)``.

.. graphviz::

   digraph iqn_tau {
     rankdir=TB;
     node [shape=box, fontname="Helvetica", fontsize=10];
     tau [label="τ\n(B*K,1)", style="filled", fillcolor=lightblue];
     cos [label="cos(pi*i*τ)"];
     fc [label="Linear + activation\n-> (B*K,D)"];
     st [label="state embed\n(B,D)", style="filled", fillcolor=lightblue];
     rep [label="repeat K\n(B*K,D)"];
     mul [label="Hadamard product"];
     out [label="quantile latent\n(B*K,D)", style="filled", fillcolor=lightgreen];
     tau -> cos -> fc -> mul;
     st -> rep -> mul -> out;
   }

Decoder (``nn.decoder``): MLP vs transformer slots
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

After fusing image + float into a flat vector (or after the fusion hidden state), **advantage** and **value** each use a slot that is **either** ``mlp`` **or** ``transformer`` (mutually exclusive per slot):

- **MLP:** hidden width defaults to ``decoder.dense_hidden_dimension // 2`` if ``mlp.hidden_dim`` is omitted; ``n_hidden_layers`` stacks of Linear (+ optional LayerNorm / NoisyLinear from ``btr``).
- **Transformer:** ``IQNTransformerTrunk`` reshapes the flat vector into tokens ``(B, D/d_model, d_model)`` (so the input width must be divisible by ``d_model``), runs ``torch.nn.TransformerEncoder``, mean-pools to ``d_model``, then a small Linear stack to actions or value.
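To make the transformer slot's token reshape concrete, here is a hedged sketch of the reshape → encode → mean-pool pipeline. All dimensions and the class name are illustrative; the real ``IQNTransformerTrunk`` lives in the codebase and may differ in layer sizes and options:

```python
import torch
import torch.nn as nn


class TrunkSketch(nn.Module):
    """Illustrative transformer slot: flat (B, D) -> tokens -> pooled -> actions."""

    def __init__(self, input_dim: int, d_model: int, n_actions: int,
                 n_heads: int = 2, n_layers: int = 1) -> None:
        super().__init__()
        # The flat width must split evenly into d_model-sized tokens.
        assert input_dim % d_model == 0, "input width must be divisible by d_model"
        self.d_model = d_model
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_actions)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, d = x.shape
        tokens = x.view(b, d // self.d_model, self.d_model)  # (B, D/d_model, d_model)
        pooled = self.encoder(tokens).mean(dim=1)            # mean-pool over tokens
        return self.head(pooled)                             # (B, n_actions)


trunk = TrunkSketch(input_dim=256, d_model=64, n_actions=12)
q = trunk(torch.zeros(4, 256))  # (4, 12)
```

The divisibility assert mirrors the validation described above: a 256-wide latent with ``d_model: 64`` becomes 4 tokens, while a width that does not divide evenly is rejected.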
**``decoder.shared_input``:** if either slot uses ``transformer``, validation requires ``shared_input: post_tau``. Classic ``IQN_Network.forward`` fuses image + float first, then applies quantile mixing (**post-τ**). ``pre_tau`` is schema-only for transformer slots today.

Dueling heads
~~~~~~~~~~~~~

``Q(s,a,τ) = V(s,τ) + A(s,a,τ) - mean_a A(s,a,τ)``. Multi-action mode factorizes by offset; output ``(B*K, N, n_actions)``.

Training flow (high level)
--------------------------

1. Collectors run an inference copy of the network.
2. The learner samples replay; the target branch computes quantile targets.
3. Quantile Huber loss; target-network soft/hard updates.

Key design notes
----------------

- **Distributional:** return quantiles, not only mean Q.
- **Dueling:** state value + action advantage.
- **DDQN, NoisyNet, multi-action:** see config and :doc:`btr_architecture`.

Implementation references
-------------------------

- ``trackmania_rl/agents/iqn.py`` — ``IQN_Network``, ``build_iqn_network_uncompiled``, trainer, inferer.
- ``trackmania_rl/nn_build/iqn_multimodal.py`` — ``IQNSharedBackboneNetwork``, fusion/HF factories.
- ``trackmania_rl/nn_build/iqn_quantile_forward.py`` — shared τ + dueling forward.
- ``trackmania_rl/nn_build/vis_cnn_head.py`` / ``iqn_btr_from_config.py`` — config → kwargs for shared CNN / BTR head wiring (see :doc:`ppo_architecture` for the PPO side).
- ``trackmania_rl/multiprocess/collector_process.py``, ``learner_process.py``.

See also
--------

- :doc:`nn_topology_catalog` — full matrix of supported ``nn`` topologies.
- :doc:`ppo_architecture` — same multimodal / HF bodies with policy heads.
- :doc:`grpo_architecture` — same bodies under ``ppo_wiring``; GRPO-specific training.
- :doc:`btr_architecture` — IQN extras under ``btr:``.
- :doc:`../experiments/models/iqn` — experiments.
- :doc:`../configuration_guide` — full ``nn`` reference.