.. _nn-topology-catalog:

NN topology catalog (supported stacks)
=======================================

This page lists **every routing path** the training code can build from YAML
``nn`` + ``training.algorithm`` + (IQN only) ``btr:``. It complements the
narrative pages :doc:`iqn_architecture`, :doc:`ppo_architecture`,
:doc:`grpo_architecture`, :doc:`btr_architecture` and the field-by-field
:ref:`nn-yaml-reference` in :doc:`../configuration_guide`.

**DPO** and **GRPO** reuse the **same** ``nn`` routing and built modules as
**PPO** (``get_wiring("dpo" | "grpo")`` → ``ppo_wiring``). For training
semantics, see :doc:`../configuration_guide` (:ref:`dpo-config`,
:ref:`grpo-config`) and :doc:`grpo_architecture`.

**Authoritative schema:** ``config_files/nn_schema.py`` (``NnConfig``).
**Factory:** ``trackmania_rl/agents/policy_models/multimodal_torch_fusion.py``
(``TorchMultimodalActorCritic``, ``build_multimodal_fusion_uncompiled``),
``ppo_wiring.py``, ``iqn.py`` (``build_iqn_network_uncompiled``),
``hf_actor_critic.py``.

**Vision branch name** in code comes from ``infer_vis_branch(nn.vis)`` in
``nn_schema``: ``none`` (``no_image``), ``cnn``, ``native_transformer``
(``transformer`` with ``use_hf_backbone: false``), ``hf_transformer``
(``transformer`` with ``use_hf_backbone: true``).

.. warning::

   Most nested ``nn`` models use Pydantic ``extra="ignore"``. Unknown or
   misspelled keys under ``nn.*`` are **silently dropped** at load — they do
   **not** error. Prefer this catalog + :ref:`nn-yaml-reference` over
   guesswork.

1. ``fusion_mode: none`` (no multimodal stack)
----------------------------------------------
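As a hedged orientation sketch (field names as used throughout this catalog; the exact ``training.algorithm`` spelling and the defaults implied here are assumptions, not verified against ``nn_schema.py``), the classic IQN + CNN row corresponds to roughly:

.. code-block:: yaml

   # Sketch only — minimal IQN stack with no multimodal fusion.
   training:
     algorithm: iqn       # assumed spelling of the algorithm key
   nn:
     fusion_mode: none
     vis:
       cnn: {}            # empty mapping → schema defaults; added automatically
                          # if you omit both no_image and transformer

The table below enumerates which module each algorithm/vision combination builds on this path.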
.. list-table::
   :header-rows: 1
   :widths: 10 22 28 40

   * - Algorithm
     - Vision (effective ``infer_vis_branch``)
     - Built module(s)
     - Notes
   * - **IQN**
     - ``cnn`` (default if you omit ``no_image`` and do not set ``transformer`` — the schema adds an empty ``cnn``)
     - ``IQN_Network`` (``trackmania_rl/agents/iqn.py``)
     - Optional ``btr:`` (Munchausen, IMPALA CNN knobs merged into ``nn.vis.cnn`` when omitted, LayerNorm / NoisyNet on heads). See :doc:`btr_architecture`.
   * - **IQN**
     - ``none`` (``vis.no_image: true``)
     - ``IQN_Network``
     - Float-only; the image tensor can be zeros at runtime.
   * - **IQN**
     - ``hf_transformer``
     - ``IQNSharedBackboneNetwork`` + headless ``HfActorCritic`` (``nn_build/iqn_multimodal.py``, ``hf_actor_critic.py``)
     - Requires ``pip install -e ".[policy]"`` (Hugging Face stack).
   * - **IQN**
     - ``native_transformer`` (``vis.transformer`` without HF) **with** ``fusion_mode: none``
     - —
     - **Not wired:** ``build_iqn_network_uncompiled`` raises. Use a multimodal ``fusion_mode`` (``vision_transformer`` / ``post_concat`` / ``unified``), or HF vision with ``use_hf_backbone: true``.
   * - **PPO**
     - ``cnn`` / ``none``
     - ``PpoActorCritic`` (``ppo_actor_critic.py``)
     - CNN kwargs come **only** from ``nn.vis.cnn`` (no ``btr:`` merge on this path). ``no_image`` → float-only trunk.
   * - **PPO**
     - ``hf_transformer``
     - ``HfActorCritic`` (``hf_actor_critic.py``)
     - HF CLS + float MLP + shared trunk + policy/value heads.
   * - **PPO**
     - ``native_transformer`` only (``transformer`` present, ``use_hf_backbone: false``, no ``cnn``)
     - ``PpoActorCritic`` (degenerate)
     - **Pitfall:** no conv stem is built → **float-only** behavior (image side zeros). For native patch vision use ``fusion_mode: vision_transformer`` (or another multimodal mode), not ``none``.

2. Multimodal fusion modes
--------------------------

Here ``nn.fusion_mode`` is one of ``vision_transformer``, ``post_concat``, or ``unified``.

**Shared body:** ``TorchMultimodalActorCritic`` (``multimodal_torch_fusion.py``).
* **PPO** — ``include_policy_heads=True`` (trunk + ``policy_head`` / ``value_head``).
* **IQN** — ``include_policy_heads=False``; wrapped by ``IQNSharedBackboneNetwork`` + ``iqn_fc`` + dueling heads (same quantile path as classic IQN after the fusion hidden).

**Float MLP width** for fusion builds: ``nn.encoder.mlp.hidden_dim`` if set, else ``nn.float.mlp.hidden_dim`` (``float_hidden_dim_effective()``).

**Fusion trunk kind** (after early tokens / concat): ``nn.encoder.fusion_encoder`` if set, else inferred by ``infer_fusion_encoder`` in ``nn_schema``:

1. If ``fusion_encoder`` is set → use it (it must agree with ``encoder.transformer.use_hf_backbone``; the schema forbids ``native_transformer`` + HF backbone on the same encoder slot).
2. Else, if ``encoder.transformer.use_hf_backbone: true`` → ``hf_embedding`` (an HF model driven via ``inputs_embeds``, e.g. BERT-class; path from ``encoder.transformer.model_name_or_path`` or ``encoder.hf_embedding``).
3. Else, if ``fusion_mode == vision_transformer`` → ``linear`` (concat embeddings → ``bridge`` Linear to ``decoder.dense_hidden_dimension``).
4. Else → ``native_transformer`` (``torch.nn.TransformerEncoder`` on the fusion sequence; ``n_layers: 0`` means **no** encoder layer — optional blocks are skipped via ``_make_encoder_optional``).

Explicit kinds ``mlp`` / ``cnn`` / ``hf_embedding`` use ``nn.encoder.fusion_mlp``, ``fusion_cnn``, and ``hf_embedding`` respectively (see :ref:`nn-yaml-reference`).

``vision_transformer`` mode
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The image branch and the float MLP are fused (default trunk ``linear`` unless overridden).

.. list-table::
   :header-rows: 1
   :widths: 20 35 45

   * - ``infer_vis_branch``
     - Image path (if ``use_image_head``)
     - Fusion path
   * - ``cnn``
     - ``_build_img_head`` from ``nn.vis.cnn`` → Linear to ``vis.d_model``
     - Default ``linear``: concat(image emb, float MLP) → ``bridge``.
       If ``fusion_encoder`` is non-linear, the vision+float concat is
       projected to a short **sequence** (length
       ``encoder.transformer.post_concat_seq_len``), then the fusion trunk.
   * - ``native_transformer``
     - ``PatchEmbed2d`` + optional ``vis`` ``TransformerEncoder`` (``patch_size`` must divide ``H_downsized`` and ``W_downsized``)
     - Same as above after pooling / embedding.
   * - ``hf_transformer``
     - HF vision backbone + optional ``vis_refine`` (native encoder on tokens)
     - Same default ``linear`` / optional non-linear fusion trunk.
   * - ``none``
     - No image tokens
     - The float-only side still participates in concat / sequence as implemented.

``post_concat`` mode
~~~~~~~~~~~~~~~~~~~~

Tokenize vision + float, then run the fusion trunk.

.. list-table::
   :header-rows: 1
   :widths: 18 42 40

   * - ``encoder.post_concat_layout``
     - Behavior (simplified)
     - Typical vision
   * - ``fused_vector``
     - Image branch (CNN / native / HF) and float MLP produce a **single fused vector** → projected to ``post_concat_seq_len`` tokens at ``fuse_d_model`` → fusion trunk (default ``native_transformer`` unless overridden).
     - CNN, native patch stack, or HF with ``fusion_tokens: summary`` (single vector per image).
   * - ``token_sequence``
     - Vision contributes **one or many** tokens at ``fuse_d_model``; the float side is **raw** or **MLP-hidden** tokens (``float_token_input``) in **dense** or **per_feature** layout (``float_token_layout``). ``per_feature`` forces ``float_token_input: raw`` and ``token_sequence`` (schema).
     - CNN → one vision token; native patches → many; HF with ``fusion_tokens: patch_tokens`` → many (requires ``token_sequence`` — schema).

``unified`` mode
~~~~~~~~~~~~~~~~

Joint sequence over image token(s) and learned float token(s).
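For orientation, a hedged ``unified`` sketch with native patch vision (values are illustrative assumptions; the only hard constraint shown is the ``d_model`` equality the schema enforces, per the table below):

.. code-block:: yaml

   # Sketch only — unified joint sequence over patch tokens + float tokens.
   nn:
     fusion_mode: unified
     vis:
       transformer:
         use_hf_backbone: false
         d_model: 256      # must equal encoder.transformer.d_model (schema check)
     encoder:
       transformer:
         d_model: 256      # fuse_d_model for the joint sequence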
.. list-table::
   :header-rows: 1
   :widths: 22 38 40

   * - ``infer_vis_branch``
     - Image tokens
     - Constraints
   * - ``cnn``
     - **One** image token (conv → Linear to ``fuse_d_model``)
     - Floats → ``unified_float_tokens`` via ``Linear(float_dim -> K*d)``; joint ``pos_uni``; fusion trunk per ``fusion_encoder``.
   * - ``native_transformer``
     - Patch grid tokens at ``vis.d_model``; must equal ``encoder.transformer.d_model`` (``fuse_d_model``)
     - Schema enforces ``vis.transformer.d_model == encoder.transformer.d_model``.
   * - ``hf_transformer``
     - **N** tokens from the HF backbone (count derived from processor / backbone); projected to ``fuse_d_model``
     - Optional native ``vis`` ``TransformerEncoder`` refine; ``n_layers: 0`` skips it.

Same joint fusion trunk options as the other multimodal modes.
``float_feature_extractor`` (2× MLP on floats) is **omitted** for ``unified``
and for ``post_concat`` + ``token_sequence`` + ``float_token_input: raw`` —
floats enter tokenization directly where that path applies.

3. IQN decoder and BTR on heads
-------------------------------

Applies to **classic** ``IQN_Network`` and **shared-backbone** IQN (multimodal / HF vision).

* **Slots** ``decoder.advantage`` and ``decoder.value``: **either** ``mlp`` **or** ``transformer`` (not both per slot). Aliases: ``mlp.layers`` ↔ ``n_hidden_layers``; ``hidden`` ↔ ``hidden_dim``.
* **Transformer slot:** native ``torch.nn.TransformerEncoder`` on the chunked state; the schema requires ``decoder.shared_input: post_tau`` if any slot uses ``transformer``.
* **BTR** dense-head flags (LayerNorm, NoisyNet, ``noisy_sigma0``) apply via ``iqn_btr_mlp_head_kw_from_config`` (see :doc:`btr_architecture`).

4. Warm start and checkpoints
-----------------------------

* **Multimodal PPO:** ``nn.init_from_pretrained`` — Rulka fusion ``save_pretrained`` dir; loaded after build in ``make_multimodal_fusion_network_pair`` (unless skipped via a utility flag; see :ref:`nn-yaml-reference`).
* **Multimodal IQN:** the same directory format may exist, but **automatic** hub load is **not** guaranteed to mirror PPO — prefer continuing from ``weights1.torch`` / an explicit load in your workflow.
* **Hub JSON** may carry ``rulka_transformers.vis_cnn`` for CNN stems; older bundles without it fall back to default conv kwargs (see :doc:`ppo_architecture`).

5. Reference YAML files
-----------------------

.. list-table::
   :header-rows: 1
   :widths: 35 65

   * - File
     - Role
   * - ``config_default.yaml`` / ``config_btr.yaml``
     - IQN ``fusion_mode: none`` + CNN; ``config_btr.yaml`` enables the full ``btr:`` recipe.
   * - ``config_btr_post_concat_cnn_transformer.yaml``
     - IQN + ``post_concat`` + CNN + native fusion ``TransformerEncoder`` + ``btr:``.
   * - ``config_ppo.yaml``
     - PPO baseline (``fusion_mode: none``); starting point for native ``vision_transformer`` (change ``fusion_mode`` + ``vis.transformer`` / remove ``cnn`` as needed).
   * - ``config_ppo_cnn_mlp.yaml``
     - Minimal PPO CNN + float MLP.
   * - ``config_ppo_post_concat_cnn_tf.yaml``
     - PPO ``post_concat`` + CNN + native fusion transformer.
   * - ``config_ppo_transformer.yaml``
     - PPO ``post_concat`` + **HF** timm vision + **HF** fusion encoder (historical filename).

There is **no** single YAML file covering every cell of the tables above;
combine :ref:`nn-yaml-reference` with the closest example and edit ``nn``
fields.

See also
--------

- :doc:`iqn_architecture` — IQN routing and tensors
- :doc:`ppo_architecture` — PPO variants A/B/C
- :doc:`grpo_architecture` — GRPO training (same stacks as PPO)
- :doc:`btr_architecture` — BTR flags on IQN
- :ref:`nn-yaml-reference` in :doc:`../configuration_guide`
- ``config_files/nn_schema.py`` — validators for mutually exclusive options and geometry