TrackMania RL - Documentation

Welcome to the TrackMania RL project documentation!

This is a fork and extension of the original Linesight project, adapted for reinforcement learning experiments in Trackmania Nations Forever.

The project trains an AI agent to drive in Trackmania Nations Forever using reinforcement learning. The default stack is IQN (Implicit Quantile Networks), a distributional off-policy method. Three policy-optimization alternatives are available: PPO (on-policy clipped actor-critic), DPO (preference-based, using the same network as PPO), and GRPO (group-relative returns; see GRPO: network and training). Every stack can use a CNN image head, a Hugging Face vision model, or a shared native multimodal fusion graph (nn.fusion_mode); see Model architectures and Configuration Guide.
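To make "group-relative returns" and "clipped" concrete, here is a minimal sketch in plain Python. It is not taken from this project's code: the function names, the epsilon values, and the choice of population standard deviation are assumptions for illustration, and the actual GRPO and PPO implementations may normalize or clip differently.

```python
from statistics import mean, pstdev


def group_relative_advantages(returns, eps=1e-8):
    """Score each rollout against the other rollouts in its group.

    Hypothetical helper: subtract the group mean and divide by the
    group standard deviation, so no learned value baseline is needed.
    """
    mu = mean(returns)
    sigma = pstdev(returns)
    return [(r - mu) / (sigma + eps) for r in returns]


def ppo_clipped_term(ratio, advantage, clip_eps=0.2):
    """Per-sample PPO clipped surrogate: min(r * A, clip(r) * A).

    `ratio` is pi_new(a|s) / pi_old(a|s); clip_eps=0.2 is a common
    default and only an assumption here.
    """
    clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return min(ratio * advantage, clipped * advantage)


# Three rollouts of the same track segment with different returns.
advantages = group_relative_advantages([10.0, 20.0, 30.0])
surrogate = [ppo_clipped_term(1.5, a) for a in advantages]
```

The best rollout in a group receives a positive advantage and the worst a negative one, regardless of the absolute return scale. The clip caps how much a single update can profit from moving the policy ratio away from 1.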

Key Features:

Distributional RL with IQN (Implicit Quantile Networks), the default stack
Optional on-policy PPO, DPO, and GRPO, which share the TM rollout pipeline and a PPO-style actor-critic network (see PPO configuration (ppo:), DPO configuration (dpo:), and GRPO configuration (grpo:) in Configuration Guide; architecture diagrams are under Model architectures)
Modular configuration system for easy experimentation
Support for multiple parallel game instances
Hot-reloadable training parameters
TensorBoard integration for monitoring
Virtual checkpoint system for dense progress tracking
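The idea behind virtual checkpoints can be sketched as follows: points are placed at a fixed spacing along a reference path, and the agent's progress is the index of the furthest point it has reached. This is a simplified illustration, not the project's implementation. The function names, the tuple-based positions, and the reach radius are all hypothetical.

```python
import math


def place_virtual_checkpoints(path, spacing):
    """Resample a polyline into points `spacing` apart along its length."""
    checkpoints = [path[0]]
    carried = 0.0  # distance travelled since the last checkpoint
    for a, b in zip(path, path[1:]):
        seg = math.dist(a, b)
        if seg == 0.0:
            continue  # skip duplicate points in the reference path
        pos = spacing - carried  # offset of the next checkpoint on this segment
        while pos <= seg:
            t = pos / seg
            checkpoints.append(tuple(p + t * (q - p) for p, q in zip(a, b)))
            pos += spacing
        carried = seg - (pos - spacing)
    return checkpoints


def advance_progress(checkpoints, current_index, position, reach_radius):
    """Move past every upcoming checkpoint within `reach_radius` of the car."""
    i = current_index
    while (i + 1 < len(checkpoints)
           and math.dist(checkpoints[i + 1], position) <= reach_radius):
        i += 1
    return i


# A straight 10-unit reference path with a checkpoint every 2.5 units.
cps = place_virtual_checkpoints([(0.0, 0.0), (10.0, 0.0)], spacing=2.5)
progress = advance_progress(cps, 0, (2.6, 0.0), reach_radius=0.5)
```

Because the checkpoints are much denser than the track's own checkpoints, the progress index changes often, which gives a denser signal than waiting for a real checkpoint or the finish line.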

All runs produced by this project are tool-assisted and must not be submitted to the official leaderboards.

User Documentation:

Model Architectures:

Model architectures
- Which stack when?
- Contents

Experiments:

Experiments
- Contents

Community tips & tricks
