User FAQ

General Questions

Q: What is distributional RL and why IQN?

A: Distributional RL models the full distribution of returns rather than only their expected value. IQN (Implicit Quantile Networks) suits TrackMania because it:

  • Handles stochastic environments well

  • Supports better exploration through uncertainty estimation

  • Learns more stably than standard DQN
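
For readers new to IQN, the training signal is the quantile Huber loss described in the IQN paper. The sketch below is a minimal pure-Python illustration for a single TD error, not this project's implementation; the function name and the kappa default of 1.0 are assumptions made for the example.

```python
def quantile_huber_loss(td_error: float, tau: float, kappa: float = 1.0) -> float:
    """Quantile Huber loss for one TD error (target - prediction) at quantile fraction tau."""
    abs_err = abs(td_error)
    # Huber part: quadratic near zero, linear for large errors.
    if abs_err <= kappa:
        huber = 0.5 * td_error ** 2
    else:
        huber = kappa * (abs_err - 0.5 * kappa)
    # Asymmetric weight: positive errors are weighted by tau,
    # negative errors by (1 - tau).
    weight = abs(tau - (1.0 if td_error < 0 else 0.0))
    return weight * huber / kappa


# For a high quantile (tau = 0.9), predicting too low costs more than predicting too high.
print(quantile_huber_loss(0.5, tau=0.9), quantile_huber_loss(-0.5, tau=0.9))
```

Averaging this loss over many sampled values of tau is what lets the network represent a whole return distribution instead of a single mean.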

Q: How long does training take?

A: Depends on the map complexity:

  • Simple maps (A01-A05): 2-5M frames (~2-4 hours with 4 collectors)

  • Medium maps (ESL-Hockolicious): 5-10M frames (~4-8 hours)

  • Complex maps (map5, Endurance): 15-30M frames (~12-24 hours)

Q: What hardware do I need?

A: Minimum recommended:

  • CPU: 4+ cores

  • GPU: NVIDIA with 6GB+ VRAM (GTX 1060 or better)

  • Memory: 20GB+ combined (16GB system RAM + 4GB+ VRAM)

  • Storage: 10GB+ for saves and tensorboard logs

Q: Can I use AMD GPUs?

A: PyTorch supports AMD GPUs via ROCm, but we haven’t tested extensively. NVIDIA CUDA is recommended and well-tested.

Q: Why do I need multiple game instances?

A: Parallel game instances speed up data collection significantly. More instances mean faster training, but they also require more RAM and CPU.

Performance

Q: How many game instances should I use?

A: Start with 2-4 and monitor:

  • CPU usage: should be 70-90% (not maxed)

  • RAM usage: ~2GB per instance

  • GPU usage: 80-95% is optimal
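
As a rough starting point, the ~2GB-per-instance figure above gives an upper bound on instance count from free RAM. This is a back-of-the-envelope sketch; the function name and the reserve_gb headroom for the learner process and OS are assumptions, not project settings.

```python
def max_instances_by_ram(free_ram_gb: float,
                         per_instance_gb: float = 2.0,
                         reserve_gb: float = 4.0) -> int:
    """Upper bound on game instances given free RAM.

    per_instance_gb follows the ~2GB per instance figure above;
    reserve_gb is a hypothetical safety margin, not a project setting.
    """
    usable = free_ram_gb - reserve_gb
    return max(0, int(usable // per_instance_gb))


print(max_instances_by_ram(16.0))
```

CPU and GPU usage still need to be checked while training runs; RAM is only one of the limits.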

Q: What about window focus with multiple instances? 🆕

A: Automatically handled! The code manages window focus intelligently:

  • Each game window gets focus once when loading a new map

  • No “focus war” between instances (they work in background)

  • Works correctly with 8+ parallel instances

  • Minimal performance impact (<0.01%)

Important: Don’t minimize game windows - the game pauses when minimized. You can place other windows on top instead.

Q: Why are my cars not moving during training?

A: Most likely causes:

  1. Windows minimized: Game pauses when minimized (unminimize them)

  2. First-time setup: Window focus is set automatically on first map load

  3. Map cycling: With multiple maps, focus resets on each map change (automatic)

If cars suddenly stop after working fine:

  • Check windows aren’t minimized

  • Look for timeout messages in logs

  • Verify all instances show similar frame generation rates

Q: How many game instances should I use (specific numbers)?

A: Watch training throughput (batches per minute) and increase instances until you hit a bottleneck (usually CPU or RAM).
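
One way to apply this is to record throughput at a few instance counts and stop adding instances once the gain becomes small. The sketch below assumes you measured batches per minute yourself; the function name, the 10% cut-off, and the sample numbers are all illustrative.

```python
def pick_instance_count(throughput: dict[int, float], min_gain: float = 0.10) -> int:
    """Return the largest instance count that still improved throughput by min_gain.

    throughput maps instance count -> measured batches per minute.
    min_gain is a hypothetical cut-off (10% relative improvement).
    """
    counts = sorted(throughput)
    best = counts[0]
    for prev, cur in zip(counts, counts[1:]):
        gain = (throughput[cur] - throughput[prev]) / throughput[prev]
        if gain < min_gain:
            break  # bottleneck reached: extra instances no longer help
        best = cur
    return best


# Made-up measurements for illustration only.
measured = {1: 100.0, 2: 190.0, 4: 330.0, 6: 340.0}
print(pick_instance_count(measured))
```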

Q: Linux vs Windows performance?

A: On a system with Ryzen 7 5700G, 64GB RAM, RTX 4070 Ti:

  • Linux: ~58% faster training

  • Linux sweet spot: 4 collectors

  • Windows sweet spot: 2 collectors

Linux is recommended for serious training due to better performance under Wine with DXVK.

Q: My training is slow, what can I do?

A: Optimization checklist:

  1. Increase gpu_collectors_count (if RAM/CPU allows)

  2. Increase running_speed to 160-200x

  3. Lower game resolution and graphics quality

  4. Reduce batch_size for more frequent updates

  5. Disable visualization options in the performance section of the config YAML

  6. Close unnecessary background applications

Training

Q: My agent isn’t learning / getting stuck

A: Common causes and fixes:

  • Bad reference line: Verify VCP file covers entire track

  • Too low exploration: Increase epsilon in early training

  • Wrong map path: Check map loads correctly

  • Timeout too short: Increase cutoff_rollout_if_no_vcp_passed_within_duration_ms

  • Reward imbalance: Check time penalty vs progress reward ratio

Q: Can I train on multiple maps simultaneously?

A: Yes! Edit the map_cycle.entries section in your config YAML to alternate between maps:

map_cycle:
  entries:
    - {short_name: map1, map_path: "Map1.Gbx", reference_line_path: "map1_0.5m_cl.npy", is_exploration: true, fill_buffer: true, repeat: 4}
    - {short_name: map2, map_path: "Map2.Gbx", reference_line_path: "map2_0.5m_cl.npy", is_exploration: true, fill_buffer: true, repeat: 4}
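
To see what such a cycle produces, the sketch below expands entries into an endless run order. It assumes that repeat means "play this entry that many times in a row before moving to the next one", which is a reading of the config rather than something confirmed here; expand_cycle is a hypothetical helper, and only the short_name and repeat fields are reproduced.

```python
from itertools import cycle, islice

# Trimmed copies of the entries above (only the fields used here).
entries = [
    {"short_name": "map1", "is_exploration": True, "repeat": 4},
    {"short_name": "map2", "is_exploration": True, "repeat": 4},
]


def expand_cycle(entries):
    """Return an endless iterator of short_name values, honoring each entry's repeat."""
    one_pass = [e["short_name"] for e in entries for _ in range(e["repeat"])]
    return cycle(one_pass)


# First nine runs: four on map1, four on map2, then back to map1.
print(list(islice(expand_cycle(entries), 9)))
```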

Q: How do I resume training from a checkpoint?

A: Checkpoints are saved automatically in save/{run_name}/. To resume:

  1. Ensure .torch files exist in save/{run_name}/

  2. Keep the same run_name in the training section of your config

  3. Run python scripts/train.py --config config_files/rl/config_default.yaml

Training will load the checkpoint automatically.
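
Before restarting, you can confirm that step 1 holds with a short check. This is a convenience sketch using only the save/{run_name}/ layout described above; find_checkpoints is a hypothetical helper, not part of the project.

```python
from pathlib import Path


def find_checkpoints(run_name: str, save_root: str = "save") -> list[Path]:
    """List .torch files in save/{run_name}/ (empty list if the folder is missing)."""
    run_dir = Path(save_root) / run_name
    if not run_dir.is_dir():
        return []
    return sorted(run_dir.glob("*.torch"))


# "my_run" is a placeholder; use the run_name from your config.
found = find_checkpoints("my_run")
print("checkpoint files:", [p.name for p in found] or "none found")
```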

Q: Can I change hyperparameters during training?

A: Config is loaded once at startup. To change parameters you must edit the YAML file and restart training. A snapshot of the config used for each run is saved in save/{run_name}/config_snapshot.yaml.

⚠️ Don’t change network architecture, input dimensions, or action space when resuming a run: these changes typically make existing checkpoints incompatible and require starting a fresh run.

Q: What’s the difference between exploration and evaluation runs?

A:

  • Exploration (is_explo=True): Agent uses epsilon-greedy + Boltzmann exploration to discover new strategies

  • Evaluation (is_explo=False): Agent plays greedily to measure current skill level

Typical ratio: 4 exploration runs per 1 evaluation run.
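
The difference between the two modes can be sketched as an action-selection function. How this project actually combines epsilon-greedy and Boltzmann exploration is not described here, so the combination below (random action with probability epsilon, softmax sampling with probability epsilon_boltzmann, greedy otherwise) is an assumption, as are the function name and the temperature parameter tau.

```python
import math
import random


def select_action(q_values, epsilon, epsilon_boltzmann, tau, is_explo, rng=random):
    """Pick an action index from a list of Q-values.

    Evaluation runs (is_explo=False) are always greedy. The way the two
    exploration modes are mixed is an illustrative assumption.
    """
    greedy = max(range(len(q_values)), key=lambda a: q_values[a])
    if not is_explo:
        return greedy
    roll = rng.random()
    if roll < epsilon:
        # Epsilon-greedy branch: uniform random action.
        return rng.randrange(len(q_values))
    if roll < epsilon + epsilon_boltzmann:
        # Boltzmann branch: sample from softmax(Q / tau); subtract max for stability.
        q_max = max(q_values)
        weights = [math.exp((q - q_max) / tau) for q in q_values]
        return rng.choices(range(len(q_values)), weights=weights)[0]
    return greedy


q = [0.1, 0.9, 0.3]
print(select_action(q, epsilon=0.03, epsilon_boltzmann=0.0, tau=1.0, is_explo=False))
```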

Maps & Replays

Q: How do I create virtual checkpoints for a new map?

A:

  1. Drive the track manually and save a replay

  2. Place replay in Documents/TrackMania/Tracks/Replays/

  3. Run: python scripts/tools/gbx_to_vcp.py "path/to/replay.Replay.Gbx"

  4. VCP file is saved to maps/ folder (e.g., MapName_0.5m_cl.npy)

  5. Update the map_cycle.entries in your config YAML to reference the new VCP file

Q: Does the reference line need to be fast?

A: No! A slow centerline drive is perfectly fine. The reference line is only used to:

  • Track progress along the track

  • Define forward direction

  • Provide waypoint lookahead to the agent

The agent will learn to drive faster than the reference line.
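
To illustrate why speed does not matter, here is a minimal sketch of progress tracking against a reference line: progress depends only on where the waypoints are, not on how fast they were driven. The nearest-waypoint lookup and the function name are assumptions for illustration; the project's actual method may differ.

```python
import math


def progress_along_line(reference_line, position):
    """Return (index of nearest waypoint, fraction of the line completed).

    reference_line is a list of (x, y, z) waypoints in driving order.
    """
    if len(reference_line) < 2:
        raise ValueError("reference line needs at least two waypoints")
    nearest = min(range(len(reference_line)),
                  key=lambda i: math.dist(reference_line[i], position))
    return nearest, nearest / (len(reference_line) - 1)


# Toy straight line with 0.5 m spacing (spacing chosen for the example).
line = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (1.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(progress_along_line(line, (1.1, 0.2, 0.0)))
```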

Q: Can I use someone else’s world record replay?

A: Yes, but centerline is often better because:

  • WR lines may use advanced techniques (wallbangs, cuts)

  • Agent might struggle to discover these early in training

  • Centerline provides more uniform progress tracking

Q: My map won’t load / stuck on loading screen

A: Check:

  • Map file is NOT in OneDrive/cloud storage

  • Map path in config matches actual file location

  • Map file is valid (.Challenge.Gbx format)

  • Game is in windowed mode (not minimized)

Q: How do I replay agent runs?

A: Best runs are saved in save/{run_name}/best_runs/{map}_{time}/:

  1. Copy .inputs file to Documents/TMInterface/Scripts/

  2. Open game and load the map

  3. Open TMInterface console (F12)

  4. Type: load filename.inputs

  5. Press Enter to play

  6. Save replay if desired
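
Step 1 can be scripted if you replay runs often. The helper below is a hypothetical convenience, not a project script; it copies every .inputs file from one best-run folder into a scripts folder you pass in.

```python
import shutil
from pathlib import Path


def copy_inputs_to_tminterface(run_dir, scripts_dir):
    """Copy all .inputs files from run_dir into scripts_dir; return the new paths."""
    run_dir, scripts_dir = Path(run_dir), Path(scripts_dir)
    scripts_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(run_dir.glob("*.inputs")):
        copied.append(Path(shutil.copy2(src, scripts_dir / src.name)))
    return copied


# Example call (paths are placeholders; adjust to your run and user folder):
# copy_inputs_to_tminterface("save/my_run/best_runs/map1_12345",
#                            Path.home() / "Documents" / "TMInterface" / "Scripts")
```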

Configuration

Q: Where do I find all configuration options?

A: Configuration is split across modules in config_files/:

  • Quick reference: Inline comments in each module

  • Full docs: Configuration Guide

  • Overview: config_files/README.md

Q: What’s the recommended configuration for my first training?

A: Default configuration is pre-tuned for ESL-Hockolicious. For other maps:

  • Easier than Hocko: Set global_schedule_speed = 0.8

  • Harder than Hocko: Set global_schedule_speed = 1.5

  • Very technical maps: Reduce tm_engine_step_per_action to 3-4

Q: What does global_schedule_speed do?

A: Multiplier for all frame-based schedules:

  • 1.0: Normal speed (default)

  • 0.8: 20% faster schedule (for easier maps)

  • 1.5: 50% slower schedule (for harder maps)

This uniformly speeds up/slows down learning rate decay, epsilon decay, buffer growth, etc.
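
A small sketch makes the multiplier concrete. It assumes that schedules are lists of (frame, value) points interpolated linearly and that global_schedule_speed scales the frame thresholds; both are assumptions for illustration, and the example epsilon schedule is made up (only its 1.0 and 0.03 endpoints echo the range mentioned in this FAQ).

```python
def scaled_schedule_value(schedule, frame, global_schedule_speed=1.0):
    """Linearly interpolate a (frame, value) schedule after scaling its frame axis."""
    points = [(f * global_schedule_speed, v) for f, v in schedule]
    if frame <= points[0][0]:
        return points[0][1]
    for (f0, v0), (f1, v1) in zip(points, points[1:]):
        if frame <= f1:
            return v0 + (v1 - v0) * (frame - f0) / (f1 - f0)
    return points[-1][1]  # past the last point: hold the final value


# Hypothetical epsilon schedule: 1.0 at frame 0, 0.03 at frame 1,000,000.
epsilon_schedule = [(0, 1.0), (1_000_000, 0.03)]
for speed in (0.8, 1.0, 1.5):
    print(speed, scaled_schedule_value(epsilon_schedule, 800_000, speed))
```

Under these assumptions, a value of 0.8 reaches the final epsilon after 800,000 frames instead of 1,000,000, while 1.5 stretches the same decay to 1,500,000 frames.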

Monitoring

Q: What metrics should I watch in TensorBoard?

A: Metrics are organized into groups for easier navigation. Key metrics to monitor:

Training/ group:

  • Training/loss: Training loss (can increase early - normal for RL)

  • Training/loss_test: Test loss on held-out buffer

  • Training/learning_rate: Current learning rate (decays over time)

RL/ group:

  • RL/avg_Q: Expected reward (should increase after initial drop)

  • RL/single_zone_reached: How far agent drives (% of track completed)

  • RL/gamma: Discount factor (typically 0.999 → 1.0)

  • RL/epsilon: Exploration rate (decays from 1.0 to ~0.03)

Race/ group:

  • Race/eval_race_time_robust: Best evaluation times (most important performance metric)

  • Race/explo_race_time_finished: Exploration run times (more variable, includes exploration)

Gradients/ group:

  • Gradients/norm_median: Median gradient norm after clipping (should be stable)

  • Gradients/norm_before_clip_max: Maximum gradient norm BEFORE clipping (watch for explosions >100)

  • Gradients/by_layer/: Per-layer gradient norms (useful for debugging)

Performance/ group:

  • Performance/transitions_learned_per_second: Training throughput

  • Performance/learner_percentage_training: % time spent training (should be high)

Buffer/ group:

  • Buffer/size: Current replay buffer size

  • Buffer/priorities_median: Median priority (if using prioritized replay)

IQN/ group (IQN-specific):

  • IQN/quantile_std_action_X: Standard deviation of quantile predictions per action (measures uncertainty)

Q: Why is my loss increasing?

A: In RL, loss increasing early in training is normal and expected! It means:

  • Agent is discovering the environment

  • Identifying inconsistencies in its value estimates

  • Learning is progressing correctly

Loss should stabilize or decrease after ~1-2M frames.

Q: My agent finishes the track but times aren’t improving

A: This is the “optimization phase” (after ~3-5M frames):

  • Progress is slower now

  • Agent is refining strategy details

  • Continue training for 10-20M more frames

  • Consider enabling shaped rewards for faster progress

Q: How are TensorBoard metrics organized?

A: All metrics are grouped into categories using prefixes. This makes navigation easier:

Training/ - Training process metrics:

  • Training/loss - Training loss (can increase early - normal in RL)

  • Training/loss_test - Test loss on held-out buffer

  • Training/learning_rate - Current learning rate (decays according to schedule)

  • Training/weight_decay - L2 regularization strength

  • Training/batch_size - Batch size used for training

  • Training/n_steps - N-step return horizon

  • Training/train_on_batch_duration - Time per training batch

Gradients/ - Gradient monitoring (critical for stability):

  • Gradients/norm_median, Gradients/norm_d9, Gradients/norm_max - Gradient norms AFTER clipping (should be stable, typically <30)

  • Gradients/norm_before_clip_max - Watch this! Maximum gradient norm BEFORE clipping. Values >100 indicate gradient explosions. Should typically be <50.

  • Gradients/by_layer/{layer_name}/L2_* - Per-layer gradient L2 norms (useful for debugging which layers have issues)

  • Gradients/by_layer/{layer_name}/Linf_* - Per-layer gradient max norms

RL/ - Reinforcement learning hyperparameters and metrics:

  • RL/avg_Q - Expected future reward (key learning indicator, should increase after initial drop)

  • RL/single_zone_reached - How far agent drives (% of track completed)

  • RL/gamma - Discount factor (typically 0.999 → 1.0)

  • RL/epsilon - Epsilon-greedy exploration rate (decays from 1.0 to ~0.03)

  • RL/epsilon_boltzmann - Boltzmann exploration temperature

  • RL/tau_epsilon_boltzmann - Boltzmann tau parameter

Race/ - Race performance metrics:

  • Race/eval_race_time_robust - Most important! Best evaluation times (greedy policy, no exploration)

  • Race/explo_race_time_finished - Exploration run times (includes exploration, more variable)

  • Race/race_time_ratio_* - Race time relative to rollout duration

  • Race/split_* - Split times between checkpoints

Performance/ - System performance metrics:

  • Performance/transitions_learned_per_second - Training throughput

  • Performance/learner_percentage_training - % time spent training (should be high, >70%)

  • Performance/learner_percentage_waiting_for_workers - % time waiting for data (should be low, <20%)

  • Performance/learner_percentage_testing - % time spent on test batches

Buffer/ - Replay buffer statistics:

  • Buffer/size - Current buffer size

  • Buffer/max_size - Maximum buffer capacity

  • Buffer/priorities_* - Priority statistics (if using prioritized replay)

Network/ - Neural network weights and optimizer state:

  • Network/weights/{layer_name}/L2 - L2 norm of layer weights

  • Network/optimizer/{layer_name}/adaptive_lr_L2 - Per-parameter adaptive learning rates (Adam/RAdam)

  • Network/optimizer/{layer_name}/exp_avg_L2 - First moment estimate (Adam/RAdam)

  • Network/optimizer/{layer_name}/exp_avg_sq_L2 - Second moment estimate (Adam/RAdam)

IQN/ - IQN-specific metrics (Implicit Quantile Network):

  • IQN/quantile_std_action_{i} - Standard deviation of quantile predictions per action. Higher values indicate more uncertainty in Q-value estimates. Useful for understanding model confidence.

Tips for using TensorBoard:

  • Use the “SCALARS” tab to filter by group prefix (e.g., type “Gradients/” to see all gradient metrics)

  • The “Custom Scalars” tab has pre-configured layouts for key metrics

  • Watch Gradients/norm_before_clip_max closely - sudden spikes indicate gradient explosions

  • RL/avg_Q should generally trend upward after initial exploration phase

  • Race/eval_race_time_robust is your primary performance metric - lower is better
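
The thresholds quoted in this answer can be turned into a quick health check once you have the latest value of each metric (for example, read off the TensorBoard UI). The helper below is hypothetical and assumes the percentage metrics are logged on a 0-100 scale; adjust the limits if your logs use 0-1.

```python
def check_metrics(latest: dict[str, float]) -> list[str]:
    """Return warnings for metrics outside the ranges suggested in this FAQ."""
    warnings = []
    if latest.get("Gradients/norm_before_clip_max", 0.0) > 100:
        warnings.append("gradient explosion: norm_before_clip_max > 100")
    # Assumes percentages are on a 0-100 scale (not confirmed by the docs).
    if latest.get("Performance/learner_percentage_training", 100.0) < 70:
        warnings.append("learner spends < 70% of its time training")
    if latest.get("Performance/learner_percentage_waiting_for_workers", 0.0) > 20:
        warnings.append("learner waits > 20% of the time for workers")
    return warnings


# Made-up snapshot for illustration.
snapshot = {
    "Gradients/norm_before_clip_max": 140.0,
    "Performance/learner_percentage_training": 82.0,
    "Performance/learner_percentage_waiting_for_workers": 12.0,
}
print(check_metrics(snapshot))
```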

Technical Issues

Q: FileNotFoundError: Python_Link.as

A: Copy the plugin:

# Windows (PowerShell)
New-Item -ItemType Directory -Force -Path "$env:USERPROFILE\Documents\TMInterface\Plugins"
Copy-Item "trackmania_rl\tmi_interaction\Python_Link.as" "$env:USERPROFILE\Documents\TMInterface\Plugins\"

Q: Game stuck on login screen

A: TMNF account must be an online account. Create one through the game launcher without TMInterface.

Q: CUDA out of memory

A: Reduce memory usage:

  • Decrease batch_size in the training section

  • Decrease memory_size_schedule in the memory section

  • Reduce gpu_collectors_count in the performance section

  • Lower image resolution (w_downsized, h_downsized) in the neural_network section

Q: Game crashes / TMInterface connection lost

A: Common fixes:

  • Increase timeout_during_run_ms in the environment section

  • Reduce running_speed (< 200x)

  • Check TMLoader profile is correctly configured

  • Verify no firewall blocking TMInterface

  • Restart game instances (automatic every 12 hours)

Contributing

Q: How can I contribute to this project?

A: Contributions welcome:

  • Report issues and bugs

  • Share your training results

  • Improve documentation

  • Add new features or algorithms

  • Optimize performance

See DEVELOPMENT.md for development setup.

Q: Can I share my trained models?

A: Yes! Model weights are in save/{run_name}/weights*.torch. Share with the community to help others.

⚠️ Important: All AI runs are Tool Assisted and must NOT be submitted to official leaderboards.

Additional Resources