User FAQ
General Questions
Q: What is distributional RL and why IQN?
A: Distributional RL models the full distribution of returns rather than just expected values. IQN (Implicit Quantile Networks) is particularly effective for TrackMania because:
Handles stochastic environments well
Better exploration through uncertainty estimation
More stable learning than standard DQN
Q: How long does training take?
A: Depends on the map complexity:
Simple maps (A01-A05): 2-5M frames (~2-4 hours with 4 collectors)
Medium maps (ESL-Hockolicious): 5-10M frames (~4-8 hours)
Complex maps (map5, Endurance): 15-30M frames (~12-24 hours)
Q: What hardware do I need?
A: Minimum recommended:
CPU: 4+ cores
GPU: NVIDIA with 6GB+ VRAM (GTX 1060 or better)
RAM: 20GB+ (16GB system + 4GB+ VRAM)
Storage: 10GB+ for saves and tensorboard logs
Q: Can I use AMD GPUs?
A: PyTorch supports AMD GPUs via ROCm, but we haven’t tested extensively. NVIDIA CUDA is recommended and well-tested.
Q: Why do I need multiple game instances?
A: Parallel game instances speed up data collection significantly. More instances = faster training, but requires more RAM/CPU.
Performance
Q: How many game instances should I use?
A: Start with 2-4 and monitor:
CPU usage: should be 70-90% (not maxed)
RAM usage: ~2GB per instance
GPU usage: 80-95% is optimal
Q: What about window focus with multiple instances? 🆕
A: Automatically handled! The code manages window focus intelligently:
Each game window gets focus once when loading a new map
No “focus war” between instances (they work in background)
Works correctly with 8+ parallel instances
Minimal performance impact (<0.01%)
Important: Don’t minimize game windows - the game pauses when minimized. You can place other windows on top instead.
Q: Why are my cars not moving during training?
A: Most likely causes:
Windows minimized: Game pauses when minimized (unminimize them)
First-time setup: Window focus is set automatically on first map load
Map cycling: With multiple maps, focus resets on each map change (automatic)
If cars suddenly stop after working fine:
Check windows aren’t minimized
Look for timeout messages in logs
Verify all instances show similar frame generation rates
Q: How do I find the right number of instances for my machine?
A: Watch training throughput (batches per minute) as you add instances. Increase instances until throughput stops improving — you have hit a bottleneck (usually CPU or RAM).
Q: Linux vs Windows performance?
A: On a system with Ryzen 7 5700G, 64GB RAM, RTX 4070 Ti:
Linux: ~58% faster training
Linux sweet spot: 4 collectors
Windows sweet spot: 2 collectors
Linux is recommended for serious training due to better game performance under Wine with DXVK.
Q: My training is slow, what can I do?
A: Optimization checklist:
Increase `gpu_collectors_count` (if RAM/CPU allows)
Increase `running_speed` to 160-200x
Lower game resolution and graphics quality
Reduce `batch_size` for more frequent updates
Disable visualization options in the `performance` section of the config YAML
Close unnecessary background applications
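As a sketch, the checklist above maps onto the config YAML roughly like this. The section placement of each key is assumed from this FAQ and the values are illustrative — check the modules in `config_files/` for the exact layout:

```yaml
# Illustrative snippet - key locations and values are assumptions, not defaults.
performance:
  gpu_collectors_count: 4    # more collectors if RAM/CPU allows
environment:
  running_speed: 180         # 160-200x game speed
training:
  batch_size: 256            # smaller batches = more frequent updates
```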
Training
Q: My agent isn’t learning / getting stuck
A: Common causes and fixes:
Bad reference line: Verify VCP file covers entire track
Too low exploration: Increase epsilon in early training
Wrong map path: Check map loads correctly
Timeout too short: Increase `cutoff_rollout_if_no_vcp_passed_within_duration_ms`
Reward imbalance: Check time penalty vs progress reward ratio
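For the timeout fix, a hedged config sketch — the section name and value here are assumptions, so verify against your own config modules:

```yaml
# Hypothetical placement and value - adjust to your config layout.
environment:
  # Allow more time to reach the next virtual checkpoint before
  # the rollout is cut off:
  cutoff_rollout_if_no_vcp_passed_within_duration_ms: 4000
```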
Q: Can I train on multiple maps simultaneously?
A: Yes! Edit the map_cycle.entries section in your config YAML to alternate between maps:
```yaml
map_cycle:
  entries:
    - {short_name: map1, map_path: "Map1.Gbx", reference_line_path: "map1_0.5m_cl.npy", is_exploration: true, fill_buffer: true, repeat: 4}
    - {short_name: map2, map_path: "Map2.Gbx", reference_line_path: "map2_0.5m_cl.npy", is_exploration: true, fill_buffer: true, repeat: 4}
```
Q: How do I resume training from a checkpoint?
A: Checkpoints are saved automatically in save/{run_name}/. To resume:
1. Ensure `.torch` files exist in `save/{run_name}/`
2. Keep the same `run_name` in the `training` section of your config
3. Run `python scripts/train.py --config config_files/rl/config_default.yaml`
Training will load the checkpoint automatically.
Q: Can I change hyperparameters during training?
A: Config is loaded once at startup. To change parameters you must edit the YAML file and restart training. A snapshot of the config used for each run is saved in save/{run_name}/config_snapshot.yaml.
⚠️ Don’t change between restarts: network architecture, input dimensions, action space - saved checkpoints are incompatible with these changes, so you would need to start a fresh run.
Q: What’s the difference between exploration and evaluation runs?
A:
Exploration (`is_explo=True`): Agent uses epsilon-greedy + Boltzmann exploration to discover new strategies
Evaluation (`is_explo=False`): Agent plays greedily to measure current skill level
Typical ratio: 4 exploration runs per 1 evaluation run.
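That 4:1 ratio is typically expressed through the map cycle. A sketch, reusing the entry format shown elsewhere in this FAQ (map and file names are illustrative):

```yaml
map_cycle:
  entries:
    # 4 exploration runs ...
    - {short_name: map1, map_path: "Map1.Gbx", reference_line_path: "map1_0.5m_cl.npy", is_exploration: true, fill_buffer: true, repeat: 4}
    # ... followed by 1 greedy evaluation run
    - {short_name: map1, map_path: "Map1.Gbx", reference_line_path: "map1_0.5m_cl.npy", is_exploration: false, fill_buffer: true, repeat: 1}
```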
Maps & Replays
Q: How do I create virtual checkpoints for a new map?
A:
1. Drive the track manually and save a replay
2. Place the replay in `Documents/TrackMania/Tracks/Replays/`
3. Run: `python scripts/tools/gbx_to_vcp.py "path/to/replay.Replay.Gbx"`
4. The VCP file is saved to the `maps/` folder (e.g., `MapName_0.5m_cl.npy`)
5. Update `map_cycle.entries` in your config YAML to reference the new VCP file
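The last step means pointing the relevant map cycle entry at the new `.npy` file. A sketch with illustrative names:

```yaml
map_cycle:
  entries:
    - {short_name: mymap, map_path: "MyMap.Challenge.Gbx", reference_line_path: "MyMap_0.5m_cl.npy", is_exploration: true, fill_buffer: true, repeat: 4}
```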
Q: Does the reference line need to be fast?
A: No! A slow centerline drive is perfectly fine. The reference line is only used to:
Track progress along the track
Define forward direction
Provide waypoint lookahead to the agent
The agent will learn to drive faster than the reference line.
Q: Can I use someone else’s world record replay?
A: Yes, but centerline is often better because:
WR lines may use advanced techniques (wallbangs, cuts)
Agent might struggle to discover these early in training
Centerline provides more uniform progress tracking
Q: My map won’t load / stuck on loading screen
A: Check:
Map file is NOT in OneDrive/cloud storage
Map path in config matches actual file location
Map file is valid (`.Challenge.Gbx` format)
Game is in windowed mode (not minimized)
Q: How do I replay agent runs?
A: Best runs are saved in save/{run_name}/best_runs/{map}_{time}/:
1. Copy the `.inputs` file to `Documents/TMInterface/Scripts/`
2. Open the game and load the map
3. Open the TMInterface console (F12)
4. Type `load filename.inputs` and press Enter to play
5. Save the replay if desired
Configuration
Q: Where do I find all configuration options?
A: Configuration is split across modules in config_files/:
Quick reference: Inline comments in each module
Full docs: Configuration Guide
Overview: `config_files/README.md`
Q: What’s the recommended configuration for my first training?
A: Default configuration is pre-tuned for ESL-Hockolicious. For other maps:
Easier than Hocko: Set `global_schedule_speed = 0.8`
Harder than Hocko: Set `global_schedule_speed = 1.5`
Very technical maps: Reduce `tm_engine_step_per_action` to 3-4
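As a config sketch — the placement of these keys is an assumption (check your `config_files/` modules), and the values are the ones suggested above:

```yaml
# Illustrative overrides for a first training run on a non-default map.
global_schedule_speed: 0.8     # easier map: compress all frame-based schedules
tm_engine_step_per_action: 4   # very technical map: more frequent actions
```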
Q: What does global_schedule_speed do?
A: Multiplier for all frame-based schedules:
`1.0`: Normal speed (default)
`0.8`: 20% faster schedule (for easier maps)
`1.5`: 50% slower schedule (for harder maps)
This uniformly speeds up/slows down learning rate decay, epsilon decay, buffer growth, etc.
Monitoring
Q: What metrics should I watch in TensorBoard?
A: Metrics are organized into groups for easier navigation. Key metrics to monitor:
Training/ group:
- Training/loss: Training loss (can increase early - normal for RL)
- Training/loss_test: Test loss on held-out buffer
- Training/learning_rate: Current learning rate (decays over time)
RL/ group:
- RL/avg_Q: Expected reward (should increase after initial drop)
- RL/single_zone_reached: How far agent drives (% of track completed)
- RL/gamma: Discount factor (typically 0.999 → 1.0)
- RL/epsilon: Exploration rate (decays from 1.0 to ~0.03)
Race/ group:
- Race/eval_race_time_robust: Best evaluation times (most important performance metric)
- Race/explo_race_time_finished: Exploration run times (more variable, includes exploration)
Gradients/ group:
- Gradients/norm_median: Median gradient norm after clipping (should be stable)
- Gradients/norm_before_clip_max: Maximum gradient norm BEFORE clipping (watch for explosions >100)
- Gradients/by_layer/: Per-layer gradient norms (useful for debugging)
Performance/ group:
- Performance/transitions_learned_per_second: Training throughput
- Performance/learner_percentage_training: % time spent training (should be high)
Buffer/ group:
- Buffer/size: Current replay buffer size
- Buffer/priorities_median: Median priority (if using prioritized replay)
IQN/ group (IQN-specific):
- IQN/quantile_std_action_X: Standard deviation of quantile predictions per action (measures uncertainty)
Q: Why is my loss increasing?
A: In RL, loss increasing early in training is normal and expected! It means:
Agent is discovering the environment
Identifying inconsistencies in its value estimates
Learning is progressing correctly
Loss should stabilize or decrease after ~1-2M frames.
Q: My agent finishes the track but times aren’t improving
A: This is the “optimization phase” (after ~3-5M frames):
Progress is slower now
Agent is refining strategy details
Continue training for 10-20M more frames
Consider enabling shaped rewards for faster progress
Q: How are TensorBoard metrics organized?
A: All metrics are grouped into categories using prefixes. This makes navigation easier:
Training/ - Training process metrics:
- Training/loss - Training loss (can increase early - normal in RL)
- Training/loss_test - Test loss on held-out buffer
- Training/learning_rate - Current learning rate (decays according to schedule)
- Training/weight_decay - L2 regularization strength
- Training/batch_size - Batch size used for training
- Training/n_steps - N-step return horizon
- Training/train_on_batch_duration - Time per training batch
Gradients/ - Gradient monitoring (critical for stability):
- Gradients/norm_median, Gradients/norm_d9, Gradients/norm_max - Gradient norms AFTER clipping (should be stable, typically <30)
- Gradients/norm_before_clip_max - Watch this! Maximum gradient norm BEFORE clipping. Values >100 indicate gradient explosions. Should typically be <50.
- Gradients/by_layer/{layer_name}/L2_* - Per-layer gradient L2 norms (useful for debugging which layers have issues)
- Gradients/by_layer/{layer_name}/Linf_* - Per-layer gradient max norms
RL/ - Reinforcement learning hyperparameters and metrics:
- RL/avg_Q - Expected future reward (key learning indicator, should increase after initial drop)
- RL/single_zone_reached - How far agent drives (% of track completed)
- RL/gamma - Discount factor (typically 0.999 → 1.0)
- RL/epsilon - Epsilon-greedy exploration rate (decays from 1.0 to ~0.03)
- RL/epsilon_boltzmann - Boltzmann exploration temperature
- RL/tau_epsilon_boltzmann - Boltzmann tau parameter
Race/ - Race performance metrics:
- Race/eval_race_time_robust - Most important! Best evaluation times (greedy policy, no exploration)
- Race/explo_race_time_finished - Exploration run times (includes exploration, more variable)
- Race/race_time_ratio_* - Race time relative to rollout duration
- Race/split_* - Split times between checkpoints
Performance/ - System performance metrics:
- Performance/transitions_learned_per_second - Training throughput
- Performance/learner_percentage_training - % time spent training (should be high, >70%)
- Performance/learner_percentage_waiting_for_workers - % time waiting for data (should be low, <20%)
- Performance/learner_percentage_testing - % time spent on test batches
Buffer/ - Replay buffer statistics:
- Buffer/size - Current buffer size
- Buffer/max_size - Maximum buffer capacity
- Buffer/priorities_* - Priority statistics (if using prioritized replay)
Network/ - Neural network weights and optimizer state:
- Network/weights/{layer_name}/L2 - L2 norm of layer weights
- Network/optimizer/{layer_name}/adaptive_lr_L2 - Per-parameter adaptive learning rates (Adam/RAdam)
- Network/optimizer/{layer_name}/exp_avg_L2 - First moment estimate (Adam/RAdam)
- Network/optimizer/{layer_name}/exp_avg_sq_L2 - Second moment estimate (Adam/RAdam)
IQN/ - IQN-specific metrics (Implicit Quantile Network):
- IQN/quantile_std_action_{i} - Standard deviation of quantile predictions per action. Higher values indicate more uncertainty in Q-value estimates. Useful for understanding model confidence.
Tips for using TensorBoard:
- Use the “SCALARS” tab to filter by group prefix (e.g., type “Gradients/” to see all gradient metrics)
- The “Custom Scalars” tab has pre-configured layouts for key metrics
- Watch Gradients/norm_before_clip_max closely - sudden spikes indicate gradient explosions
- RL/avg_Q should generally trend upward after initial exploration phase
- Race/eval_race_time_robust is your primary performance metric - lower is better
Technical Issues
Q: FileNotFoundError: Python_Link.as
A: Copy the plugin:
```powershell
# Windows (PowerShell)
New-Item -ItemType Directory -Force -Path "$env:USERPROFILE\Documents\TMInterface\Plugins"
Copy-Item "trackmania_rl\tmi_interaction\Python_Link.as" "$env:USERPROFILE\Documents\TMInterface\Plugins\"
```
Q: Game stuck on login screen
A: TMNF account must be an online account. Create one through the game launcher without TMInterface.
Q: CUDA out of memory
A: Reduce memory usage:
Decrease `batch_size` in the `training` section
Decrease `memory_size_schedule` in the `memory` section
Reduce `gpu_collectors_count` in the `performance` section
Lower image resolution (`w_downsized`, `h_downsized`) in the `neural_network` section
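A hedged sketch of those reductions in the config YAML — values here are illustrative, not defaults, and `memory_size_schedule` is omitted because its schedule format is not described in this FAQ:

```yaml
# Illustrative reduced-memory settings - tune to your GPU.
training:
  batch_size: 256            # smaller batches use less VRAM
performance:
  gpu_collectors_count: 2    # fewer parallel collectors
neural_network:
  w_downsized: 160           # lower input image resolution
  h_downsized: 120
```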
Q: Game crashes / TMInterface connection lost
A: Common fixes:
Increase `timeout_during_run_ms` in the `environment` section
Reduce `running_speed` (< 200x)
Check TMLoader profile is correctly configured
Verify no firewall blocking TMInterface
Restart game instances (automatic every 12 hours)
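The two config-side fixes above, as a sketch — `running_speed`'s section placement is an assumption and both values are illustrative:

```yaml
# Illustrative values - verify key locations in your config modules.
environment:
  timeout_during_run_ms: 10000  # more tolerance before a run is declared lost
  running_speed: 160            # staying below ~200x improves stability
```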
Contributing
Q: How can I contribute to this project?
A: Contributions welcome:
Report issues and bugs
Share your training results
Improve documentation
Add new features or algorithms
Optimize performance
See DEVELOPMENT.md for development setup.
Q: Can I share my trained models?
A: Yes! Model weights are in save/{run_name}/weights*.torch. Share with the community to help others.
⚠️ Important: All AI runs are Tool Assisted and must NOT be submitted to official leaderboards.
Additional Resources
Documentation: online docs
Configuration Guide: Configuration Guide
Original Linesight: https://github.com/pb4git/linesight
TMInterface: https://donadigo.com/tminterface/
TMNF Exchange: https://tmnf.exchange/