============
User FAQ
============

General Questions
-----------------

**Q: What is distributional RL and why IQN?**

A: Distributional RL models the full distribution of returns rather than just expected values. IQN (Implicit Quantile Networks) is particularly effective for TrackMania because it:

- Handles stochastic environments well
- Explores better through uncertainty estimation
- Learns more stably than standard DQN

**Q: How long does training take?**

A: It depends on map complexity:

- Simple maps (A01-A05): 2-5M frames (~2-4 hours with 4 collectors)
- Medium maps (ESL-Hockolicious): 5-10M frames (~4-8 hours)
- Complex maps (map5, Endurance): 15-30M frames (~12-24 hours)

**Q: What hardware do I need?**

A: Minimum recommended:

- CPU: 4+ cores
- GPU: NVIDIA with 6GB+ VRAM (GTX 1060 or better)
- RAM: 20GB+ (16GB system + 4GB+ VRAM)
- Storage: 10GB+ for saves and TensorBoard logs

**Q: Can I use AMD GPUs?**

A: PyTorch supports AMD GPUs via ROCm, but we haven't tested this extensively. NVIDIA CUDA is recommended and well-tested.

**Q: Why do I need multiple game instances?**

A: Parallel game instances speed up data collection significantly. More instances mean faster training, but require more RAM and CPU.

Performance
-----------

**Q: How many game instances should I use?**

A: Start with 2-4 and monitor:

- CPU usage: should be 70-90% (not maxed out)
- RAM usage: ~2GB per instance
- GPU usage: 80-95% is optimal
- Training throughput: batches per minute

Increase instances until you hit a bottleneck (usually CPU or RAM).

**Q: What about window focus with multiple instances?** 🆕

A: **Automatically handled!** The code manages window focus intelligently:

- Each game window gets focus once when loading a new map
- No "focus war" between instances (they work in the background)
- Works correctly with 8+ parallel instances
- Minimal performance impact (<0.01%)

**Important:** Don't minimize game windows - the game pauses when minimized. You can place other windows on top instead.

**Q: Why are my cars not moving during training?**

A: Most likely causes:

1. **Windows minimized:** The game pauses when minimized (unminimize them)
2. **First-time setup:** Window focus is set automatically on the first map load
3. **Map cycling:** With multiple maps, focus resets on each map change (automatic)

If cars suddenly stop after working fine:

- Check that windows aren't minimized
- Look for timeout messages in the logs
- Verify that all instances show similar frame generation rates

**Q: Linux vs Windows performance?**

A: On a system with a Ryzen 7 5700G, 64GB RAM, and an RTX 4070 Ti:

- Linux: ~58% faster training
- Linux sweet spot: 4 collectors
- Windows sweet spot: 2 collectors

Linux is recommended for serious training due to better Wine performance with DXVK.

**Q: My training is slow, what can I do?**

A: Optimization checklist:

1. Increase ``gpu_collectors_count`` (if RAM/CPU allows)
2. Increase ``running_speed`` to 160-200x
3. Lower the game resolution and graphics quality
4. Reduce ``batch_size`` for more frequent updates
5. Disable visualization options in the ``performance`` section of the config YAML
6. Close unnecessary background applications

Training
--------

**Q: My agent isn't learning / getting stuck**

A: Common causes and fixes:

- **Bad reference line**: verify the VCP file covers the entire track
- **Too little exploration**: increase epsilon in early training
- **Wrong map path**: check that the map loads correctly
- **Timeout too short**: increase ``cutoff_rollout_if_no_vcp_passed_within_duration_ms``
- **Reward imbalance**: check the time penalty vs. progress reward ratio

**Q: Can I train on multiple maps simultaneously?**

A: Yes! Edit the ``map_cycle.entries`` section in your config YAML to alternate between maps:
.. code-block:: yaml

   map_cycle:
     entries:
       - {short_name: map1, map_path: "Map1.Gbx", reference_line_path: "map1_0.5m_cl.npy", is_exploration: true, fill_buffer: true, repeat: 4}
       - {short_name: map2, map_path: "Map2.Gbx", reference_line_path: "map2_0.5m_cl.npy", is_exploration: true, fill_buffer: true, repeat: 4}

**Q: How do I resume training from a checkpoint?**

A: Checkpoints are saved automatically in ``save/{run_name}/``. To resume:

1. Ensure ``.torch`` files exist in ``save/{run_name}/``
2. Keep the same ``run_name`` in the ``training`` section of your config
3. Run ``python scripts/train.py --config config_files/rl/config_default.yaml``

Training will load the checkpoint automatically.

**Q: Can I change hyperparameters during training?**

A: The config is loaded once at startup. To change parameters, edit the YAML file and restart training. A snapshot of the config used for each run is saved in ``save/{run_name}/config_snapshot.yaml``.

⚠️ **Don't change**: network architecture, input dimensions, or action space - changing these invalidates existing checkpoints and requires starting a fresh run.

**Q: What's the difference between exploration and evaluation runs?**

A:

- **Exploration** (``is_explo=True``): the agent uses epsilon-greedy + Boltzmann exploration to discover new strategies
- **Evaluation** (``is_explo=False``): the agent plays greedily to measure current skill level

Typical ratio: 4 exploration runs per 1 evaluation run.

Maps & Replays
--------------

**Q: How do I create virtual checkpoints for a new map?**

A:

1. Drive the track manually and save a replay
2. Place the replay in ``Documents/TrackMania/Tracks/Replays/``
3. Run: ``python scripts/tools/gbx_to_vcp.py "path/to/replay.Replay.Gbx"``
4. The VCP file is saved to the ``maps/`` folder (e.g., ``MapName_0.5m_cl.npy``)
5. Update ``map_cycle.entries`` in your config YAML to reference the new VCP file

**Q: Does the reference line need to be fast?**

A: No! A slow centerline drive is perfectly fine.
The reference line is only used to:

- Track progress along the track
- Define the forward direction
- Provide waypoint lookahead to the agent

The agent will learn to drive faster than the reference line.

**Q: Can I use someone else's world record replay?**

A: Yes, but a centerline is often better because:

- WR lines may use advanced techniques (wallbangs, cuts)
- The agent might struggle to discover these early in training
- A centerline provides more uniform progress tracking

**Q: My map won't load / stuck on loading screen**

A: Check that:

- The map file is NOT in OneDrive/cloud storage
- The map path in the config matches the actual file location
- The map file is valid (``.Challenge.Gbx`` format)
- The game is in windowed mode (not minimized)

**Q: How do I replay agent runs?**

A: Best runs are saved in ``save/{run_name}/best_runs/{map}_{time}/``:

1. Copy the ``.inputs`` file to ``Documents/TMInterface/Scripts/``
2. Open the game and load the map
3. Open the TMInterface console (F12)
4. Type: ``load filename.inputs``
5. Press Enter to play
6. Save the replay if desired

Configuration
-------------

**Q: Where do I find all configuration options?**

A: Configuration is split across modules in ``config_files/``:

- Quick reference: inline comments in each module
- Full docs: :doc:`configuration_guide`
- Overview: ``config_files/README.md``

**Q: What's the recommended configuration for my first training?**

A: The default configuration is pre-tuned for ESL-Hockolicious. For other maps:

- **Easier than Hocko**: set ``global_schedule_speed = 0.8``
- **Harder than Hocko**: set ``global_schedule_speed = 1.5``
- **Very technical maps**: reduce ``tm_engine_step_per_action`` to 3-4

**Q: What does global_schedule_speed do?**

A: It is a multiplier for all frame-based schedules:

- ``1.0``: normal speed (default)
- ``0.8``: 20% faster schedule (for easier maps)
- ``1.5``: 50% slower schedule (for harder maps)

This uniformly speeds up or slows down learning rate decay, epsilon decay, buffer growth, etc.
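As a rough illustration of the idea, a schedule multiplier can be implemented by rescaling the frame axis of every ``(frame, value)`` schedule. This is a minimal sketch with hypothetical names (``scale_schedule``, ``value_at``), not the project's actual implementation:

.. code-block:: python

   # Hedged sketch: how a global schedule multiplier could rescale
   # frame-based schedules. Names and schedule format are illustrative.

   def scale_schedule(schedule, global_schedule_speed):
       """Compress (speed < 1) or stretch (speed > 1) the frame axis."""
       return [(int(frame * global_schedule_speed), value) for frame, value in schedule]

   def value_at(schedule, frame):
       """Piecewise-linear interpolation of a (frame, value) schedule."""
       for (f0, v0), (f1, v1) in zip(schedule, schedule[1:]):
           if f0 <= frame <= f1:
               t = (frame - f0) / (f1 - f0)
               return v0 + t * (v1 - v0)
       return schedule[-1][1]

   epsilon_schedule = [(0, 1.0), (5_000_000, 0.03)]
   faster = scale_schedule(epsilon_schedule, 0.8)  # epsilon now reaches 0.03 at 4M frames
   print(faster)                                   # [(0, 1.0), (4000000, 0.03)]
   print(value_at(faster, 2_000_000))              # halfway through the decay

With ``0.8`` every breakpoint arrives 20% earlier, which is why the same setting uniformly accelerates learning rate decay, epsilon decay, and buffer growth.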
Monitoring
----------

**Q: What metrics should I watch in TensorBoard?**

A: Metrics are organized into groups for easier navigation. Key metrics to monitor:

**Training/** group:

- **Training/loss**: Training loss (can increase early - normal for RL)
- **Training/loss_test**: Test loss on a held-out buffer
- **Training/learning_rate**: Current learning rate (decays over time)

**RL/** group:

- **RL/avg_Q**: Expected reward (should increase after an initial drop)
- **RL/single_zone_reached**: How far the agent drives (% of track completed)
- **RL/gamma**: Discount factor (typically 0.999 → 1.0)
- **RL/epsilon**: Exploration rate (decays from 1.0 to ~0.03)

**Race/** group:

- **Race/eval_race_time_robust**: Best evaluation times (the most important performance metric)
- **Race/explo_race_time_finished**: Exploration run times (more variable, includes exploration)

**Gradients/** group:

- **Gradients/norm_median**: Median gradient norm after clipping (should be stable)
- **Gradients/norm_before_clip_max**: Maximum gradient norm BEFORE clipping (watch for explosions >100)
- **Gradients/by_layer/**: Per-layer gradient norms (useful for debugging)

**Performance/** group:

- **Performance/transitions_learned_per_second**: Training throughput
- **Performance/learner_percentage_training**: % of time spent training (should be high)

**Buffer/** group:

- **Buffer/size**: Current replay buffer size
- **Buffer/priorities_median**: Median priority (if using prioritized replay)

**IQN/** group (IQN-specific):

- **IQN/quantile_std_action_X**: Standard deviation of quantile predictions per action (measures uncertainty)

**Q: Why is my loss increasing?**

A: In RL, an increasing loss early in training is normal and expected! It means the agent is:

- Discovering the environment
- Identifying inconsistencies in its value estimates
- Learning correctly

The loss should stabilize or decrease after ~1-2M frames.
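To make the before/after clipping distinction in the ``Gradients/`` metrics concrete, here is a minimal plain-Python sketch of global L2 gradient-norm clipping. It is purely illustrative (a real PyTorch training loop would use ``torch.nn.utils.clip_grad_norm_``):

.. code-block:: python

   import math

   # Sketch of global L2 gradient-norm clipping, the mechanism behind
   # Gradients/norm_before_clip_* (pre-clip) vs Gradients/norm_* (post-clip).

   def clip_by_global_norm(grads, max_norm):
       """Return (clipped_grads, norm_before_clip)."""
       norm = math.sqrt(sum(g * g for g in grads))
       if norm > max_norm:
           scale = max_norm / norm
           grads = [g * scale for g in grads]
       return grads, norm

   grads, before = clip_by_global_norm([30.0, 40.0], max_norm=25.0)
   print(before)                                # 50.0 -> Gradients/norm_before_clip_max
   print(math.sqrt(sum(g * g for g in grads)))  # 25.0 -> post-clip norm

A spike in the pre-clip norm with a flat post-clip norm means clipping is silently absorbing gradient explosions, which is exactly why the FAQ recommends watching ``Gradients/norm_before_clip_max``.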
**Q: My agent finishes the track but times aren't improving**

A: This is the "optimization phase" (after ~3-5M frames):

- Progress is slower now
- The agent is refining strategy details
- Continue training for 10-20M more frames
- Consider enabling shaped rewards for faster progress

**Q: How are TensorBoard metrics organized?**

A: All metrics are grouped into categories using prefixes, which makes navigation easier:

**Training/** - Training process metrics:

- ``Training/loss`` - Training loss (can increase early - normal in RL)
- ``Training/loss_test`` - Test loss on a held-out buffer
- ``Training/learning_rate`` - Current learning rate (decays according to schedule)
- ``Training/weight_decay`` - L2 regularization strength
- ``Training/batch_size`` - Batch size used for training
- ``Training/n_steps`` - N-step return horizon
- ``Training/train_on_batch_duration`` - Time per training batch

**Gradients/** - Gradient monitoring (critical for stability):

- ``Gradients/norm_median``, ``Gradients/norm_d9``, ``Gradients/norm_max`` - Gradient norms AFTER clipping (should be stable, typically <30)
- ``Gradients/norm_before_clip_max`` - **Watch this!** Maximum gradient norm BEFORE clipping. Values >100 indicate gradient explosions; it should typically be <50.
- ``Gradients/by_layer/{layer_name}/L2_*`` - Per-layer gradient L2 norms (useful for pinpointing which layers have issues)
- ``Gradients/by_layer/{layer_name}/Linf_*`` - Per-layer gradient max norms

**RL/** - Reinforcement learning hyperparameters and metrics:

- ``RL/avg_Q`` - Expected future reward (key learning indicator; should increase after an initial drop)
- ``RL/single_zone_reached`` - How far the agent drives (% of track completed)
- ``RL/gamma`` - Discount factor (typically 0.999 → 1.0)
- ``RL/epsilon`` - Epsilon-greedy exploration rate (decays from 1.0 to ~0.03)
- ``RL/epsilon_boltzmann`` - Boltzmann exploration temperature
- ``RL/tau_epsilon_boltzmann`` - Boltzmann tau parameter

**Race/** - Race performance metrics:

- ``Race/eval_race_time_robust`` - **Most important!** Best evaluation times (greedy policy, no exploration)
- ``Race/explo_race_time_finished`` - Exploration run times (includes exploration, more variable)
- ``Race/race_time_ratio_*`` - Race time relative to rollout duration
- ``Race/split_*`` - Split times between checkpoints

**Performance/** - System performance metrics:

- ``Performance/transitions_learned_per_second`` - Training throughput
- ``Performance/learner_percentage_training`` - % of time spent training (should be high, >70%)
- ``Performance/learner_percentage_waiting_for_workers`` - % of time waiting for data (should be low, <20%)
- ``Performance/learner_percentage_testing`` - % of time spent on test batches

**Buffer/** - Replay buffer statistics:

- ``Buffer/size`` - Current buffer size
- ``Buffer/max_size`` - Maximum buffer capacity
- ``Buffer/priorities_*`` - Priority statistics (if using prioritized replay)

**Network/** - Neural network weights and optimizer state:

- ``Network/weights/{layer_name}/L2`` - L2 norm of layer weights
- ``Network/optimizer/{layer_name}/adaptive_lr_L2`` - Per-parameter adaptive learning rates (Adam/RAdam)
- ``Network/optimizer/{layer_name}/exp_avg_L2`` - First moment estimate (Adam/RAdam)
- ``Network/optimizer/{layer_name}/exp_avg_sq_L2`` - Second moment estimate (Adam/RAdam)

**IQN/** - IQN-specific metrics (Implicit Quantile Network):

- ``IQN/quantile_std_action_{i}`` - Standard deviation of quantile predictions per action. Higher values indicate more uncertainty in Q-value estimates; useful for understanding model confidence.

**Tips for using TensorBoard:**

- Use the "SCALARS" tab to filter by group prefix (e.g., type "Gradients/" to see all gradient metrics)
- The "Custom Scalars" tab has pre-configured layouts for key metrics
- Watch ``Gradients/norm_before_clip_max`` closely - sudden spikes indicate gradient explosions
- ``RL/avg_Q`` should generally trend upward after the initial exploration phase
- ``Race/eval_race_time_robust`` is your primary performance metric - lower is better

Technical Issues
----------------

**Q: FileNotFoundError: Python_Link.as**

A: Copy the plugin:

.. code-block:: powershell

   # Windows (PowerShell)
   New-Item -ItemType Directory -Force -Path "$env:USERPROFILE\Documents\TMInterface\Plugins"
   Copy-Item "trackmania_rl\tmi_interaction\Python_Link.as" "$env:USERPROFILE\Documents\TMInterface\Plugins\"

**Q: Game stuck on login screen**

A: The TMNF account must be an **online account**. Create one through the game launcher without TMInterface.
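If you are unsure whether the plugin actually landed in the right place, a quick check from Python can confirm it. The helper name is hypothetical (not part of the project); the path layout follows the copy commands in the FAQ answer:

.. code-block:: python

   from pathlib import Path

   # Sanity check that Python_Link.as is where TMInterface expects it.
   # `plugin_installed` is an illustrative helper, not project code.

   def plugin_installed(home=None):
       home = home or Path.home()
       plugin = home / "Documents" / "TMInterface" / "Plugins" / "Python_Link.as"
       return plugin.is_file()

   if not plugin_installed():
       print("Python_Link.as missing - copy it into Documents/TMInterface/Plugins")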
**Q: CUDA out of memory**

A: Reduce memory usage:

- Decrease ``batch_size`` in the ``training`` section
- Decrease ``memory_size_schedule`` in the ``memory`` section
- Reduce ``gpu_collectors_count`` in the ``performance`` section
- Lower the image resolution (``w_downsized``, ``h_downsized``) in the ``neural_network`` section

**Q: Game crashes / TMInterface connection lost**

A: Common fixes:

- Increase ``timeout_during_run_ms`` in the ``environment`` section
- Reduce ``running_speed`` (< 200x)
- Check that the TMLoader profile is correctly configured
- Verify that no firewall is blocking TMInterface
- Restart game instances (happens automatically every 12 hours)

Contributing
------------

**Q: How can I contribute to this project?**

A: Contributions are welcome:

- Report issues and bugs
- Share your training results
- Improve documentation
- Add new features or algorithms
- Optimize performance

See ``DEVELOPMENT.md`` for development setup.

**Q: Can I share my trained models?**

A: Yes! Model weights are in ``save/{run_name}/weights*.torch``. Share them with the community to help others.

⚠️ **Important**: All AI runs are Tool Assisted and must NOT be submitted to official leaderboards.

Additional Resources
--------------------

- **Documentation**: `online docs `_
- **Configuration Guide**: :doc:`configuration_guide`
- **Original Linesight**: https://github.com/pb4git/linesight
- **TMInterface**: https://donadigo.com/tminterface/
- **TMNF Exchange**: https://tmnf.exchange/