TensorBoard Metrics Reference
=============================

This document provides a comprehensive guide to all metrics logged to TensorBoard during training. Metrics are organized into groups for easier navigation.

Overview
--------

All metrics are logged with prefixes that group them into categories:

- ``Training/`` - Training process metrics
- ``RL/`` - Reinforcement learning metrics
- ``Race/`` - Race performance metrics
- ``Gradients/`` - Gradient monitoring
- ``Performance/`` - System performance metrics
- ``Buffer/`` - Replay buffer statistics
- ``Network/`` - Neural network weights and optimizer state
- ``IQN/`` - IQN-specific metrics

Training Metrics
----------------

**Training/loss**

**Description**: Training loss computed on batches from the replay buffer.

**Interpretation**:

- In reinforcement learning, loss increasing early in training is **normal and expected**
- This indicates the agent is discovering the environment and identifying inconsistencies in its value estimates
- Loss should stabilize or decrease after ~1-2M frames
- Values typically range from 0.01 to 10.0

**What to watch for**:

- Sudden spikes (>100) may indicate gradient explosions
- Consistently increasing loss after 5M+ frames may indicate learning issues

**Training/loss_test**

**Description**: Test loss computed on a held-out test buffer (not used for training).

**Interpretation**:

- Should track training loss but be slightly higher
- A large gap between training and test loss indicates overfitting
- Useful for detecting when the model is memorizing rather than generalizing

**What to watch for**:

- Test loss much higher than training loss (>2x) suggests overfitting
- Test loss decreasing alongside training loss suggests good generalization

**Training/learning_rate**

**Description**: Current learning rate used by the optimizer.
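Learning-rate schedules of this kind are often stored as a list of (frame, value) knots and linearly interpolated as training progresses. A minimal sketch under that assumption (``interp_schedule`` and the example knot values are hypothetical, not the trainer's actual config format):

```python
def interp_schedule(schedule, frame):
    """Piecewise-linear interpolation of sorted (frame, value) knots."""
    if frame <= schedule[0][0]:
        return schedule[0][1]
    for (f0, v0), (f1, v1) in zip(schedule, schedule[1:]):
        if frame <= f1:
            t = (frame - f0) / (f1 - f0)
            return v0 + t * (v1 - v0)
    return schedule[-1][1]  # past the last knot: hold final value

# Illustrative knots: decay from 1e-3 to 1e-5 over 10M frames.
lr_schedule = [(0, 1e-3), (2_000_000, 5e-4), (10_000_000, 1e-5)]
lr = interp_schedule(lr_schedule, 1_000_000)  # halfway to the first decay knot
```

The same interpolation idea applies to the other scheduled quantities in this document (weight decay, gamma, epsilon, buffer size).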
**Interpretation**:

- Decays according to the learning rate schedule
- Typical range: 1e-5 to 1e-3
- Lower learning rates in later training allow fine-tuning

**What to watch for**:

- Should decrease smoothly over time
- Abrupt changes indicate schedule issues

**Training/weight_decay**

**Description**: L2 regularization strength (weight decay coefficient).

**Interpretation**:

- Prevents overfitting by penalizing large weights
- Typically proportional to learning rate
- Range: 1e-7 to 1e-5

**What to watch for**:

- Should track learning rate if using proportional weight decay
- Too high values can prevent learning

**Training/batch_size**

**Description**: Number of transitions sampled per training batch.

**Interpretation**:

- Larger batches provide more stable gradients but slower updates
- Typical values: 32, 64, 128, 256

**What to watch for**:

- Should remain constant unless explicitly changed in config

**Training/n_steps**

**Description**: N-step return horizon for bootstrapping.

**Interpretation**:

- Number of steps used in n-step returns
- Higher values reduce bias but increase variance
- Typical range: 1-5

**What to watch for**:

- Should remain constant unless explicitly changed in config

**Training/discard_non_greedy_actions_in_nsteps**

**Description**: Whether non-greedy (exploratory) actions are excluded from n-step returns.

**Interpretation**:

- 1.0 = True (only greedy actions in n-step backup)
- 0.0 = False (all actions included)
- Recommended: True to reduce exploration bias

**What to watch for**:

- Should remain constant unless explicitly changed in config

**Training/train_on_batch_duration**

**Description**: Median time (in seconds) to process one training batch.
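The duration statistic can be reproduced by timing each optimizer step and taking the median. A minimal sketch (``timed_train_steps`` is a hypothetical helper; the real logger aggregates over many more batches):

```python
import statistics
import time

def timed_train_steps(train_step, n_batches):
    """Time each training step and return the median duration in seconds."""
    durations = []
    for _ in range(n_batches):
        start = time.perf_counter()
        train_step()  # one optimizer step on one batch
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)

# Stand-in workload; in real training this would be the learner's
# train-on-batch call.
median_s = timed_train_steps(lambda: sum(range(10_000)), n_batches=5)
```

The median is preferred over the mean here because occasional stalls (GC pauses, GPU sync) would otherwise dominate the statistic.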
**Interpretation**:

- Lower is better (faster training)
- Typical range: 0.01-0.1 seconds
- Affected by GPU speed, batch size, and network complexity

**What to watch for**:

- Sudden increases may indicate GPU throttling or system issues
- Should be relatively stable

RL Metrics
----------

**RL/avg_Q**

**Description**: Average Q-value (expected future reward) predicted by the network.

**Interpretation**:

- **Key indicator of learning progress**
- Starts near zero for untrained agent
- Initially decreases as agent discovers it plays poorly
- Should increase as agent learns better strategies
- Higher values indicate agent expects more reward

**What to watch for**:

- Should trend upward after initial exploration phase (~500K-1M frames)
- Plateaus indicate agent has learned current strategy
- Decreasing values may indicate learning instability

**RL/single_zone_reached**

**Description**: Furthest virtual checkpoint (zone) reached during a race, as percentage of track.

**Interpretation**:

- 0.0 = agent didn't start
- 1.0 = agent finished the track
- Shows how far agent progresses along the track

**What to watch for**:

- Should increase over time
- Takes ~300K steps to learn to press forward
- Takes ~500K steps to finish map for first time
- Takes ~1M steps to regularly finish map
- Plateaus indicate agent is stuck at certain sections

**RL/gamma**

**Description**: Discount factor for future rewards.

**Interpretation**:

- Controls how much future rewards are valued
- Range: 0.0 (only immediate reward) to 1.0 (all future rewards equally)
- Typically increases from 0.999 to 1.0 during training
- Higher values make agent plan further ahead

**What to watch for**:

- Should increase according to schedule
- Too low values make agent short-sighted
- Too high values (1.0) can cause instability

**RL/epsilon**

**Description**: Epsilon-greedy exploration rate.
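Epsilon-greedy action selection itself is simple to sketch (the ``epsilon_greedy`` helper below is illustrative, not the project's actual implementation):

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a uniformly random action,
    otherwise pick the action with the highest Q-value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)

# epsilon=0.0 is the greedy policy used for evaluation runs.
action = epsilon_greedy([0.1, 0.7, 0.3], epsilon=0.0)
```

Evaluation metrics in this document correspond to ``epsilon = 0`` (pure exploitation), while exploration metrics are collected with the scheduled, nonzero epsilon.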
**Interpretation**:

- Probability of taking random action instead of greedy action
- Decays from 1.0 (fully random) to ~0.03 (mostly greedy)
- Higher values = more exploration
- Lower values = more exploitation

**What to watch for**:

- Should decay smoothly according to schedule
- Too fast decay = insufficient exploration
- Too slow decay = agent doesn't exploit learned strategies

**RL/epsilon_boltzmann**

**Description**: Boltzmann exploration temperature parameter.

**Interpretation**:

- Controls softmax temperature for action selection
- Higher values = more uniform action distribution (more exploration)
- Lower values = more peaked distribution (more exploitation)
- Used in combination with epsilon-greedy

**What to watch for**:

- Should decay according to schedule
- Works together with epsilon for exploration strategy

**RL/tau_epsilon_boltzmann**

**Description**: Tau parameter for Boltzmann exploration.

**Interpretation**:

- Additional temperature parameter for IQN quantile sampling
- Affects exploration in distributional RL setting
- Typically constant value

**What to watch for**:

- Should remain constant unless explicitly changed

**RL/mean_action_gap**

**Description**: Average difference between best Q-value and other Q-values per state.

**Interpretation**:

- Measures how confident the agent is in its action selection
- Higher values = agent has clear preference for one action
- Lower values = agent is uncertain between actions
- Negative values are possible (computed as negative gap)

**What to watch for**:

- Should increase as agent learns (becomes more confident)
- Very low values indicate high uncertainty

**RL/q_value_{i}_starting_frame**

**Description**: Q-value for action {i} at the starting frame of a race.
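One common way to compute an action gap like ``RL/mean_action_gap`` is the best Q-value minus the mean of the remaining Q-values, averaged over a batch of states. A sketch under that assumption (the trainer's exact formula and sign convention may differ):

```python
def mean_action_gap(q_batch):
    """Average (best Q minus mean of the other Q-values) over a batch.
    q_batch[s][a] = Q-value of action a in state s."""
    gaps = []
    for qs in q_batch:
        ordered = sorted(qs, reverse=True)
        others = ordered[1:]
        gaps.append(ordered[0] - sum(others) / len(others))
    return sum(gaps) / len(gaps)

# Two example states with three actions each.
gap = mean_action_gap([[1.0, 0.2, 0.5], [0.0, -0.1, -0.3]])
```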
**Interpretation**:

- Shows agent's expected reward for each action at race start
- Useful for understanding initial action preferences
- Typically logged for action 0 (forward)

**What to watch for**:

- Should increase as agent learns
- Can reveal if agent has learned good starting strategy

Race Metrics
------------

**Race/eval_race_time_robust**

**Description**: **Most important performance metric!** Best evaluation race times (greedy policy, no exploration).

**Interpretation**:

- Time in seconds for evaluation runs that finished within 2% of the rolling mean
- Only includes "robust" runs (consistent performance)
- Lower is better
- This is the primary metric to track for agent performance

**What to watch for**:

- Should decrease over time (agent getting faster)
- Plateaus indicate agent has learned current strategy
- Compare with reference times (author/gold) if available
- Most reliable indicator of actual performance

**Race/eval_race_time_{status}_{map}**

**Description**: Evaluation race time for specific map and status.

**Interpretation**:

- Time in seconds for evaluation runs
- Includes all evaluation runs (not just robust ones)
- More variable than robust times
- Status indicates run quality (e.g., "finished", "dnf")

**What to watch for**:

- More noisy than robust times
- Useful for tracking completion rates

**Race/explo_race_time_finished**

**Description**: Exploration race times for runs that finished.

**Interpretation**:

- Time in seconds for exploration runs that completed the track
- Includes exploration, so more variable than evaluation times
- Higher than evaluation times (exploration slows agent down)

**What to watch for**:

- Should trend downward but be more noisy
- Useful for tracking exploration progress
- Large gap with eval times indicates exploration is working

**Race/explo_race_time_{status}_{map}**

**Description**: Exploration race time for specific map and status.
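The "robust" filter behind ``Race/eval_race_time_robust`` can be sketched as keeping only finished runs whose time lies within 2% of a rolling mean of recent runs. The window size and the exact comparison rule below are assumptions for illustration, not the trainer's actual code:

```python
def robust_times(times, tolerance=0.02, window=10):
    """Keep race times within `tolerance` of the rolling mean
    of up to the previous `window` runs."""
    robust = []
    for i, t in enumerate(times):
        recent = times[max(0, i - window):i] or [t]  # fall back to t itself
        mean = sum(recent) / len(recent)
        if abs(t - mean) <= tolerance * mean:
            robust.append(t)
    return robust

# The outlier 150.0 and the post-outlier 101.0 are filtered out.
kept = robust_times([100.0, 100.0, 100.0, 150.0, 101.0])
```

Filtering like this is why the robust metric is much less noisy than the raw per-map evaluation times.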
**Interpretation**:

- Time in seconds for exploration runs
- Includes all exploration runs
- More variable due to exploration

**What to watch for**:

- More noisy than finished times
- Useful for understanding exploration behavior

**Race/eval_race_finished_{status}_{map}**

**Description**: Whether evaluation race finished (1.0) or not (0.0).

**Interpretation**:

- Binary metric: 1.0 = finished, 0.0 = did not finish
- Shows completion rate for evaluation runs
- Should approach 1.0 as agent learns

**What to watch for**:

- Should increase to 1.0 as training progresses
- Persistent 0.0 values indicate agent is stuck

**Race/explo_race_finished_{status}_{map}**

**Description**: Whether exploration race finished (1.0) or not (0.0).

**Interpretation**:

- Binary metric: 1.0 = finished, 0.0 = did not finish
- Shows completion rate for exploration runs
- May be lower than eval completion rate

**What to watch for**:

- Should increase over time
- Lower than eval rate is normal (exploration can cause crashes)

**Race/race_time_ratio_{map}**

**Description**: Ratio of race time to total rollout duration.

**Interpretation**:

- Shows efficiency: how much of rollout time was spent racing
- Values < 1.0 indicate time spent on loading, setup, etc.
- Higher values = more efficient data collection

**What to watch for**:

- Should be relatively stable
- Very low values indicate system overhead issues

**Race/split_{map}_{i}**

**Description**: Time (in seconds) between checkpoint i and checkpoint i+1.

**Interpretation**:

- Shows performance on specific track segments
- Useful for identifying which parts of track are slow
- Only logged for evaluation runs

**What to watch for**:

- Should decrease over time for all splits
- Large differences between splits indicate difficult sections
- Useful for track-specific analysis

**Race/eval_ratio_{status}_{reference}_{map}**

**Description**: Race time as percentage of reference time (author or gold).
**Interpretation**:

- 100% = matched reference time
- <100% = faster than reference (rare, indicates very good performance)
- >100% = slower than reference
- Useful for comparing to human performance

**What to watch for**:

- Should decrease over time (approaching 100% or below)
- Only available if reference times are configured

**Race/eval_agg_ratio_{status}_{reference}**

**Description**: Aggregated ratio across all maps.

**Interpretation**:

- Average ratio across all maps with reference times
- Useful for multi-map training

**What to watch for**:

- Should decrease over time
- Only available if reference times are configured

Gradient Metrics
----------------

**Gradients/norm_median**

**Description**: Median gradient norm after clipping.

**Interpretation**:

- Should be stable (typically <30)
- Shows typical gradient magnitude
- Stable values indicate healthy training

**What to watch for**:

- Should remain relatively constant
- Sudden changes may indicate learning issues

**Gradients/norm_q1, norm_q3**

**Description**: 25th and 75th percentile gradient norms after clipping.

**Interpretation**:

- Shows distribution of gradient magnitudes
- Q1-Q3 range shows typical gradient spread
- Useful for understanding gradient stability

**What to watch for**:

- Should be relatively stable
- Large spread may indicate unstable gradients

**Gradients/norm_d9, norm_d98**

**Description**: 90th and 98th percentile gradient norms after clipping.

**Interpretation**:

- Shows tail of gradient distribution
- Higher percentiles reveal occasional large gradients
- Useful for detecting outliers

**What to watch for**:

- Should be stable
- Large values may indicate occasional gradient spikes

**Gradients/norm_max**

**Description**: Maximum gradient norm after clipping.
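Global-norm gradient clipping, which produces the before/after-clip norms in this section, rescales all gradients whenever their combined L2 norm exceeds a threshold. A minimal pure-Python sketch (real training code would use the framework's utility, e.g. PyTorch's ``torch.nn.utils.clip_grad_norm_``, which likewise returns the pre-clip norm):

```python
import math

def clip_gradients(grads, max_norm):
    """Global-norm clipping over a list of per-layer gradient lists.
    Returns (norm_before_clip, clipped_grads)."""
    norm = math.sqrt(sum(g * g for layer in grads for g in layer))
    if norm > max_norm:
        scale = max_norm / norm
        grads = [[g * scale for g in layer] for layer in grads]
    return norm, grads

# A single layer with gradient (3, 4): global norm 5, clipped down to 1.
before, clipped = clip_gradients([[3.0, 4.0]], max_norm=1.0)
```

``Gradients/norm_before_clip_*`` corresponds to ``norm`` above, and the ``Gradients/norm_*`` family to the norm of the rescaled gradients.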
**Interpretation**:

- Maximum gradient magnitude encountered
- After clipping, should be bounded by clip value
- Typical range: 10-50

**What to watch for**:

- Should be relatively stable
- Consistently hitting clip value may indicate need for higher clip threshold

**Gradients/norm_before_clip_median**

**Description**: Median gradient norm BEFORE clipping.

**Interpretation**:

- Shows typical gradient magnitude before clipping
- Should be similar to after-clip median if clipping is not active
- Useful for understanding if clipping is necessary

**What to watch for**:

- Should be stable
- Much higher than after-clip indicates clipping is active

**Gradients/norm_before_clip_max**

**Description**: **CRITICAL METRIC!** Maximum gradient norm BEFORE clipping.

**Interpretation**:

- **Watch this closely!** Values >100 indicate gradient explosions
- Should typically be <50
- Sudden spikes indicate training instability
- Used to detect gradient explosion before clipping fixes it

**What to watch for**:

- **Most important gradient metric**
- Values >100 = gradient explosion (bad!)
- Values >200 = severe gradient explosion
- Sudden spikes require investigation
- Should be relatively stable

**Gradients/norm_before_clip_q1, q3, d9, d98**

**Description**: Percentile gradient norms before clipping.

**Interpretation**:

- Shows distribution of unclipped gradients
- Useful for understanding gradient behavior before clipping
- Similar interpretation to after-clip percentiles

**What to watch for**:

- Should be stable
- Large values indicate need for gradient clipping

**Gradients/by_layer/{layer_name}/L2_median, q3, d9, max**

**Description**: Per-layer L2 gradient norms (Euclidean norm).
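The per-layer norms can be computed directly from each layer's gradient values. A sketch (``layer_grad_norms`` and the layer names are hypothetical; in practice the gradients come from the network's parameters):

```python
import math

def layer_grad_norms(named_grads):
    """Per-layer L2 norm (sqrt of sum of squares) and
    Linf norm (largest absolute component)."""
    stats = {}
    for name, grads in named_grads.items():
        stats[name] = {
            "L2": math.sqrt(sum(g * g for g in grads)),
            "Linf": max(abs(g) for g in grads),
        }
    return stats

stats = layer_grad_norms({"conv1": [0.3, -0.4], "head": [1.0, 0.0]})
```

The percentile variants (median, q3, d9, max) are then taken over these per-layer values across many training batches.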
**Interpretation**:

- Shows gradient magnitude for each network layer
- Useful for debugging which layers have gradient issues
- L2 norm = sqrt(sum of squared gradients)

**What to watch for**:

- Some layers may have naturally larger gradients
- Sudden spikes in specific layers indicate layer-specific issues
- Useful for identifying problematic layers

**Gradients/by_layer/{layer_name}/Linf_median, q3, d9, max**

**Description**: Per-layer Linf gradient norms (maximum absolute value).

**Interpretation**:

- Shows maximum gradient component for each layer
- Useful for detecting individual parameter issues
- Linf norm = max absolute gradient value

**What to watch for**:

- Can reveal issues in specific parameters
- Large Linf with small L2 indicates sparse large gradients

Performance Metrics
-------------------

**Performance/transitions_learned_per_second**

**Description**: Training throughput - number of transitions processed per second.

**Interpretation**:

- Higher is better (faster training)
- Typical range: 100-1000 transitions/second
- Affected by GPU speed, batch size, and system performance

**What to watch for**:

- Should be relatively stable
- Sudden decreases may indicate system issues
- Higher values = faster training progress

**Performance/learner_percentage_training**

**Description**: Percentage of time learner process spends on training (vs waiting).

**Interpretation**:

- Should be high (>70%) for efficient training
- Low values indicate learner is waiting for data
- High values indicate good data collection rate

**What to watch for**:

- Should be >70% for efficient training
- <50% indicates workers are too slow
- 100% indicates perfect balance (rare)

**Performance/learner_percentage_waiting_for_workers**

**Description**: Percentage of time learner process waits for worker data.
**Interpretation**:

- Should be low (<20%) for efficient training
- High values indicate workers are too slow
- Indicates data collection bottleneck

**What to watch for**:

- Should be <20% for efficient training
- >50% indicates severe data collection bottleneck
- May need more worker instances or faster workers

**Performance/learner_percentage_testing**

**Description**: Percentage of time spent on test batches.

**Interpretation**:

- Typically small (<10%)
- Time spent evaluating on test buffer
- Useful for monitoring but not critical

**What to watch for**:

- Should be relatively small
- Large values may indicate too much testing

**Performance/instrumentation__answer_normal_step**

**Description**: Time spent in normal step processing (microseconds).

**Interpretation**:

- Low-level performance metric
- Shows TMInterface communication overhead
- Useful for debugging performance issues

**What to watch for**:

- Should be relatively stable
- Sudden increases may indicate system issues

**Performance/instrumentation__answer_action_step**

**Description**: Time spent in action step processing (microseconds).

**Interpretation**:

- Low-level performance metric
- Shows action processing time
- Useful for debugging performance issues

**What to watch for**:

- Should be relatively stable
- Affects overall training speed

**Performance/instrumentation__between_run_steps**

**Description**: Time spent between runs (microseconds).

**Interpretation**:

- Low-level performance metric
- Shows overhead between race restarts
- Includes map loading, reset, etc.

**What to watch for**:

- Should be relatively stable
- Large values indicate slow map loading

**Performance/instrumentation__grab_frame**

**Description**: Time spent grabbing frame from game (microseconds).
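Instrumentation metrics like these are typically collected with a small timing wrapper around each step. A sketch using a context manager (``instrument`` and the ``timings`` store are hypothetical helpers, not the project's API):

```python
import time
from contextlib import contextmanager

timings = {}  # instrumentation point name -> list of elapsed microseconds

@contextmanager
def instrument(name):
    """Record elapsed wall-clock microseconds for the wrapped block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_us = (time.perf_counter() - start) * 1e6
        timings.setdefault(name, []).append(elapsed_us)

# Stand-in for the actual frame-grabbing call.
with instrument("grab_frame"):
    time.sleep(0.001)
```

Each ``Performance/instrumentation__*`` scalar would then be an aggregate (e.g. the mean or median) of one such list per logging interval.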
**Interpretation**:

- Low-level performance metric
- Shows frame capture overhead
- Affected by game rendering speed

**What to watch for**:

- Should be relatively stable
- Large values may indicate rendering issues

**Performance/instrumentation__convert_frame**

**Description**: Time spent converting frame format (microseconds).

**Interpretation**:

- Low-level performance metric
- Shows image processing overhead
- Affected by image resolution and format

**What to watch for**:

- Should be relatively stable
- Can be optimized by reducing resolution

**Performance/instrumentation__grab_floats**

**Description**: Time spent grabbing float data from game (microseconds).

**Interpretation**:

- Low-level performance metric
- Shows data extraction overhead
- Includes speed, position, etc.

**What to watch for**:

- Should be relatively stable
- Typically very fast

**Performance/instrumentation__exploration_policy**

**Description**: Time spent in exploration policy computation (microseconds).

**Interpretation**:

- Low-level performance metric
- Shows action selection overhead
- Includes Q-value computation and exploration

**What to watch for**:

- Should be relatively stable
- Affected by network inference speed

**Performance/instrumentation__request_inputs_and_speed**

**Description**: Time spent requesting inputs and speed from game (microseconds).

**Interpretation**:

- Low-level performance metric
- Shows game communication overhead
- Includes TMInterface API calls

**What to watch for**:

- Should be relatively stable
- Large values may indicate communication issues

**Performance/tmi_protection_cutoff**

**Description**: Number of times TMI protection cutoff was triggered.
**Interpretation**:

- Safety mechanism to prevent infinite loops
- High values indicate agent is getting stuck frequently
- Should be low for well-trained agent

**What to watch for**:

- Should decrease as agent learns
- High values indicate learning issues
- May need to adjust timeout settings

**Performance/worker_time_in_rollout_percentage**

**Description**: Percentage of rollout time spent in worker processing.

**Interpretation**:

- Shows worker efficiency
- Higher values = workers are busy (good)
- Lower values = workers are waiting (bad)

**What to watch for**:

- Should be relatively high (>80%)
- Low values indicate worker bottlenecks

Buffer Metrics
--------------

**Buffer/size**

**Description**: Current number of transitions in replay buffer.

**Interpretation**:

- Grows from 0 to max_size during training
- More transitions = more diverse training data
- Typical range: 20K to 200K

**What to watch for**:

- Should increase until reaching max_size
- Should remain at max_size once full
- Sudden decreases may indicate buffer issues

**Buffer/max_size**

**Description**: Maximum capacity of replay buffer.

**Interpretation**:

- Set by memory_size_schedule
- Larger buffers = more memory but better diversity
- Typical range: 50K to 200K

**What to watch for**:

- Should remain constant unless schedule changes
- Changes according to memory_size_schedule

**Buffer/number_times_single_memory_is_used_before_discard**

**Description**: How many times each transition is used before being discarded.

**Interpretation**:

- Controls transition reuse
- Higher values = transitions used more times
- Balances data efficiency with freshness

**What to watch for**:

- Should remain constant unless explicitly changed
- Typical values: 1-4

**Buffer/priorities_min, q1, mean, median, q3, d9, c98, max**

**Description**: Priority statistics for prioritized experience replay.
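In standard prioritized experience replay, a transition's sampling probability is proportional to ``(|TD error| + eps) ** prio_alpha``, so ``prio_alpha = 0`` degenerates to uniform sampling. A sketch under that standard formulation (the trainer's exact priority definition may differ):

```python
def sampling_probabilities(td_errors, prio_alpha, eps=1e-6):
    """P(i) proportional to (|td_error_i| + eps) ** prio_alpha."""
    prios = [(abs(e) + eps) ** prio_alpha for e in td_errors]
    total = sum(prios)
    return [p / total for p in prios]

# alpha = 0 -> uniform; alpha = 1 -> fully proportional to TD error.
probs_uniform = sampling_probabilities([0.5, 0.5, 2.0], prio_alpha=0.0)
probs_prio = sampling_probabilities([0.5, 0.5, 2.0], prio_alpha=1.0)
```

The ``Buffer/priorities_*`` percentiles summarize the distribution of these priorities over the whole buffer.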
**Interpretation**:

- Only available if using prioritized replay (prio_alpha > 0)
- Higher priorities = more important transitions
- Priorities based on TD error
- Shows distribution of transition importance

**What to watch for**:

- Large spread indicates some transitions are much more important
- Should be relatively stable
- Not available if using uniform sampling (prio_alpha = 0)

Network Metrics
---------------

**Network/weights/{layer_name}/L2**

**Description**: L2 norm (Euclidean norm) of layer weights.

**Interpretation**:

- Shows magnitude of weights in each layer
- Useful for detecting weight growth or decay
- Should be relatively stable during training

**What to watch for**:

- Sudden increases may indicate instability
- Gradual growth is normal
- Very large values may indicate numerical issues

**Network/optimizer/{layer_name}/adaptive_lr_L2**

**Description**: L2 norm of per-parameter adaptive learning rates (Adam/RAdam).

**Interpretation**:

- Shows magnitude of adaptive learning rates
- Adam/RAdam adjust learning rate per parameter
- Higher values = larger effective learning rates

**What to watch for**:

- Should be relatively stable
- Useful for understanding optimizer behavior

**Network/optimizer/{layer_name}/exp_avg_L2**

**Description**: L2 norm of first moment estimate (Adam/RAdam).

**Interpretation**:

- First moment (moving average of gradients)
- Used by Adam/RAdam for momentum
- Should track gradient magnitudes

**What to watch for**:

- Should be relatively stable
- Useful for debugging optimizer state

**Network/optimizer/{layer_name}/exp_avg_sq_L2**

**Description**: L2 norm of second moment estimate (Adam/RAdam).
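The two moments logged here follow the standard Adam update: exponential moving averages of the gradient and of its square. A sketch of one update step together with the norms these metrics would report (the helper name and scalar-list layout are illustrative):

```python
import math

def adam_moment_update(exp_avg, exp_avg_sq, grads, beta1=0.9, beta2=0.999):
    """One Adam-style moment update for a flat list of parameters.
    Returns updated moments and their L2 norms."""
    exp_avg = [beta1 * m + (1 - beta1) * g for m, g in zip(exp_avg, grads)]
    exp_avg_sq = [beta2 * v + (1 - beta2) * g * g
                  for v, g in zip(exp_avg_sq, grads)]
    l2 = lambda xs: math.sqrt(sum(x * x for x in xs))
    return exp_avg, exp_avg_sq, l2(exp_avg), l2(exp_avg_sq)

# Fresh optimizer state, one gradient step on a two-parameter layer.
m, v, m_norm, v_norm = adam_moment_update([0.0, 0.0], [0.0, 0.0], [1.0, 0.0])
```

``exp_avg_L2`` and ``exp_avg_sq_L2`` correspond to ``m_norm`` and ``v_norm`` computed per layer from the optimizer's state.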
**Interpretation**:

- Second moment (moving average of squared gradients)
- Used by Adam/RAdam for adaptive learning rates
- Should track gradient variance

**What to watch for**:

- Should be relatively stable
- Useful for debugging optimizer state

IQN Metrics
-----------

**IQN/quantile_std_action_{i}**

**Description**: Standard deviation of quantile predictions for action {i}.

**Interpretation**:

- Measures uncertainty in Q-value estimates for each action
- Higher values = more uncertainty (wider distribution)
- Lower values = more confidence (narrower distribution)
- IQN-specific metric (distributional RL)

**What to watch for**:

- Should decrease as agent learns (becomes more confident)
- High values indicate high uncertainty
- Useful for understanding model confidence
- Different actions may have different uncertainty levels

Other Metrics
-------------

**alltime_min_ms_{map}**

**Description**: All-time best race time (in milliseconds) for each map.

**Interpretation**:

- Best time ever achieved on each map
- Only decreases (new records)
- Most important performance metric alongside eval_race_time_robust

**What to watch for**:

- Should decrease over time (new records)
- Plateaus indicate agent has reached current limit
- Compare with reference times if available

**cumul_number_frames_played**

**Description**: Cumulative number of frames processed during training.

**Interpretation**:

- Total training progress
- Used as x-axis in most TensorBoard plots
- Typical training: 1M to 50M+ frames

**What to watch for**:

- Should increase steadily
- Used to track training progress

**cumul_number_batches_done**

**Description**: Cumulative number of training batches processed.
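The quantile spread behind ``IQN/quantile_std_action_{i}`` above can be sketched as the standard deviation across the sampled quantile estimates for each action (the layout of ``quantile_values`` below is an assumption about how the network's quantile samples are arranged):

```python
import statistics

def quantile_std_per_action(quantile_values):
    """Per-action std-dev of sampled quantile estimates.
    quantile_values[k][a] = quantile sample k's value for action a."""
    n_actions = len(quantile_values[0])
    return [
        statistics.pstdev(sample[a] for sample in quantile_values)
        for a in range(n_actions)
    ]

# Action 0 has a wide return distribution; action 1 is fully certain.
stds = quantile_std_per_action([[1.0, 0.0], [3.0, 0.0]])
```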
**Interpretation**:

- Total number of gradient updates
- Related to frames_played but depends on buffer fill rate
- Higher = more learning steps

**What to watch for**:

- Should increase steadily
- Ratio to frames_played shows learning frequency

**cumul_number_single_memories_used**

**Description**: Cumulative number of transitions used for training.

**Interpretation**:

- Total transitions sampled from buffer
- May be higher than frames_played due to reuse
- Shows total learning experience

**What to watch for**:

- Should increase steadily
- Higher than frames_played indicates transition reuse

**cumul_number_memories_generated**

**Description**: Cumulative number of transitions generated from rollouts.

**Interpretation**:

- Total transitions added to buffer
- Includes n-step transitions
- Shows data collection progress

**What to watch for**:

- Should increase steadily
- Should be less than memories_used (due to reuse)

**cumul_training_hours**

**Description**: Cumulative training time in hours.

**Interpretation**:

- Total wall-clock time spent training
- Useful for estimating training duration
- Includes all overhead (not just GPU time)

**What to watch for**:

- Should increase steadily
- Useful for planning training schedules

**cumul_number_target_network_updates**

**Description**: Cumulative number of target network updates.

**Interpretation**:

- Number of times target network was updated
- Target network updated less frequently than online network
- Used for stable Q-learning

**What to watch for**:

- Should increase steadily
- Frequency depends on update schedule

**times_summary** (Text)

**Description**: Text summary of best times for all maps.

**Interpretation**:

- Human-readable summary of performance
- Shows best times with timestamps
- Updated every 5 minutes

**What to watch for**:

- Useful for quick overview
- Shows new records with ** markers

Tips for Using TensorBoard
--------------------------
1. **Filtering**: Use the search box in TensorBoard to filter metrics by prefix (e.g., type "Gradients/" to see all gradient metrics)

2. **Custom Scalars**: The "Custom Scalars" tab has pre-configured layouts for key metrics grouped together

3. **Smoothing**: Use the smoothing slider to reduce noise in plots (helpful for noisy metrics)

4. **Comparison**: Load multiple runs to compare different training configurations

5. **Key Metrics to Monitor**:

   - ``Race/eval_race_time_robust`` - Primary performance metric
   - ``RL/avg_Q`` - Learning progress indicator
   - ``Gradients/norm_before_clip_max`` - Training stability
   - ``Training/loss`` - Learning quality
   - ``Performance/transitions_learned_per_second`` - Training efficiency

6. **Early Training** (0-3M frames):

   - Watch ``RL/single_zone_reached`` - should increase to 1.0
   - Watch ``RL/avg_Q`` - may decrease then increase
   - Watch ``Training/loss`` - may increase (normal!)

7. **Mid Training** (3-10M frames):

   - Watch ``Race/eval_race_time_robust`` - should decrease
   - Watch ``RL/avg_Q`` - should increase
   - Watch ``Training/loss`` - should stabilize

8. **Late Training** (10M+ frames):

   - Watch ``Race/eval_race_time_robust`` - slow improvements
   - Watch for plateaus - may need longer training or hyperparameter changes
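TensorBoard's smoothing slider applies an exponential moving average to the raw scalar series. A sketch in the same spirit (TensorBoard's actual implementation also debias-corrects the running average; this simplified version does not):

```python
def ema_smooth(values, weight=0.6):
    """Exponential moving average; `weight` plays the role of
    TensorBoard's smoothing slider (0 = no smoothing, ~1 = heavy)."""
    smoothed, last = [], values[0]
    for v in values:
        last = last * weight + (1 - weight) * v
        smoothed.append(last)
    return smoothed

# A noisy series alternating between 10 and 0, smoothed at weight 0.5.
smoothed = ema_smooth([10.0, 0.0, 10.0, 0.0], weight=0.5)
```

Heavier smoothing is most useful for the noisy exploration-time metrics; keep it low when looking for sudden spikes in ``Gradients/norm_before_clip_max``.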