Experiments
This section documents various experiments conducted in the project, their results, and conclusions.
Analysis (time axis): Use scripts/analyze_experiment_by_relative_time.py with two or more runs (e.g. uni_5 uni_7). The default --time-axis is auto: the script uses cumulative training hours (the TensorBoard scalar cumul_training_hours, the same quantity as the console "Training hours") when every run logs it; otherwise it falls back to wall-clock minutes from the earliest TensorBoard wall_time in the merged run, which includes idle gaps between restarts.
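The auto selection rule above can be sketched as follows. This is a minimal illustration, not the script's actual implementation; pick_time_axis and wall_minutes are hypothetical names introduced here:

```python
def pick_time_axis(runs):
    """Mimic --time-axis auto: use cumulative training hours only when
    every run logs the cumul_training_hours scalar; otherwise fall back
    to wall-clock minutes.

    runs: dict mapping run name -> set of scalar tags that run logs.
    """
    if all("cumul_training_hours" in tags for tags in runs.values()):
        return "training_hours"
    return "wall_minutes"


def wall_minutes(wall_times):
    """Fallback axis: minutes elapsed since the earliest event wall_time.

    Note this includes any idle gaps between learner restarts, which is
    why wall minutes must not be reported as training minutes.
    """
    t0 = min(wall_times)
    return [(t - t0) / 60.0 for t in wall_times]
```

For example, a pair of runs where only one logs cumul_training_hours would be compared on the wall-clock axis.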
For any write-up, do not describe wall-span minutes as "training minutes" when the learner was stopped between sessions or the logs are split across several TensorBoard folders. Check for gaps with python scripts/audit_tensorboard_training_timeline.py (optionally pass --runs …). Short single-session uni_* runs usually have wall time close to active training time; long A01 runs with suffix merges often do not.
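The kind of gap check the audit script performs can be approximated like this. This is a hypothetical sketch, not the script's actual logic; find_idle_gaps and the 10-minute default threshold are illustrative assumptions:

```python
def find_idle_gaps(wall_times, min_gap_minutes=10.0):
    """Return (start, end, gap_minutes) for consecutive TensorBoard
    events whose spacing exceeds min_gap_minutes -- likely periods
    where the learner was stopped between sessions.
    """
    ts = sorted(wall_times)
    gaps = []
    for a, b in zip(ts, ts[1:]):
        gap = (b - a) / 60.0
        if gap > min_gap_minutes:
            gaps.append((a, b, gap))
    return gaps
```

Subtracting the total gap time from the wall span gives a rough estimate of active training time for merged multi-session runs.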
The script prints per-race tables (best/mean/std, finish rate, first finish) from Race/eval_race_time_* and Race/explo_race_time_*, then scalar metrics (alltime_min_ms_*, loss, Q, GPU %). The BY STEP tables compare runs at equal environment step counts regardless of wall time. For runs logged before the learner fix, prefer the per-race tables for race-time comparisons.
Comparison plots: Each experiment page embeds JPG graphs (one metric per graph, runs as lines) next to the metric they illustrate in "Detailed TensorBoard Metrics Analysis". Each image has alt text (a caption) describing the metric and runs. The image files (exp_*.jpg in docs/source/_static/) are generated by running python scripts/generate_experiment_plots.py (with TensorBoard logs present, e.g. tensorboard/uni_12) and should be committed so the built docs include the plots. Use the project venv; if activation fails, run .venv\Scripts\python.exe scripts/generate_experiment_plots.py (Windows).
Contents
- Time axis conventions (experiment write-ups)
- Batch Size and Running Speed
- Engineered rewards (speedslide, neoslide)
- Experiment Overview
- Results
- Run Analysis
- Detailed TensorBoard Metrics Analysis
- Configuration Changes
- Hardware
- Conclusions
- Recommendations
- BC full IQN resume with engineered rewards (A01_as20_long_full_iqn_bc_3_resume_engineer_rewards)
- BC full IQN resume: v2 vs v3 (engineered reward coefficients)
- Overall conclusions: engineered rewards and 4 explo / 4 eval
- Extended Training, One vs Two Maps
- Temporal Mini-Race Duration
- Epsilon-Greedy Exploration
- Experiment: Global Schedule Speed (A01 Long v2 Series)
- Experiment: Linesight-style RL on A01 vs A01_as20_long_v2
- Experiment: PPO Smoke Run vs IQN Baseline (A01_as20_long_v2)
- Experiment: Multi-action Offset Training (A01_as20_long v3 series)
- Experiment Note: IQN Modernization Plan
- Network Size and Long Training
- Replay pretrain roadmap
- Visual backbone pretraining
- BC pretraining
- Pretrain
- Experiment: IQN Without Image Head (Float-Only)
- IQN model experiments
- Experiment 1: Double DQN (use_ddqn)
- Experiment 2: iqn_embedding_dimension (128 vs 64)
- Experiment 3: Image dimensions (W_downsized / H_downsized 256 vs 128)
- Experiment 4: Image dimensions 64×64 vs 128×128 (downsized model)
- Experiment 5: Image dimensions 64×64 vs 128×128 (embedding 128 — isolates resolution)
- Analysis tools (all IQN experiments)
- Experiment: BTR (Beyond The Rainbow) on A01