TMNF replay download and frame capture

Download replays and maps from TMNF-X (ManiaExchange), then capture frames via TMInterface for visual pretraining or dataset building.

Pipeline: steps to run (in order)

Run the steps in order, with every command issued from the project root.

Step 1. Download the track list (once). Use 1000 or 10000 tracks per page (--per-page).

# Windows (from project root)
set "PYTHONPATH=scripts" & python -m replays_tmnf.download --list-popular --output maps/track_ids.txt --per-page 1000

# Linux / macOS
PYTHONPATH=scripts python -m replays_tmnf.download --list-popular --output maps/track_ids.txt --per-page 1000

Step 2. Download replays and extract maps (resumable pipeline; Ctrl+C saves progress).

# Windows
set "PYTHONPATH=scripts" & python -m replays_tmnf.download --track-ids maps/track_ids.txt --replays-dir ./maps/replays --tracks-dir ./maps/tracks --extract-tracks-from-replays --workers 64 --api-workers 64

# Linux / macOS
PYTHONPATH=scripts python -m replays_tmnf.download --track-ids maps/track_ids.txt --replays-dir ./maps/replays --tracks-dir ./maps/tracks --extract-tracks-from-replays --workers 256 --api-workers 32

After a restart, step 2 resumes from the position stored in ./maps/replays/.replay_progress.

Step 3. Filter tracks with no respawn (keeps only tracks where replays do not respawn — for stable frame capture).

python scripts/filter_track_ids_no_respawn.py --input maps/track_ids.txt --output maps/track_ids_no_respawn.txt --workers 16 -r maps/replays

Step 3a. (Optional) Filter tracks with non-standard MapType or long preview (removes tracks with custom environments or “Press Enter to start” screens).

# Filter by environment (e.g., remove Stunts, custom MapType)
python scripts/filter_track_ids_custom_maptype.py --input maps/track_ids_no_respawn.txt --output maps/track_ids_standard.txt --tracks-dir maps/tracks --jobs 16

# (Future) Filter by MediaTracker preview duration (not yet implemented)
python scripts/filter_track_ids_custom_maptype.py --input maps/track_ids_no_respawn.txt --output maps/track_ids_standard.txt --tracks-dir maps/tracks --max-preview-duration 15.0 --jobs 16

This removes maps whose environment is not one of Stadium, Speed, Alpine, Rally, Bay, Island, Coast, or Desert and, once implemented, maps with long MediaTracker intros. Use --only-with-maps to skip tracks without Challenge.Gbx files.

Step 3b. Fix replay filenames (replaces non-ASCII characters and spaces with _). TMInterface cannot load script files with special characters in their names.

python scripts/fix_replay_filenames.py --dry-run   # preview changes
python scripts/fix_replay_filenames.py              # apply
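
The renaming rule in step 3b can be illustrated with a minimal sketch. It is not the code of fix_replay_filenames.py: the function names sanitize_name and fix_filenames are hypothetical, and skipping a rename when the target already exists is an assumption made here to avoid overwriting files.

```python
import re
from pathlib import Path
from typing import List, Tuple


def sanitize_name(name: str) -> str:
    """Replace each space and each non-ASCII character with an underscore."""
    return re.sub(r"[^\x00-\x7f]| ", "_", name)


def fix_filenames(root: Path, dry_run: bool = True) -> List[Tuple[Path, Path]]:
    """Return the (old, new) renames under root; apply them unless dry_run."""
    changes = []
    # sorted() materializes the listing first, so renaming is safe afterwards.
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        target = path.with_name(sanitize_name(path.name))
        if target == path or target.exists():
            continue  # nothing to do, or renaming would overwrite a file
        changes.append((path, target))
        if not dry_run:
            path.rename(target)
    return changes
```

Calling fix_filenames(Path("maps/replays"), dry_run=True) only reports the planned renames, which mirrors the --dry-run preview above.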

Step 4. Capture frames from replays (TMInterface; game must be running). Use the filtered track list.

python scripts/capture_replays_tmnf.py --replays-dir maps/replays --output-dir maps/img --workers 1 --width 256 --height 256 --running-speed 16 --fps 100 --track-ids maps/track_ids_no_respawn.txt --max-replays-per-track 1 --vcp-dir maps/vcp

Multi-worker capture: Running with --workers N (N > 1) is not yet working reliably (multiple game windows, key input and preview handling are not coordinated). Use --workers 1 for now.

Map preview handling: Previews and “Press Enter to start” screens are handled automatically via disable_forced_camera + skip_map_load_screens. If the game still doesn’t start within 3 seconds (no RUN_STEP messages), the script sends TMInterface give_up / press delete commands to restart the race every 3 seconds (up to 25 seconds total), then skips the map. Use --write-enter-maps to collect track IDs of maps that didn’t start, then --exclude-enter-maps on the next run.

Stale / hang detection: Several levels detect when the capture gets stuck and force a switch to the next replay: (1) socket timeout during race (20s with no messages → unload); (2) wall-clock progress (no RUN_STEP for 25s even if CHECKPOINT/LAP arrive → unload); (3) total rollout limit (5 min); (4) on any exception, the script attempts unload before returning.
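
The four detection levels can be sketched as a small watchdog. This is an illustration under assumptions, not the capture script's code: the names StallWatchdog and guarded are hypothetical, timestamps are passed in explicitly (in practice they would come from time.monotonic()), and only the thresholds (20 s, 25 s, 5 min) are taken from the text above.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StallWatchdog:
    socket_timeout_s: float = 20.0    # level 1: no messages at all
    run_step_timeout_s: float = 25.0  # level 2: no RUN_STEP progress
    rollout_limit_s: float = 300.0    # level 3: total rollout limit (5 min)
    started_at: float = 0.0
    last_message_at: float = 0.0
    last_run_step_at: float = 0.0

    def start(self, now: float) -> None:
        self.started_at = self.last_message_at = self.last_run_step_at = now

    def on_message(self, kind: str, now: float) -> None:
        """Record any message; only RUN_STEP counts as race progress."""
        self.last_message_at = now
        if kind == "RUN_STEP":
            self.last_run_step_at = now

    def reason_to_abort(self, now: float) -> Optional[str]:
        """Return why the replay should be unloaded, or None to continue."""
        if now - self.last_message_at >= self.socket_timeout_s:
            return "socket_timeout"
        if now - self.last_run_step_at >= self.run_step_timeout_s:
            return "no_run_step"
        if now - self.started_at >= self.rollout_limit_s:
            return "rollout_limit"
        return None


def guarded(run_replay: Callable[[], object], unload: Callable[[], None]):
    """Level 4: on any exception, attempt an unload before returning."""
    try:
        return run_replay()
    except Exception:
        unload()
        return None
```

Keeping the RUN_STEP clock separate from the general message clock is what lets level 2 fire even while CHECKPOINT or LAP messages keep arriving.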

Note on --running-speed: Higher values may cause the game to skip inputs or desync (e.g. 8 can break replay, 4 works reliably). Use 4–6 for capture.

VCP (zone centers): To enable zone-based meta in manifests (for BC with float inputs), add --vcp-dir maps/vcp. VCP files are auto-generated from replays when missing; defaults: --vcp-distance 0.5, --vcp-suffix cl, --vcp-auto-generate. Use --no-vcp-auto-generate to disable auto-generation.

Output: maps/img/<track_id>/<replay_name>/ — frames (jpeg), metadata.json, manifest.json. Details below.

How it works (details)

The following sections describe the download modules and pipeline, the filter script, and frame capture.

Module layout (replays_tmnf)

api.py — TMNF-X API: track search, replay list, download replays/maps.
list_popular.py — list popular tracks; used by download when --list-popular.
download.py — entry point: CLI for --track-id / --track-name / --track-ids or --list-popular.
pipeline.py — pipeline for --track-ids: producer → replay queue → download workers → map workers; resume via .replay_progress.

Modes and options (download)

--list-popular: fetch popular track list from TMNF-X, write to --output. Optionally --download-replays and/or --download-tracks.
--track-ids <file>: run pipeline: replays to --replays-dir, maps via --extract-tracks-from-replays to --tracks-dir.
--track-id / --track-name: single track; replays to --output-dir (default replays_tmnf).

Pipeline (--track-ids)

API workers (--api-workers) — request replay lists per track in parallel; often the bottleneck without many workers.
Download workers — save replays under replays-dir/track_id/....
Map workers — with --extract-tracks-from-replays, extract map to tracks-dir as {TrackId}.Challenge.Gbx.
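
The producer → replay queue → download workers → map workers flow can be sketched with the standard library. This is a simplified illustration, not pipeline.py: start_workers and run_pipeline are hypothetical names, and list_replays, download and extract_map are caller-supplied stand-ins for the real API and GBX calls.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

_STOP = object()  # sentinel that tells a worker stage to shut down


def start_workers(count, source, sink, work):
    """Start `count` threads that move items from source to sink via work()."""
    def loop():
        while True:
            item = source.get()
            if item is _STOP:
                source.put(_STOP)  # pass the sentinel on to sibling workers
                return
            result = work(item)
            if sink is not None:
                sink.put(result)
    threads = [threading.Thread(target=loop) for _ in range(count)]
    for thread in threads:
        thread.start()
    return threads


def run_pipeline(track_ids, list_replays, download, extract_map,
                 api_workers=4, workers=4):
    replay_q, map_q = queue.Queue(), queue.Queue()
    extracted = []
    downloaders = start_workers(workers, replay_q, map_q, download)
    mappers = start_workers(
        workers, map_q, None,
        lambda item: extracted.append(extract_map(item)))
    # API workers: request the replay list of each track in parallel.
    with ThreadPoolExecutor(max_workers=api_workers) as pool:
        for track_id, replay_ids in zip(track_ids,
                                        pool.map(list_replays, track_ids)):
            for replay_id in replay_ids:
                replay_q.put((track_id, replay_id))
    replay_q.put(_STOP)
    for thread in downloaders:
        thread.join()
    map_q.put(_STOP)  # only after every download has been forwarded
    for thread in mappers:
        thread.join()
    return extracted
```

The sketch omits error handling and does not deduplicate maps per track; it only shows why the stages can be sized independently (--api-workers versus --workers).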

Resume: replays-dir/.replay_progress stores the next track index. Ctrl+C stops and saves progress.
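
A minimal sketch of this resume behaviour follows. The on-disk format of .replay_progress is not documented here, so storing the index as a plain integer is an assumption, and load_next_index, save_next_index and process_tracks are hypothetical names.

```python
from pathlib import Path
from typing import Callable, Sequence

PROGRESS_NAME = ".replay_progress"


def load_next_index(replays_dir: Path) -> int:
    """Read the next track index; start from 0 if missing or unreadable."""
    try:
        text = (replays_dir / PROGRESS_NAME).read_text()
        return max(0, int(text.strip()))
    except (FileNotFoundError, ValueError):
        return 0


def save_next_index(replays_dir: Path, next_index: int) -> None:
    """Write via a temporary file so an interrupted write leaves no torn file."""
    replays_dir.mkdir(parents=True, exist_ok=True)
    tmp = replays_dir / (PROGRESS_NAME + ".tmp")
    tmp.write_text(str(next_index))
    tmp.replace(replays_dir / PROGRESS_NAME)


def process_tracks(track_ids: Sequence[str], replays_dir: Path,
                   handle_track: Callable[[str], None]) -> int:
    """Process tracks from the saved index; Ctrl+C stops and saves progress."""
    next_index = load_next_index(replays_dir)
    try:
        while next_index < len(track_ids):
            handle_track(track_ids[next_index])
            next_index += 1
    except KeyboardInterrupt:
        pass  # the interrupted track is retried on the next run
    finally:
        save_next_index(replays_dir, next_index)
    return next_index
```

Because the index advances only after a track completes, an interrupted track is processed again after restart rather than skipped.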

Filter tracks (step 3): filter_track_ids_no_respawn.py

scripts/filter_track_ids_no_respawn.py reads a track ID list (e.g. maps/track_ids.txt) and replays in --replays-dir (or -r), detects tracks where any replay respawns, and writes a new list (e.g. maps/track_ids_no_respawn.txt) containing only tracks with no respawn. Use this list in step 4 for more stable capture (no respawns during replay). Arguments: --input, --output, -r / --replays-dir, --workers.
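
The filtering logic can be sketched as follows. How the real script detects a respawn inside a replay is not described here, so the sketch takes that check as a caller-supplied function (replay_has_respawn, hypothetical); dropping tracks that have no replays on disk is likewise an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, List, Sequence


def track_is_clean(replays_dir: Path, track_id: str,
                   replay_has_respawn: Callable[[Path], bool]) -> bool:
    """True if the track has replays and none of them respawns."""
    track_dir = replays_dir / track_id
    if not track_dir.is_dir():
        return False
    replays = sorted(track_dir.glob("*.replay.gbx"))
    return bool(replays) and not any(replay_has_respawn(p) for p in replays)


def filter_track_ids(track_ids: Sequence[str], replays_dir: Path,
                     replay_has_respawn: Callable[[Path], bool],
                     workers: int = 16) -> List[str]:
    """Keep, in input order, only the tracks whose replays never respawn."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        flags = list(pool.map(
            lambda tid: track_is_clean(replays_dir, tid, replay_has_respawn),
            track_ids))
    return [tid for tid, keep in zip(track_ids, flags) if keep]
```

A single respawning replay is enough to exclude the whole track, which matches the "any replay respawns" rule above.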

Filter tracks (step 3a): filter_track_ids_custom_maptype.py

scripts/filter_track_ids_custom_maptype.py reads a track ID list and Challenge.Gbx files in --tracks-dir (or -t), detects tracks with non-standard environments (e.g., Stunts, custom MapType) or long MediaTracker previews (not yet implemented), and writes a new list (e.g. maps/track_ids_standard.txt) containing only standard tracks. Use this list in step 4 to avoid tracks with “Press Enter to start” screens or custom map scripts.

Currently checks:

Environment field: excludes maps with environment not in {Stadium, Speed, Alpine, Rally, Bay, Island, Coast, Desert}

Future checks (not yet implemented):

MediaTracker intro duration > threshold (--max-preview-duration N.0): requires pygbx MediaTracker parsing support

Arguments: --input (default maps/track_ids.txt), --output (default maps/track_ids_standard.txt), -t / --tracks-dir (default maps/tracks_tmnf), -j / --jobs (default cpu_count), --only-with-maps (skip tracks without Challenge.Gbx), --max-preview-duration (not yet implemented).

Example:

python scripts/filter_track_ids_custom_maptype.py --input maps/track_ids_no_respawn.txt --output maps/track_ids_standard.txt --tracks-dir maps/tracks --jobs 16

Main arguments (download)

Extracting map from replay

A TMNF replay GBX embeds the map. With --extract-tracks-from-replays, the map is extracted into --tracks-dir as {TrackId}.Challenge.Gbx using pygbx (project dependency).

Frame capture (capture_replays_tmnf.py)

scripts/capture_replays_tmnf.py runs replays from maps/replays (layout: replays_dir/track_id/*.replay.gbx) via TMInterface, captures screenshots at a given FPS and resolution, and saves frames with timing and metadata. Maps are expected in --tracks-dir (default maps/tracks): tracks-dir/track_id/*.Challenge.Gbx or tracks-dir/*.Challenge.Gbx — same layout as the download pipeline with --extract-tracks-from-replays.

Method: TMInterface native script loading. Each replay is converted to a TMInterface input script (.replay.gbx → .txt with press/steer commands); the script is loaded with load script.txt and the game replays inputs deterministically. Order is load script before map (per TMInterface docs). Finish is handled by time (switch to next replay/map shortly before nominal finish to avoid the medal screen and connection issues).

Output (per replay): output_dir/track_id/replay_name/

metadata.json — track_id, replay_name, challenge_name, fps, width, height, capture_interval_ms, step_ms, race_time_ms, total_frames.
manifest.json — per-frame entries: file, step, time_ms, inputs, state (position, speeds, rotation, cp_times_ms, current_checkpoint), capture_timestamp_utc.
frame_<step>_<time_ms>ms.jpeg — images (step = frame index, time_ms = simulation time).
frame_<step>_<time_ms>ms.json — optional (--per-frame-json).
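
For downstream tooling, the frame filename pattern above can be parsed with a short helper. This is a sketch based only on the naming scheme shown here; parse_frame_name and list_frames are hypothetical names, not functions of the capture script.

```python
import re
from pathlib import Path
from typing import List, Optional, Tuple

# frame_<step>_<time_ms>ms.jpeg, e.g. frame_00001_20ms.jpeg
FRAME_RE = re.compile(r"^frame_(\d+)_(\d+)ms\.jpeg$")


def parse_frame_name(name: str) -> Optional[Tuple[int, int]]:
    """Return (step, time_ms) for a frame filename, or None if it is not one."""
    match = FRAME_RE.match(name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def list_frames(replay_dir: Path) -> List[Tuple[int, int, Path]]:
    """Return (step, time_ms, path) for every frame, ordered by step."""
    frames = []
    for path in replay_dir.iterdir():
        parsed = parse_frame_name(path.name)
        if parsed is not None:
            frames.append((parsed[0], parsed[1], path))
    return sorted(frames)
```

Sorting by the parsed step rather than by filename keeps the order correct even if the zero padding of the step ever differs between runs.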

Arguments: --replays-dir, --output-dir, --tracks-dir, --width, --height, --fps, --workers, --base-tmi-port, --track-ids, --track-id, --max-replays-per-track, --per-frame-json, --running-speed, --write-enter-maps, --exclude-enter-maps, --vcp-dir, --vcp-distance, --vcp-suffix, --vcp-auto-generate / --no-vcp-auto-generate, --log-level, --config.

FPS and simulation time (time_ms in filenames): --fps is frames per simulation second (per second of race time), so the interval between captures in simulation time is 1000 / fps ms. With --fps 50 that is 20 ms between frames (e.g. frame_00000_0ms.jpeg, frame_00001_20ms.jpeg, …); with --fps 64 it is about 15.6 ms. --running-speed does not change this interval; it only affects how fast the race runs in real time.
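
The arithmetic can be checked with a few lines of Python. The helper names are hypothetical, and the wall-clock estimate assumes that --running-speed is a plain multiplier on real-time playback with no loading overhead.

```python
def capture_interval_ms(fps: float) -> float:
    """Simulation time between two captured frames, in milliseconds."""
    return 1000.0 / fps


def wall_clock_seconds(race_time_ms: float, running_speed: float) -> float:
    """Approximate real time needed to play a replay at a given game speed."""
    return race_time_ms / (1000.0 * running_speed)


if __name__ == "__main__":
    for fps in (10, 50, 64, 100):
        print(f"--fps {fps}: {capture_interval_ms(fps):.3f} ms between frames")
    # An 89250 ms replay played at --running-speed 4:
    print(f"{wall_clock_seconds(89250, 4):.2f} s of real time")
```

Changing running_speed alters only the second function's result; the capture interval depends on fps alone.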

Port: BASE_TMI_PORT in .env (or --base-tmi-port) must match the game’s TMInterface console (e.g. “Port set to 8480”). If the script hangs at “Waiting for game to finish loading…”, check the port.

Track vs replay loading: The script sends only the track filename (e.g. 1000074.Challenge.Gbx) via map ...; the game looks in its Tracks/Challenges folder. The replay is not passed by path: the script copies the replay into the game’s Autosaves folder (trackmania_base_path/Tracks/Replays/Autosaves) as {Username}_{MapName}.Replay.gbx. Set trackmania_base_path in config or .env (e.g. TRACKMANIA_BASE_PATH=C:\Users\...\TrackMania). Default is Documents\TrackMania.
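
The copy step can be sketched as below. Only the folder layout and the {Username}_{MapName}.Replay.gbx naming come from the description above; autosave_path and stage_replay are hypothetical names, and how the real script obtains the username and map name is not shown.

```python
import shutil
from pathlib import Path


def autosave_path(base_path: Path, username: str, map_name: str) -> Path:
    """Build <base>/Tracks/Replays/Autosaves/{Username}_{MapName}.Replay.gbx."""
    return (base_path / "Tracks" / "Replays" / "Autosaves"
            / f"{username}_{map_name}.Replay.gbx")


def stage_replay(replay: Path, base_path: Path,
                 username: str, map_name: str) -> Path:
    """Copy a downloaded replay to where the game expects its autosave."""
    target = autosave_path(base_path, username, map_name)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(replay, target)
    return target
```

Here base_path corresponds to trackmania_base_path, for example Path.home() / "Documents" / "TrackMania" for the default location.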

Multiple workers (--workers N): Support for N > 1 is not yet working. With several workers, multiple game instances run in parallel but key input (Enter for preview skip, Tilde for console) and window handling are not coordinated, so capture is unreliable. Use --workers 1 until multi-worker operation is fixed. (When implemented, tasks would be grouped by track so each worker processes one track at a time.)

Connection handling: If TMInterface disconnects (e.g. game closed), the script clears the connection and the next replay will reconnect automatically.

Examples (from project root)

# Single worker, 64x64, 10 FPS
python scripts/capture_replays_tmnf.py --replays-dir maps/replays --output-dir maps/img

# 256x256, 1 FPS, specific track list, one replay per track
python scripts/capture_replays_tmnf.py --replays-dir maps/replays --output-dir maps/img --workers 1 --width 256 --height 256 --running-speed 10 --fps 1 --track-ids maps/track_ids_no_respawn.txt --max-replays-per-track 1

# Validate single replay
python scripts/capture_replays_tmnf_validate.py --replay-path maps/replays/924307/pos1_ben3847_89250ms.replay.gbx --output-dir maps/img_validate --fps 10 --step-ms 10

# Per-frame JSON
python scripts/capture_replays_tmnf.py --replays-dir maps/replays --output-dir out --per-frame-json

# Single track
python scripts/capture_replays_tmnf.py --track-id 12345 --replays-dir maps/replays --output-dir out

Level 0 visual pretraining on captured frames

After capturing frames to maps/img/, run the Level 0 pretraining pipeline:

# Step 1: pretrain AE (all defaults from config_files/pretrain/vis/pretrain_config.yaml)
#         creates output/ptretrain/vis/run_001/
python scripts/pretrain_visual_backbone.py --data-dir maps/img

# Step 1 alt: SimCLR with track-level val split
python scripts/pretrain_visual_backbone.py \
    --data-dir maps/img --task simclr --val-fraction 0.1

# Step 2: inject encoder into IQN (writes save/weights1.torch + save/weights2.torch)
#         encoder.pt = extracted CNN weights only (≠ .ckpt Lightning checkpoint)
python scripts/init_iqn_from_encoder.py \
    --encoder-pt output/ptretrain/vis/run_001/encoder.pt \
    --save-dir   save/

# Step 3: start RL training (learner auto-loads the checkpoint)
python scripts/train.py

The standard pipeline uses PyTorch Lightning (framework: lightning in config_files/pretrain/vis/pretrain_config.yaml). Each run creates a versioned subdirectory inside output_dir:

run_001/encoder.pt — CNN weights; what init_iqn_from_encoder.py needs
run_001/pretrain_meta.json — full reproducibility record
run_001/metrics.csv — per-epoch loss history
run_001/checkpoints/ — .ckpt snapshots for resuming training (not for IQN)
run_001/tensorboard/, run_001/csv/ — training logs

Key dataset properties:

ReplayFrameDataset groups frames by replay directory; temporal stacks (--n-stack) never cross replay/track boundaries.
--val-fraction 0.1 splits at the track level to prevent leakage.
Expected directory layout: maps/img/<track_id>/<replay_name>/frame_*_*ms.jpeg
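
The two dataset properties can be illustrated with a small sketch. It is not the ReplayFrameDataset implementation: split_tracks and stack_windows are hypothetical names, and the rounding used to size the validation set is an assumption.

```python
import random
from typing import Dict, List, Sequence, Set, Tuple


def split_tracks(track_ids: Sequence[str], val_fraction: float,
                 seed: int = 0) -> Tuple[Set[str], Set[str]]:
    """Split whole tracks into (train, val) so no track appears in both."""
    ids = sorted(set(track_ids))
    random.Random(seed).shuffle(ids)
    n_val = int(round(len(ids) * val_fraction))
    return set(ids[n_val:]), set(ids[:n_val])


def stack_windows(frames_by_replay: Dict[str, List], n_stack: int) -> List[List]:
    """Build n_stack-long windows of consecutive frames within each replay."""
    windows = []
    for frames in frames_by_replay.values():
        # Windows are cut per replay, so none spans two replays or tracks.
        for end in range(n_stack, len(frames) + 1):
            windows.append(frames[end - n_stack:end])
    return windows
```

Splitting by track rather than by frame matters because neighbouring frames of one track look nearly identical, so a frame-level split would leak validation content into training.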

See docs/source/experiments/pretrain_replay_roadmap.rst for the full experiment matrix and KPI tracking guide.

API (TMNF-X / ManiaExchange)

Track search / replay list: TMNF-X API (e.g. https://tmnf.exchange/api/...).
Replay download: /recordgbx/{ReplayId}; maps: /trackgbx/{TrackId}.
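
A minimal sketch of how these endpoints might be addressed follows. It assumes that /recordgbx and /trackgbx live directly under https://tmnf.exchange; the helper names are hypothetical and this is not the code of api.py.

```python
import shutil
import urllib.request
from pathlib import Path

BASE_URL = "https://tmnf.exchange"  # assumed host for the GBX endpoints


def replay_url(replay_id: int) -> str:
    """URL of a replay file: /recordgbx/{ReplayId}."""
    return f"{BASE_URL}/recordgbx/{replay_id}"


def track_url(track_id: int) -> str:
    """URL of a map file: /trackgbx/{TrackId}."""
    return f"{BASE_URL}/trackgbx/{track_id}"


def fetch(url: str, dest: Path, timeout_s: float = 30.0) -> Path:
    """Stream a URL to dest, creating parent folders as needed."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url, timeout=timeout_s) as response, \
            open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest
```

For example, fetch(track_url(1000074), Path("maps/tracks/1000074.Challenge.Gbx")) would store a map under the {TrackId}.Challenge.Gbx naming used by the pipeline.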