Specimen Report · Odin

mcts-odin

phiat/mcts-odin

Generic, optimized MCTS for Odin — ~289,000 sims/s on 9x9 Go via incremental liberty tracking. AlphaZero-style PUCT, leaf-parallel batched + threaded playouts. Plug your game in via a small Game vtable; ships with 11 demo games (Go, Hex, Reversi, Quoridor, Amazons, Nine Men's Morris, …).

Stars
★ 1
Forks
⑂ 0
Language
Odin
Size
333 kB
Last Push
6d ago
Forged
7d ago
alphazerogame-aimctsmonte-carlo-tree-searchodinodin-langboard-gamespuct
# mcts-odin [![CI](https://github.com/phiat/mcts-odin/actions/workflows/ci.yml/badge.svg)](https://github.com/phiat/mcts-odin/actions/workflows/ci.yml) [![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE) [![Version](https://img.shields.io/github/v/tag/phiat/mcts-odin?label=version&color=f47b30)](https://github.com/phiat/mcts-odin/releases) [![Odin](https://img.shields.io/badge/Odin-dev--2026--05-orange.svg)](https://odin-lang.org/) A generic, optimized Monte Carlo Tree Search package for [Odin](https://odin-lang.org/). AlphaZero-style PUCT with Dirichlet root noise + FPU (First-Play Urgency), optional fast rollouts, leaf-parallel batched playouts with virtual loss, and PCR (progressive computation reduction). Games plug in by implementing a small `Game` vtable; the core knows nothing about Go, chess, or any specific game. Ships with **tic-tac-toe**, **Connect Four**, **Reversi**, **Hex**, **Breakthrough**, **Gomoku**, **Dots and Boxes**, **Amazons**, **Quoridor**, **Nine Men's Morris**, and a **Go** (9×9 / 19×19) reference implementation. *v0.7.0 · eleven demo games · 148 tests passing under Odin's memory tracker.* ## Quick start > **New to MCTS or this project?** [`docs/GETTING_STARTED.md`](docs/GETTING_STARTED.md) is a reading-and-running path that goes from "what does this do" to "I can wire a new game in" in under an hour. ```odin package main import "mcts" import ttt "games/tictactoe" main :: proc() { g := ttt.game() // mcts.Game vtable state := ttt.new_state() // tree takes ownership cfg := mcts.default_config() tree: mcts.Tree mcts.init(&tree, &g, state, cfg, seed = 42) defer mcts.destroy(&tree) mcts.run_simulations(&tree, 1000, my_evaluator, &g) action := mcts.select_action(&tree, temperature = 0.0) } ``` `my_evaluator` is your value/policy function — see [`examples/tictactoe_selfplay.odin`](examples/tictactoe_selfplay.odin) for a complete runnable example with a uniform evaluator, and [`examples/nn_evaluator_skeleton.odin`](examples/nn_evaluator_skeleton.odin) for the policy/value plumbing pattern a real NN-backed evaluator needs (sequential and batched). **Evaluator must mask to legal moves.** The MCTS hot path does not re-check legality before calling `do_move` on the chosen slot — a nonzero prior for an illegal action will be silently selected and produce undefined behaviour (panic / no-op / corrupted state, depending on how the game implements `do_move`). NN-backed evaluators must mask their logits to legal moves before normalisation. ## Throughput 9×9 Go, 1600 sims/move × 32 moves, uniform-policy evaluator, single-thread, `-o:speed -no-bounds-check`: ``` mcts-odin (default): ~289,000 sims/s (~34x autogodin cpp, ~101x autogodin odin) ``` For reference, [autogodin](https://github.com/phiat/autogodin)'s comparable bench (same workload, evaluator marshalled through a Python callback) reports `cpp: 8,470` and `odin: 2,859` sims/s. The numbers aren't strictly comparable — mcts-odin runs its evaluator inline in Odin without FFI — but the cumulative gap reflects: - **MCTS core:** in-place do/undo (no per-node clones), packed slot storage with SoA hot fields (`N`, `N_virt`, `Q`), linear-space priors (no `math.exp` in the PUCT loop), branchless argmax, per-tree scratch arena, subtree reuse, FPU producing a broader/shallower tree, inlined xoshiro256++ RNG. - **Go board:** incremental per-block liberty tracking (union-find + journaled `do_move`/`undo_move`, compile-time-folded liberty bitset) replaces the per-call group flood-fill in `is_legal_flat` and `do_move` — the single largest lift, ~2.67× over the prior flood-fill baseline of ~108k sims/s. Plus clone-free PSK probes via incremental Zobrist, an open-addressing flat u64 hash set for PSK history, and `BOARD_SIZE_HINT`-friendly hot-path helpers. ## Architecture ``` ┌──────────────────────── Tree ───────────────────────┐ │ │ │ working_state ───┐ │ │ (owned by tree) │ single state mutated in │ │ │ place via do_move / undo_move │ │ ▼ │ │ ┌─── nodes[] ────┐ │ │ │ Node 0 (root) │ Hot fields in │ │ │ ├─ actions[] │ parallel SoA on │ │ │ ├─ priors[] │ the Tree: │ │ │ └─ child[] │ node_N[] │ │ ├────────────────┤ node_N_virt[] │ │ │ Node 1, 2, … │ node_Q[] │ │ │ (packed slots) │ │ │ └────────────────┘ │ │ │ │ arena permanent: nodes, slot arrays │ │ scratch_arena per-run: descent paths, deltas │ │ │ └─────────────────────────────────────────────────────┘ ``` Three deliberate choices drive the throughput: - **No per-node state copies.** Nodes are pure tree bookkeeping; the tree threads `working_state` through `do_move` on the way down and `undo_move` on the way up. A Go-board clone is several times costlier than a do/undo pair, and a deep tree creates thousands of nodes. - **Packed slot storage.** Per-node `actions[k] / priors[k] / child[k]` are tightly packed slices, sized at first expansion. Hot fields (`N`, `N_virt`, `Q`) live in parallel arrays on the `Tree`, indexed by node index — the PUCT inner loop reads ~12 bytes per child rather than chasing a full Node struct on every random-access probe. - **Two arenas per tree.** A growing arena owns nodes and slot arrays for the lifetime of the tree; a separate scratch arena is `free_all`-reset at the top of every `run_simulations` call. The caller's `context.temp_allocator` is never touched. ## Sequential, batched, threaded ```odin // Sequential — one evaluator call per leaf. Fine for CPU-side policies, // uniform priors, or any fast in-process value function. mcts.run_simulations(&tree, 1600, my_evaluator, &g) // Batched — leaf-parallel with virtual loss; the evaluator gets a slice // of cloned leaf states per call. Use when the evaluator is expensive // (e.g. a GPU NN forward pass) and benefits from large batch sizes. mcts.run_simulations_batched(&tree, 1600, batch_size = 16, my_batched_evaluator, &g) // OS-thread parallel — N worker threads each run descent / eval / backup // concurrently. Atomics on N / N_virt / Q + a coarse expand mutex keep the // shared tree consistent; virtual loss decouples the descents. The supplied // evaluator is called concurrently from every worker, so its user_data must // be thread-safe. Determinism is dropped — repeated runs with the same seed // produce different node visit counts. mcts.run_simulations_threaded(&tree, 1600, n_threads = 8, my_evaluator, &g) ``` Near-linear scaling on slow evaluators (50 µs/call benchmark, 9×9 Go): `n=2: 1.93x`, `n=4: 3.81x`, `n=8: 7.15x`. For cheap evaluators (uniform-policy, microseconds per call) the mutex and CAS-loop contention can erase the speedup — use the sequential or batched paths there. All three paths share the same tree, the same Game vtable, and the same readouts (`select_action`, visit counts, Q values, priors). ## The Game vtable ```odin Game :: struct { clone: proc(state: rawptr) -> rawptr, free: proc(state: rawptr), do_move: proc(state: rawptr, action: int) -> Move_Delta, undo_move: proc(state: rawptr, delta: Move_Delta), is_terminal: proc(state: rawptr) -> bool, terminal_value: proc(state: rawptr) -> f32, // [0, 1] from side-to-move legal_actions: proc(state: rawptr, out: ^[dynamic]int), current_player: proc(state: rawptr) -> i32, // 0 or 1 for two-player games max_actions: int, // upper bound on action-id } ``` See [`docs/EMBEDDING.md`](docs/EMBEDDING.md) for the full contract, evaluator signatures (sequential + batched), subtree reuse (`mcts.reuse_root(action)`), tuning knobs, and memory model. ## Layout ``` mcts/ generic MCTS core (game-agnostic) game.odin Game vtable + Move_Delta mcts.odin Tree / Node / Config + init / destroy playout.odin Evaluator type, sequential run_simulations + fast_rollout batched.odin leaf-parallel run_simulations_batched (virtual loss) threaded.odin OS-thread parallel run_simulations_threaded readout.odin select_action + visit/Q/priors readouts rng.odin xoshiro256++ + gamma sampler + categorical helper debug.odin dump_tree_dot / dump_tree_json (experimental) version.odin VERSION constant games/ tictactoe/ 3×3 solved-game sanity demo connect_four/ 7×6 column-drop demo reversi/ 8×8 Reversi (Othello) with zero-alloc Move_Delta packing hex/ 9×9 Hex — hexagonal-grid topology + BFS win detection breakthrough/ 8×8 Breakthrough (Troyka 2000) — pawn movement + diagonal captures gomoku/ 15×15 Free Gomoku — five-in-a-row, large branching factor dots_and_boxes/ 4×4 dot grid — extra-turn on box close breaks to_play alternation amazons/ 6×6 — two-stage moves (queen slide + arrow shot) quoridor/ 5×5 — heterogeneous action space + per-candidate BFS validation morris/ Nine Men's Morris — phase transitions (place → slide → fly) + sub-action within move (mill removal) go/ 9×9 / 19×19 with Zobrist PSK, KataGo no-suicide, Tromp-Taylor scoring tests/ 11 suites (./scripts/test.sh runs all, fails on leaks) examples/ tictactoe_selfplay.odin full self-play loop nn_evaluator_skeleton.odin sequential + batched NN evaluator template bench/ bench.odin 9×9 Go throughput micro-bench vs autogodin baselines threaded/ thread-scaling bench under a slow evaluator profile/ wrapped-vtable timing profile (Game vtable + evaluator buckets) scripts/ build / test helpers docs/ GETTING_STARTED.md, EMBEDDING.md, UPSTREAM.md ``` ## Build ```bash ./scripts/build.sh # build/libmcts_odin.so ./scripts/test.sh # all 11 suites, fails on leaks odin run examples/tictactoe_selfplay.odin -file -o:speed ``` For per-suite test runs, individual benches, and the profile harness, see [`docs/GETTING_STARTED.md`](docs/GETTING_STARTED.md). Optimization knobs: ```bash ODIN_OPT="-o:speed -no-bounds-check" ./scripts/build.sh ``` ## Why this exists Most MCTS implementations are tied to a specific game (chess, Go, board engines). The handful of game-agnostic ones live in Python, JAX, or C++. As of mid-2026 nothing similar exists in the Odin ecosystem — and Odin's combination of manual memory control, slice-based hot paths, and inline ASM/SIMD friendliness makes it a natural fit for the inner loop of a search. The core algorithm is a direct descendant of the MCTS in [ericjang/autogo](https://github.com/ericjang/autogo) (C++) via [autogodin](https://github.com/phiat/autogodin) (Odin port). This repo lifts the algorithm into a stand-alone, game-agnostic package. ## Contributing Solo-maintained, pre-1.0. Bug reports and feature requests via [GitHub Issues](https://github.com/phiat/mcts-odin/issues) are welcome; please open an issue to discuss before sending a PR for anything larger than a typo so we don't end up duplicating work. ## License MIT. See [LICENSE](LICENSE).
↗ GitHub Homepage