Generic, optimized MCTS for Odin — ~289,000 sims/s on 9x9 Go via incremental liberty tracking. AlphaZero-style PUCT, leaf-parallel batched + threaded playouts. Plug your game in via a small Game vtable; ships with 11 demo games (Go, Hex, Reversi, Quoridor, Amazons, Nine Men's Morris, …).
# mcts-odin
[](https://github.com/phiat/mcts-odin/actions/workflows/ci.yml)
[](LICENSE)
[](https://github.com/phiat/mcts-odin/releases)
[](https://odin-lang.org/)
A generic, optimized Monte Carlo Tree Search package for [Odin](https://odin-lang.org/). AlphaZero-style PUCT with Dirichlet root noise + FPU (First-Play Urgency), optional fast rollouts, leaf-parallel batched playouts with virtual loss, and PCR (progressive computation reduction).
Games plug in by implementing a small `Game` vtable; the core knows nothing about Go, chess, or any specific game. Ships with **tic-tac-toe**, **Connect Four**, **Reversi**, **Hex**, **Breakthrough**, **Gomoku**, **Dots and Boxes**, **Amazons**, **Quoridor**, **Nine Men's Morris**, and a **Go** (9×9 / 19×19) reference implementation.
*v0.7.0 · eleven demo games · 148 tests passing under Odin's memory tracker.*
## Quick start
> **New to MCTS or this project?** [`docs/GETTING_STARTED.md`](docs/GETTING_STARTED.md) is a reading-and-running path that goes from "what does this do" to "I can wire a new game in" in under an hour.
```odin
package main
import "mcts"
import ttt "games/tictactoe"
main :: proc() {
g := ttt.game() // mcts.Game vtable
state := ttt.new_state() // tree takes ownership
cfg := mcts.default_config()
tree: mcts.Tree
mcts.init(&tree, &g, state, cfg, seed = 42)
defer mcts.destroy(&tree)
mcts.run_simulations(&tree, 1000, my_evaluator, &g)
action := mcts.select_action(&tree, temperature = 0.0)
}
```
`my_evaluator` is your value/policy function — see [`examples/tictactoe_selfplay.odin`](examples/tictactoe_selfplay.odin) for a complete runnable example with a uniform evaluator, and [`examples/nn_evaluator_skeleton.odin`](examples/nn_evaluator_skeleton.odin) for the policy/value plumbing pattern a real NN-backed evaluator needs (sequential and batched).
**Evaluator must mask to legal moves.** The MCTS hot path does not re-check legality before calling `do_move` on the chosen slot — a nonzero prior for an illegal action will be silently selected and produce undefined behaviour (panic / no-op / corrupted state, depending on how the game implements `do_move`). NN-backed evaluators must mask their logits to legal moves before normalisation.
## Throughput
9×9 Go, 1600 sims/move × 32 moves, uniform-policy evaluator, single-thread, `-o:speed -no-bounds-check`:
```
mcts-odin (default): ~289,000 sims/s (~34x autogodin cpp, ~101x autogodin odin)
```
For reference, [autogodin](https://github.com/phiat/autogodin)'s comparable bench (same workload, evaluator marshalled through a Python callback) reports `cpp: 8,470` and `odin: 2,859` sims/s. The numbers aren't strictly comparable — mcts-odin runs its evaluator inline in Odin without FFI — but the cumulative gap reflects:
- **MCTS core:** in-place do/undo (no per-node clones), packed slot storage with SoA hot fields (`N`, `N_virt`, `Q`), linear-space priors (no `math.exp` in the PUCT loop), branchless argmax, per-tree scratch arena, subtree reuse, FPU producing a broader/shallower tree, inlined xoshiro256++ RNG.
- **Go board:** incremental per-block liberty tracking (union-find + journaled `do_move`/`undo_move`, compile-time-folded liberty bitset) replaces the per-call group flood-fill in `is_legal_flat` and `do_move` — the single largest lift, ~2.67× over the prior flood-fill baseline of ~108k sims/s. Plus clone-free PSK probes via incremental Zobrist, an open-addressing flat u64 hash set for PSK history, and `BOARD_SIZE_HINT`-friendly hot-path helpers.
## Architecture
```
┌──────────────────────── Tree ───────────────────────┐
│ │
│ working_state ───┐ │
│ (owned by tree) │ single state mutated in │
│ │ place via do_move / undo_move │
│ ▼ │
│ ┌─── nodes[] ────┐ │
│ │ Node 0 (root) │ Hot fields in │
│ │ ├─ actions[] │ parallel SoA on │
│ │ ├─ priors[] │ the Tree: │
│ │ └─ child[] │ node_N[] │
│ ├────────────────┤ node_N_virt[] │
│ │ Node 1, 2, … │ node_Q[] │
│ │ (packed slots) │ │
│ └────────────────┘ │
│ │
│ arena permanent: nodes, slot arrays │
│ scratch_arena per-run: descent paths, deltas │
│ │
└─────────────────────────────────────────────────────┘
```
Three deliberate choices drive the throughput:
- **No per-node state copies.** Nodes are pure tree bookkeeping; the tree threads `working_state` through `do_move` on the way down and `undo_move` on the way up. A Go-board clone is several times costlier than a do/undo pair, and a deep tree creates thousands of nodes.
- **Packed slot storage.** Per-node `actions[k] / priors[k] / child[k]` are tightly packed slices, sized at first expansion. Hot fields (`N`, `N_virt`, `Q`) live in parallel arrays on the `Tree`, indexed by node index — the PUCT inner loop reads ~12 bytes per child rather than chasing a full Node struct on every random-access probe.
- **Two arenas per tree.** A growing arena owns nodes and slot arrays for the lifetime of the tree; a separate scratch arena is `free_all`-reset at the top of every `run_simulations` call. The caller's `context.temp_allocator` is never touched.
## Sequential, batched, threaded
```odin
// Sequential — one evaluator call per leaf. Fine for CPU-side policies,
// uniform priors, or any fast in-process value function.
mcts.run_simulations(&tree, 1600, my_evaluator, &g)
// Batched — leaf-parallel with virtual loss; the evaluator gets a slice
// of cloned leaf states per call. Use when the evaluator is expensive
// (e.g. a GPU NN forward pass) and benefits from large batch sizes.
mcts.run_simulations_batched(&tree, 1600, batch_size = 16,
my_batched_evaluator, &g)
// OS-thread parallel — N worker threads each run descent / eval / backup
// concurrently. Atomics on N / N_virt / Q + a coarse expand mutex keep the
// shared tree consistent; virtual loss decouples the descents. The supplied
// evaluator is called concurrently from every worker, so its user_data must
// be thread-safe. Determinism is dropped — repeated runs with the same seed
// produce different node visit counts.
mcts.run_simulations_threaded(&tree, 1600, n_threads = 8, my_evaluator, &g)
```
Near-linear scaling on slow evaluators (50 µs/call benchmark, 9×9 Go):
`n=2: 1.93x`, `n=4: 3.81x`, `n=8: 7.15x`. For cheap evaluators (uniform-policy,
microseconds per call) the mutex and CAS-loop contention can erase the
speedup — use the sequential or batched paths there.
All three paths share the same tree, the same Game vtable, and the same readouts (`select_action`, visit counts, Q values, priors).
## The Game vtable
```odin
Game :: struct {
clone: proc(state: rawptr) -> rawptr,
free: proc(state: rawptr),
do_move: proc(state: rawptr, action: int) -> Move_Delta,
undo_move: proc(state: rawptr, delta: Move_Delta),
is_terminal: proc(state: rawptr) -> bool,
terminal_value: proc(state: rawptr) -> f32, // [0, 1] from side-to-move
legal_actions: proc(state: rawptr, out: ^[dynamic]int),
current_player: proc(state: rawptr) -> i32, // 0 or 1 for two-player games
max_actions: int, // upper bound on action-id
}
```
See [`docs/EMBEDDING.md`](docs/EMBEDDING.md) for the full contract, evaluator signatures (sequential + batched), subtree reuse (`mcts.reuse_root(action)`), tuning knobs, and memory model.
## Layout
```
mcts/ generic MCTS core (game-agnostic)
game.odin Game vtable + Move_Delta
mcts.odin Tree / Node / Config + init / destroy
playout.odin Evaluator type, sequential run_simulations + fast_rollout
batched.odin leaf-parallel run_simulations_batched (virtual loss)
threaded.odin OS-thread parallel run_simulations_threaded
readout.odin select_action + visit/Q/priors readouts
rng.odin xoshiro256++ + gamma sampler + categorical helper
debug.odin dump_tree_dot / dump_tree_json (experimental)
version.odin VERSION constant
games/
tictactoe/ 3×3 solved-game sanity demo
connect_four/ 7×6 column-drop demo
reversi/ 8×8 Reversi (Othello) with zero-alloc Move_Delta packing
hex/ 9×9 Hex — hexagonal-grid topology + BFS win detection
breakthrough/ 8×8 Breakthrough (Troyka 2000) — pawn movement + diagonal captures
gomoku/ 15×15 Free Gomoku — five-in-a-row, large branching factor
dots_and_boxes/ 4×4 dot grid — extra-turn on box close breaks to_play alternation
amazons/ 6×6 — two-stage moves (queen slide + arrow shot)
quoridor/ 5×5 — heterogeneous action space + per-candidate BFS validation
morris/ Nine Men's Morris — phase transitions (place → slide → fly) + sub-action within move (mill removal)
go/ 9×9 / 19×19 with Zobrist PSK, KataGo no-suicide, Tromp-Taylor scoring
tests/ 11 suites (./scripts/test.sh runs all, fails on leaks)
examples/
tictactoe_selfplay.odin full self-play loop
nn_evaluator_skeleton.odin sequential + batched NN evaluator template
bench/
bench.odin 9×9 Go throughput micro-bench vs autogodin baselines
threaded/ thread-scaling bench under a slow evaluator
profile/ wrapped-vtable timing profile (Game vtable + evaluator buckets)
scripts/ build / test helpers
docs/ GETTING_STARTED.md, EMBEDDING.md, UPSTREAM.md
```
## Build
```bash
./scripts/build.sh # build/libmcts_odin.so
./scripts/test.sh # all 11 suites, fails on leaks
odin run examples/tictactoe_selfplay.odin -file -o:speed
```
For per-suite test runs, individual benches, and the profile harness, see [`docs/GETTING_STARTED.md`](docs/GETTING_STARTED.md).
Optimization knobs:
```bash
ODIN_OPT="-o:speed -no-bounds-check" ./scripts/build.sh
```
## Why this exists
Most MCTS implementations are tied to a specific game (chess, Go, board engines). The handful of game-agnostic ones live in Python, JAX, or C++. As of mid-2026 nothing similar exists in the Odin ecosystem — and Odin's combination of manual memory control, slice-based hot paths, and inline ASM/SIMD friendliness makes it a natural fit for the inner loop of a search.
The core algorithm is a direct descendant of the MCTS in [ericjang/autogo](https://github.com/ericjang/autogo) (C++) via [autogodin](https://github.com/phiat/autogodin) (Odin port). This repo lifts the algorithm into a stand-alone, game-agnostic package.
## Contributing
Solo-maintained, pre-1.0. Bug reports and feature requests via [GitHub Issues](https://github.com/phiat/mcts-odin/issues) are welcome; please open an issue to discuss before sending a PR for anything larger than a typo so we don't end up duplicating work.
## License
MIT. See [LICENSE](LICENSE).