Specimen Report · Elixir

llm-sim-games

phiat/llm-sim-games

LLM-powered bots play poker against each other - live table viewer, step-through debugging, full game persistence

Stars: ★ 1
Forks: ⑂ 0
Language: Elixir
Size: 1,845 kB
Last Push: 2mo ago
Forged: 5mo ago
Topics: ai-agents, games, llm, poker, simulation
# AgGames

Multi-agent simulation platform where LLM-powered bots play games in strategic environments.

## Overview

AgGames is an Elixir/Phoenix application that orchestrates simulations with AI-powered bots. Each bot is a GenServer process with its own personality and decision-making capabilities via LLMs. Bots compete in poker games with full persistence and analytics.

**Key Features:**

- **LLM-Powered Bots** - Each bot uses Groq with structured outputs for strategic decisions
- **Poker Variants** - 5-Card Draw and Texas Hold'em fully implemented
- **Live Poker Viewer** - Real-time visual poker table with step-through debugging
- **Responsive Design** - Mobile-friendly table view with adaptive layouts
- **DuckDB Persistence** - Full game history stored for replay and analytics
- **Cost Tracking** - Token usage and costs calculated per LLM call
- **Accessibility** - Skip links, ARIA labels, keyboard navigation support

## Tech Stack

- **Elixir/Phoenix** - Core application framework
- **GenServer** - One process per bot, one per simulation
- **DuckDB** - Game history persistence and analytics
- **Groq API** - LLM provider with structured outputs
- **LiveView** - Real-time web UI
- **Credo** - Static code analysis

## Quick Start

### Prerequisites

```bash
# Install Elixir 1.19+

# Set up environment
cp .env.example .env
# Add your GROQ_API_KEY to .env
```

### Installation

```bash
# Install dependencies
mix deps.get

# Set up database
mix ecto.setup

# Start Phoenix server
mix phx.server
```

Visit `http://localhost:4000` to see the dashboard.
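Because `GROQ_API_KEY` is required, it can help to confirm that the variable is visible to the Elixir runtime before starting the server. The following is a minimal sketch, assuming a hypothetical `EnvCheck` module that is not part of AgGames; it relies only on the standard `System.get_env/1`.

```elixir
defmodule EnvCheck do
  @moduledoc "Hypothetical helper for illustration; not part of AgGames."

  # Returns :ok when the variable holds a non-empty value, otherwise an
  # error tuple. The getter is injectable so the check can be exercised
  # without touching the real process environment.
  def check(name, getter \\ &System.get_env/1) do
    case getter.(name) do
      nil -> {:error, "#{name} is not set"}
      "" -> {:error, "#{name} is empty"}
      _value -> :ok
    end
  end
end

# Prints :ok or an error tuple for the current environment.
IO.inspect(EnvCheck.check("GROQ_API_KEY"))
```

`System.get_env/1` only sees variables present in the operating-system environment of the running process. If the check reports the key as unset after `source .env`, one possible cause is that the file assigns the variable without exporting it.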
### Run a Poker Hand

```bash
# Source environment and run
source .env && mix run scripts/run_full_hand.exs
```

Example output:

```
Hand #1 dealt

PRE-DRAW BETTING
Diana: FOLD
Eve: RAISE $20
Frank: CALL $20
Alice: RAISE $100
Bob: FOLD
Charlie: CALL $90

DRAW PHASE
Alice: draws 1
Charlie: draws 2

POST-DRAW BETTING
Alice: BET $200
Charlie: FOLD

SHOWDOWN
Winner: Alice wins $235

METRICS: 12 calls | 5387→9564 tokens | 11030ms | $0.003273
```

## Project Structure

```
lib/ag_games/
  bots/                      # Bot GenServers and prompts
    bot.ex                   # Main Bot GenServer with LLM integration
    supervisor.ex            # DynamicSupervisor for bot lifecycle
    prompts/poker.ex         # Poker prompts with JSON schemas
                             # Note: Uses 'hole_cards' for player's private cards
  simulations/               # Simulation orchestration
    simulation.ex            # Simulation GenServer (orchestrates rounds)
    logger.ex                # File logging for game sessions
  games/                     # Game implementations
    poker/
      deck.ex                # Card deck (52 cards, Unicode suits)
      hand_evaluator.ex      # Hand ranking (Royal Flush → High Card)
                             # Note: Supports 7-card evaluation (C(7,5) combinations)
      betting.ex             # Action validation and application
      pots.ex                # Side pot calculation
      draw.ex                # Draw phase handling (5-Card Draw only)
      holdem.ex              # Texas Hold'em variant implementation
                             # Note: 4 betting rounds (preflop, flop, turn, river)
    poker.ex                 # Core 5-Card Draw logic
                             # Note: Player struct uses 'hole_cards' field (not 'hand')
    poker_session.ex         # Orchestrates hands with logging/persistence
                             # Note: Broadcasts events to LiveView via PubSub
  persistence/               # Data layer
    poker_db.ex              # DuckDB schema and queries
    poker_db_server.ex       # Connection GenServer
                             # Note: Schema uses 'hole_cards' column
  llm/                       # LLM abstraction
    groq.ex                  # Groq API with structured outputs
    pricing.ex               # Token cost calculation
    provider.ex              # HTTP utilities

lib/ag_games_web/
  live/poker/
    setup_live.ex            # Game configuration UI
    table_live.ex            # Real-time poker table view
  components/poker/
    table_components.ex      # Poker table visual components

test/                        # 238 tests
  ag_games/
    games/poker/
      hand_evaluator_test.exs   # Hand evaluation (22 tests)
      side_pot_test.exs         # Side pot scenarios
      deck_exhaustion_test.exs  # Edge cases
    persistence/
      poker_db_test.exs         # DuckDB persistence (7 tests)
    integration/
      bot_llm_integration_test.exs
      multi_hand_test.exs
```

**Terminology Notes:**

- `hole_cards` - Player's private cards (standardized across codebase, DB, prompts)
- `hand_number` - Round number (e.g., "Hand #5")
- `hand_rank` - Poker hand type (e.g., "Full House")
- `HandEvaluator` - Module for evaluating poker hands

## Features

### Structured LLM Outputs

Bots respond with JSON matching defined schemas:

```elixir
# Betting schema
%{
  type: "object",
  properties: %{
    action: %{type: "string", enum: ["fold", "check", "call", "bet", "raise"]},
    amount: %{type: ["integer", "null"]},
    reasoning: %{type: "string"}
  },
  required: ["action", "amount", "reasoning"]
}
```

Responses that fail schema validation are retried automatically.

### DuckDB Analytics

Query game history for analysis:

```elixir
# Player statistics
PokerDBServer.get_player_stats()

# LLM metrics (tokens, cost, latency)
PokerDBServer.get_llm_metrics()

# Recent hands
PokerDBServer.get_recent_hands(10)
```

### Cost Tracking

Token costs are calculated per request using configurable pricing in `priv/pricing.json`:

```json
{
  "models": {
    "openai/gpt-oss-20b": {
      "input_cost_per_million": 0.075,
      "output_cost_per_million": 0.3
    }
  }
}
```

## Development

### Running Tests

```bash
mix test            # Run all tests
mix test --cover    # With coverage
mix credo --strict  # Static analysis
```

### Issue Tracking

Uses [beads](https://github.com/steveyegge/beads) for issue tracking:

```bash
bd list        # List all issues
bd ready       # Show unblocked work
bd create --title="..." --type=task --priority=2
bd close <id>  # Close issue
```

## Configuration

### Environment Variables

```bash
GROQ_API_KEY=your_key_here  # Required
```

### DuckDB Path

```elixir
# config/config.exs
config :ag_games, AgGames.Persistence.PokerDB,
  db_path: "data/poker.duckdb"
```

## Livebook Integration

Interactive notebooks for exploring poker data and testing LLM prompts.

### Setup

1. Install Livebook: `mix escript.install hex livebook`
2. Start Livebook: `livebook server`
3. Open any notebook from the `livebook/` directory

### Available Notebooks

- **hand_analysis.livemd** - Visualize hand distributions, player stats, win rates
- **llm_testing.livemd** - Test betting/draw prompts interactively, compare bot personalities
- **cost_analytics.livemd** - Track LLM costs, token usage, latency metrics

## Roadmap

- [x] 5-Card Draw Poker with full game logic
- [x] DuckDB persistence and analytics
- [x] Structured LLM outputs with retries
- [x] Cost tracking
- [x] Live Poker Viewer with step-through debugging
- [x] Accessibility improvements (ARIA, keyboard nav)
- [x] Livebook integration for interactive analysis
- [x] Texas Hold'em variant
- [x] Multi-provider LLM support (Groq, Gemini, Ollama)
- [ ] Betting structures: limit, pot-limit

## License

MIT
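
## Appendix: Checking the Cost Figure

The `$0.003273` in the example output's `METRICS` line is consistent with the prices listed under Cost Tracking, assuming that `5387→9564` means input tokens followed by output tokens and that the run was billed at the `openai/gpt-oss-20b` rates. Both are assumptions; the README does not state them. The sketch below uses a hypothetical `CostSketch` module, not the project's own `pricing.ex`, to show the arithmetic.

```elixir
defmodule CostSketch do
  @moduledoc "Hypothetical helper for illustration; not the project's Pricing module."

  # USD cost of a call, given token counts and per-million-token prices.
  def cost(input_tokens, output_tokens, %{
        input_cost_per_million: input_price,
        output_cost_per_million: output_price
      }) do
    (input_tokens * input_price + output_tokens * output_price) / 1_000_000
  end
end

# Prices copied from the pricing example in the README.
pricing = %{input_cost_per_million: 0.075, output_cost_per_million: 0.3}

# Token counts from the example METRICS line (assumed input → output).
cost = CostSketch.cost(5387, 9564, pricing)

# Format with six decimal places, matching the METRICS line.
IO.puts("$" <> :erlang.float_to_binary(cost, decimals: 6))
# prints "$0.003273"
```

Input tokens contribute about $0.000404 and output tokens about $0.002869, so output pricing dominates the total in this example.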