llm-sim-games
phiat/llm-sim-games
LLM-powered bots play poker against each other - live table viewer, step-through debugging, full game persistence
# AgGames
Multi-agent simulation platform where LLM-powered bots play games in strategic environments.
## Overview
AgGames is an Elixir/Phoenix application that orchestrates simulations with AI-powered bots. Each bot is a GenServer process with its own personality and decision-making capabilities via LLMs. Bots compete in poker games with full persistence and analytics.
**Key Features:**
- **LLM-Powered Bots** - Each bot uses Groq with structured outputs for strategic decisions
- **Poker Variants** - 5-Card Draw and Texas Hold'em fully implemented
- **Live Poker Viewer** - Real-time visual poker table with step-through debugging
- **Responsive Design** - Mobile-friendly table view with adaptive layouts
- **DuckDB Persistence** - Full game history stored for replay and analytics
- **Cost Tracking** - Token usage and costs calculated per LLM call
- **Accessibility** - Skip links, ARIA labels, keyboard navigation support
## Tech Stack
- **Elixir/Phoenix** - Core application framework
- **GenServer** - One process per bot, one per simulation
- **DuckDB** - Game history persistence and analytics
- **Groq API** - LLM provider with structured outputs
- **LiveView** - Real-time web UI
- **Credo** - Static code analysis
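The process-per-bot design listed above can be sketched as follows. This is a minimal illustration, not the project's `Bot` module: `SketchBot`, its `decide/2` function, and the `to_call` field are hypothetical names, and a canned rule stands in for the LLM call.

```elixir
defmodule SketchBot do
  # Hypothetical stand-in for a bot process; not the project's Bot module.
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  # Ask the bot for a decision. A real bot would query the LLM here.
  def decide(pid, game_state), do: GenServer.call(pid, {:decide, game_state})

  @impl true
  def init(opts) do
    state = %{
      name: Keyword.fetch!(opts, :name),
      personality: Keyword.get(opts, :personality, "neutral")
    }

    {:ok, state}
  end

  @impl true
  def handle_call({:decide, game_state}, _from, state) do
    # Canned policy in place of an LLM call: check when free, otherwise fold.
    action = if game_state.to_call == 0, do: "check", else: "fold"
    {:reply, %{bot: state.name, action: action}, state}
  end
end

# One supervised process per bot, started on demand.
{:ok, sup} = DynamicSupervisor.start_link(strategy: :one_for_one)

{:ok, alice} =
  DynamicSupervisor.start_child(sup, {SketchBot, name: "Alice", personality: "aggressive"})

{:ok, bob} = DynamicSupervisor.start_child(sup, {SketchBot, name: "Bob"})

IO.inspect(SketchBot.decide(alice, %{to_call: 0}))
IO.inspect(SketchBot.decide(bob, %{to_call: 20}))
```

Because each bot holds its own state in its own process, a crash in one bot does not affect the others, and the supervisor can restart it independently.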
## Quick Start
### Prerequisites
```bash
# Install Elixir 1.19+
# Set up environment
cp .env.example .env
# Add your GROQ_API_KEY to .env
```
### Installation
```bash
# Install dependencies
mix deps.get
# Set up database
mix ecto.setup
# Start Phoenix server
mix phx.server
```
Visit `http://localhost:4000` to see the dashboard.
### Run a Poker Hand
```bash
# Source environment and run
source .env && mix run scripts/run_full_hand.exs
```
Example output:
```
Hand #1 dealt
PRE-DRAW BETTING
Diana: FOLD
Eve: RAISE $20
Frank: CALL $20
Alice: RAISE $100
Bob: FOLD
Charlie: CALL $90
DRAW PHASE
Alice: draws 1
Charlie: draws 2
POST-DRAW BETTING
Alice: BET $200
Charlie: FOLD
SHOWDOWN
Winner: Alice wins $235
METRICS: 12 calls | 5387→9564 tokens | 11030ms | $0.003273
```
## Project Structure
```
lib/ag_games/
  bots/                        # Bot GenServers and prompts
    bot.ex                     # Main Bot GenServer with LLM integration
    supervisor.ex              # DynamicSupervisor for bot lifecycle
    prompts/poker.ex           # Poker prompts with JSON schemas
                               # Note: Uses 'hole_cards' for player's private cards
  simulations/                 # Simulation orchestration
    simulation.ex              # Simulation GenServer (orchestrates rounds)
    logger.ex                  # File logging for game sessions
  games/                       # Game implementations
    poker/
      deck.ex                  # Card deck (52 cards, Unicode suits)
      hand_evaluator.ex        # Hand ranking (Royal Flush → High Card)
                               # Note: Supports 7-card evaluation (C(7,5) combinations)
      betting.ex               # Action validation and application
      pots.ex                  # Side pot calculation
      draw.ex                  # Draw phase handling (5-Card Draw only)
      holdem.ex                # Texas Hold'em variant implementation
                               # Note: 4 betting rounds (preflop, flop, turn, river)
    poker.ex                   # Core 5-Card Draw logic
                               # Note: Player struct uses 'hole_cards' field (not 'hand')
    poker_session.ex           # Orchestrates hands with logging/persistence
                               # Note: Broadcasts events to LiveView via PubSub
  persistence/                 # Data layer
    poker_db.ex                # DuckDB schema and queries
    poker_db_server.ex         # Connection GenServer
                               # Note: Schema uses 'hole_cards' column
  llm/                         # LLM abstraction
    groq.ex                    # Groq API with structured outputs
    pricing.ex                 # Token cost calculation
    provider.ex                # HTTP utilities
lib/ag_games_web/
  live/poker/
    setup_live.ex              # Game configuration UI
    table_live.ex              # Real-time poker table view
  components/poker/
    table_components.ex        # Poker table visual components
test/                          # 238 tests
  ag_games/
    games/poker/
      hand_evaluator_test.exs  # Hand evaluation (22 tests)
      side_pot_test.exs        # Side pot scenarios
      deck_exhaustion_test.exs # Edge cases
    persistence/
      poker_db_test.exs        # DuckDB persistence (7 tests)
  integration/
    bot_llm_integration_test.exs
    multi_hand_test.exs
```
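The tree lists `pots.ex` for side pot calculation. As a rough sketch of what that problem involves (not the project's implementation), the common layered approach splits the pot at each distinct contribution level. `SidePotSketch` and its input shape are assumptions made for this example.

```elixir
defmodule SidePotSketch do
  # contributions: %{player => chips committed this hand}
  # folded: players who folded; their chips stay in the pot but they cannot win it
  def pots(contributions, folded \\ []) do
    levels =
      contributions
      |> Map.values()
      |> Enum.filter(&(&1 > 0))
      |> Enum.uniq()
      |> Enum.sort()

    {pots, _prev} =
      Enum.reduce(levels, {[], 0}, fn level, {acc, prev} ->
        # Everyone who committed at least `level` pays the slice between prev and level.
        payers = for {player, chips} <- contributions, chips >= level, do: player
        amount = (level - prev) * length(payers)
        eligible = Enum.sort(payers -- folded)
        {[%{amount: amount, eligible: eligible} | acc], level}
      end)

    Enum.reverse(pots)
  end
end

# Alice is all-in for 50; Bob and Charlie each commit 100.
IO.inspect(SidePotSketch.pots(%{"Alice" => 50, "Bob" => 100, "Charlie" => 100}))
```

Here the main pot is 150 with all three players eligible, and the side pot is 100 with only Bob and Charlie eligible. The sketch does not merge adjacent pots that end up with the same eligible players, which a fuller version would likely do.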
**Terminology Notes:**
- `hole_cards` - Player's private cards (standardized across codebase, DB, prompts)
- `hand_number` - Round number (e.g., "Hand #5")
- `hand_rank` - Poker hand type (e.g., "Full House")
- `HandEvaluator` - Module for evaluating poker hands
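The 7-card evaluation noted in the tree relies on C(7,5) = 21 five-card combinations, which is what Texas Hold'em needs when combining two hole cards with five community cards. The sketch below only enumerates those combinations; `ComboSketch` is a hypothetical helper, and how `HandEvaluator` actually ranks each candidate is not shown in this README.

```elixir
defmodule ComboSketch do
  # All k-element combinations of a list, preserving input order.
  def combinations(_list, 0), do: [[]]
  def combinations([], _k), do: []

  def combinations([head | tail], k) do
    with_head = for rest <- combinations(tail, k - 1), do: [head | rest]
    with_head ++ combinations(tail, k)
  end
end

# Two hole cards plus five community cards.
seven = ~w(A♠ K♠ Q♠ J♠ 10♠ 2♦ 3♣)
hands = ComboSketch.combinations(seven, 5)

IO.inspect(length(hands))
# 21
```

An evaluator would then rank each of the 21 candidates and keep the best one.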
## Features
### Structured LLM Outputs
Bots respond with JSON matching defined schemas:
```elixir
# Betting schema
%{
type: "object",
properties: %{
action: %{type: "string", enum: ["fold", "check", "call", "bet", "raise"]},
amount: %{type: ["integer", "null"]},
reasoning: %{type: "string"}
},
required: ["action", "amount", "reasoning"]
}
```
Responses that fail schema validation are retried automatically.
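A validate-then-retry loop for the betting schema could look roughly like this. It is a sketch under assumptions: `BettingReplySketch`, the attempt limit of 3, and the `ask_fun` callback (a stand-in for the LLM request) are invented for illustration, and the reply is assumed to be already decoded into a map with string keys.

```elixir
defmodule BettingReplySketch do
  @actions ~w(fold check call bet raise)

  # Checks a decoded reply against the constraints of the betting schema.
  def validate(%{"action" => action, "amount" => amount, "reasoning" => reasoning} = reply)
      when action in @actions and (is_integer(amount) or is_nil(amount)) and
             is_binary(reasoning),
      do: {:ok, reply}

  def validate(_other), do: {:error, :schema_mismatch}

  # Calls `ask_fun` until a reply validates or the attempts run out.
  def ask_with_retries(ask_fun, attempts \\ 3)
  def ask_with_retries(_ask_fun, 0), do: {:error, :retries_exhausted}

  def ask_with_retries(ask_fun, attempts) do
    case validate(ask_fun.()) do
      {:ok, reply} -> {:ok, reply}
      {:error, _reason} -> ask_with_retries(ask_fun, attempts - 1)
    end
  end
end

# Simulated model: the first reply is invalid, the second is well-formed.
{:ok, replies} =
  Agent.start_link(fn ->
    [
      %{"action" => "shove"},
      %{"action" => "call", "amount" => 20, "reasoning" => "Pot odds are fine."}
    ]
  end)

next_reply = fn -> Agent.get_and_update(replies, fn [head | tail] -> {head, tail} end) end

IO.inspect(BettingReplySketch.ask_with_retries(next_reply))
```

The first reply is rejected because `"shove"` is not in the action enum and the required fields are missing, so the second reply is the one returned in the `{:ok, reply}` tuple.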
### DuckDB Analytics
Query game history for analysis:
```elixir
# Player statistics
PokerDBServer.get_player_stats()
# LLM metrics (tokens, cost, latency)
PokerDBServer.get_llm_metrics()
# Recent hands
PokerDBServer.get_recent_hands(10)
```
### Cost Tracking
Token costs are calculated per request using configurable pricing in `priv/pricing.json`:
```json
{
"models": {
"openai/gpt-oss-20b": {
"input_cost_per_million": 0.075,
"output_cost_per_million": 0.3
}
}
}
```
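With per-million pricing, the cost of a call is a weighted sum of input and output tokens. The sketch below applies that formula to the sample run shown earlier, assuming `5387→9564 tokens` means 5,387 input and 9,564 output tokens. `CostSketch` is a hypothetical module, not the project's `pricing.ex`.

```elixir
defmodule CostSketch do
  # pricing: a decoded entry from the "models" map in the pricing file
  def cost(pricing, input_tokens, output_tokens) do
    input_tokens / 1_000_000 * pricing["input_cost_per_million"] +
      output_tokens / 1_000_000 * pricing["output_cost_per_million"]
  end
end

pricing = %{"input_cost_per_million" => 0.075, "output_cost_per_million" => 0.3}
cost = CostSketch.cost(pricing, 5387, 9564)

IO.puts("$" <> :erlang.float_to_binary(cost, decimals: 6))
# $0.003273
```

Under that reading, the result matches the `$0.003273` reported in the example METRICS line.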
## Development
### Running Tests
```bash
mix test # Run the full test suite
mix test --cover # With coverage
mix credo --strict # Static analysis
```
### Issue Tracking
Issues are tracked with [beads](https://github.com/steveyegge/beads):
```bash
bd list # List all issues
bd ready # Show unblocked work
bd create --title="..." --type=task --priority=2
bd close <id> # Close issue
```
## Configuration
### Environment Variables
```bash
GROQ_API_KEY=your_key_here # Required
```
### DuckDB Path
```elixir
# config/config.exs
config :ag_games, AgGames.Persistence.PokerDB,
db_path: "data/poker.duckdb"
```
## Livebook Integration
Interactive notebooks for exploring poker data and testing LLM prompts.
### Setup
1. Install Livebook: `mix escript.install hex livebook`
2. Start Livebook: `livebook server`
3. Open any notebook from the `livebook/` directory
### Available Notebooks
- **hand_analysis.livemd** - Visualize hand distributions, player stats, win rates
- **llm_testing.livemd** - Test betting/draw prompts interactively, compare bot personalities
- **cost_analytics.livemd** - Track LLM costs, token usage, latency metrics
## Roadmap
- [x] 5-Card Draw Poker with full game logic
- [x] DuckDB persistence and analytics
- [x] Structured LLM outputs with retries
- [x] Cost tracking
- [x] Live Poker Viewer with step-through debugging
- [x] Accessibility improvements (ARIA, keyboard nav)
- [x] Livebook integration for interactive analysis
- [x] Texas Hold'em variant
- [x] Multi-provider LLM support (Groq, Gemini, Ollama)
- [ ] Betting structures: limit, pot-limit
## License
MIT