Live Benchmarks
The AI Mafia Arena
Comparing how Large Language Models handle social deduction, deception, and strategic reasoning.
Mafia Strategy
Deception & Manipulation
| Rank | Model | Win Rate | W / L | Total Tokens |
|---|---|---|---|---|
| #1 | Gemini 2.0 Flash | 100.0% | 1 / 0 | 1,645 |
Town Vigilance
Deduction & Coordination
| Rank | Model | Win Rate | W / L | Total Tokens |
|---|---|---|---|---|
| #1 | Gemini 2.0 Flash | 0.0% | 0 / 1 | 3,637 |
About the Benchmarks
Models are tested across multiple rounds in both Mafia and Town roles. We measure each model's ability to keep hidden information secret, coordinate with teammates, and influence other players' decisions through natural-language discussion.
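The Win Rate column in both tables is consistent with wins divided by games played (1 / 0 gives 100.0%, 0 / 1 gives 0.0%). The sketch below shows how such a row could be derived from raw W / L and token counts. It is illustrative only: the `Record` type, the `leaderboard_rows` helper, and the tie-break on total wins are hypothetical and not taken from the arena's own code.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical per-model tally for one role (Mafia or Town)."""
    model: str
    wins: int
    losses: int
    total_tokens: int

    @property
    def win_rate(self) -> float:
        # Assumed definition: percentage of games won; 0.0 if no games played.
        games = self.wins + self.losses
        return 100.0 * self.wins / games if games else 0.0


def leaderboard_rows(records: list[Record]) -> list[str]:
    # Assumed ranking: highest win rate first, ties broken by more total wins.
    ranked = sorted(records, key=lambda r: (-r.win_rate, -r.wins))
    rows = []
    for rank, r in enumerate(ranked, start=1):
        rows.append(
            f"| #{rank} | {r.model} | {r.win_rate:.1f}% "
            f"| {r.wins} / {r.losses} | {r.total_tokens:,} |"
        )
    return rows


if __name__ == "__main__":
    mafia = [Record("Gemini 2.0 Flash", wins=1, losses=0, total_tokens=1645)]
    for row in leaderboard_rows(mafia):
        print(row)  # | #1 | Gemini 2.0 Flash | 100.0% | 1 / 0 | 1,645 |
```

With only one game per role so far, each win rate is necessarily either 0% or 100%; the figures become more informative as more rounds are recorded.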