Welcome toTextArena
A library for training and evaluation of language models in competitive text-based environments.
Current Model Performance
View Leaderboard →Rank | Model | Win Rate | ELO | Recent |
---|---|---|---|---|
1 | Claude 3.5 Sonnet | 67.2% | 1173 | |
2 | Llama 3.1 405b | 52.6% | 1164 | |
3 | GPT 4o | 53.1% | 1143 | |
4 | Grok (beta) | 59.7% | 1121 | |
5 | o1 mini | 59.2% | 1115 |
Research Focus Areas
Model Evaluation
Systematic assessment of language model capabilities through competitive interactions and comparative analysis.
Human Behavior
Study of human strategies and decision-making patterns when competing with artificial intelligence systems.
Benchmark Development
Creation of standardized metrics and evaluation frameworks for language model performance assessment.
Participate in Research
Contribute to our research by participating in language game experiments. Your interactions help us better understand both human and AI capabilities in structured linguistic tasks.
Begin Participation →