Getting Started
This guide walks through how to have GPT-4o-mini play against Claude-3.5-haiku in a text-based game, explaining each component along the way.
Installation
Install TextArena directly from PyPI:
    pip install textarena
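The agents in Step 1 talk to a hosted API, so they need credentials before anything runs. The sketch below is a small pre-flight check. It assumes the key is read from an environment variable named OPENROUTER_API_KEY; that name and the helper missing_env_vars are assumptions made for illustration and are not taken from this guide, so confirm the variable name against the agent class you use.

```python
import os


def missing_env_vars(names, environ=None):
    """Return the names from `names` that are unset or empty in `environ`."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


# Assumption: the OpenRouter-backed agent reads its key from this variable.
REQUIRED = ["OPENROUTER_API_KEY"]

if __name__ == "__main__":
    missing = missing_env_vars(REQUIRED)
    if missing:
        print("Set these before running the examples:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Running a check like this first tends to give a clearer message than a failed API call in the middle of a game.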
Step 1: Initialize Agents
We provide several out-of-the-box classes for easy usage of publicly available LLMs. The OpenRouterAgent wrapper handles all the API communication and response formatting:
    import textarena as ta
    agents = {
        0: ta.agents.OpenRouterAgent(model_name="GPT-4o-mini"),
        1: ta.agents.OpenRouterAgent(model_name="anthropic/claude-3.5-haiku")
    }
The dictionary keys (0 and 1) are player IDs that the environment uses to track turns. Each agent is called whenever it is that player ID's turn.
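To make the player-ID lookup concrete without calling any API, here is a minimal sketch that uses a stand-in agent. ScriptedAgent is a hypothetical class written for this example, not part of TextArena; the only thing it shares with the real agents is that it is a callable taking an observation string and returning an action string.

```python
from typing import Callable, Dict


class ScriptedAgent:
    """Hypothetical stand-in for an LLM agent: returns canned replies in order."""

    def __init__(self, replies):
        if not replies:
            raise ValueError("replies must not be empty")
        self._replies = list(replies)
        self._turn = 0

    def __call__(self, observation: str) -> str:
        # A real agent would send `observation` to a model; this one ignores it.
        reply = self._replies[self._turn % len(self._replies)]
        self._turn += 1
        return reply


agents: Dict[int, Callable[[str], str]] = {
    0: ScriptedAgent(["[rock]", "[paper]"]),
    1: ScriptedAgent(["[scissors]"]),
}

# The environment reports whose turn it is; the dict lookup picks the agent.
for player_id in (0, 1, 0):
    print(player_id, agents[player_id]("your move"))
```

Because the game loop only relies on agents[player_id](observation), any callable with this shape can be swapped in, which is handy for offline testing.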
Step 2: Create Environment
Similar to OpenAI Gym, we use make() to create the environment:
    env = ta.make(env_id="BalancedSubset-v0")
The BalancedSubset environment randomly selects one game from its collection each time it's initialized. This encourages the development of generalist agents that can handle various game types rather than specializing in a single game.
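The random selection described above can be pictured with a few lines of plain Python. This is only a sketch of the idea, not the library's implementation: the game IDs in GAME_POOL are placeholders, and pick_game is a hypothetical helper.

```python
import random

# Hypothetical game IDs, used only for illustration.
GAME_POOL = ["GameA-v0", "GameB-v0", "GameC-v0"]


def pick_game(pool, rng=None):
    """Choose one environment ID uniformly at random from `pool`."""
    if not pool:
        raise ValueError("pool must contain at least one game ID")
    if rng is None:
        rng = random.Random()
    return rng.choice(pool)


# Each call may return a different game, so an agent cannot rely on one rule set.
picks = [pick_game(GAME_POOL) for _ in range(5)]
print(picks)
```

Passing a seeded random.Random instance makes the choice reproducible, which can help when comparing agents on the same sequence of games.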
Step 3: Add Wrappers
Wrappers modify how the environment behaves. Each wrapper serves a specific purpose:
    # This wrapper accumulates game history and formats it for language models
    env = ta.wrappers.LLMObservationWrapper(env=env)

    # This wrapper provides nicely formatted output for human readability
    env = ta.wrappers.SimpleRenderWrapper(
        env=env,
        player_names={0: "GPT-4o-Mini", 1: "Claude-3.5-Haiku"}
    )
The LLMObservationWrapper is particularly important because:
- The base environment provides observations as a list of (sender_id, message) tuples
- Language models expect a single string input
- This wrapper maintains the conversation history and formats everything as a coherent dialogue
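The three points above can be sketched as a small formatter that accumulates (sender_id, message) tuples per player and renders them as one string. HistoryFormatter is a hypothetical class for illustration; the labels it prints, and the convention that a negative sender ID marks a message from the game itself, are assumptions rather than the wrapper's actual output format.

```python
from typing import Dict, List, Tuple

Message = Tuple[int, str]


class HistoryFormatter:
    """Keeps a per-player message history and renders it as one string."""

    def __init__(self) -> None:
        self._history: Dict[int, List[Message]] = {}

    def observe(self, player_id: int, new_messages: List[Message]) -> str:
        # Append the new tuples, then render the full history for this player.
        history = self._history.setdefault(player_id, [])
        history.extend(new_messages)
        return "\n".join(f"[{self._label(s)}] {m}" for s, m in history)

    @staticmethod
    def _label(sender_id: int) -> str:
        # Assumption: a negative ID marks a message from the game itself.
        return "GAME" if sender_id < 0 else f"Player {sender_id}"


fmt = HistoryFormatter()
print(fmt.observe(0, [(-1, "You are Player 0. Guess the word."), (1, "Is it 'apple'?")]))
print("---")
print(fmt.observe(0, [(0, "No, try again.")]))
```

The second call returns the earlier messages as well as the new one, which is the property a stateless language model needs in order to follow the game.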
The SimpleRenderWrapper helps with:
- Color-coding messages by player
- Adding clear turn indicators
- Formatting game state information
- Making the output more readable in the terminal
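As a rough picture of what color-coding and turn indicators involve, the sketch below builds one labeled line per message using standard ANSI escape codes. render_turn and the COLORS mapping are hypothetical and do not reproduce the wrapper's real layout.

```python
RESET = "\033[0m"
COLORS = {0: "\033[94m", 1: "\033[92m"}  # bright blue, bright green


def render_turn(turn, player_id, player_names, message):
    """Return one colorized, labeled line for a player's message."""
    name = player_names.get(player_id, f"Player {player_id}")
    color = COLORS.get(player_id, "")
    end = RESET if color else ""  # only reset when a color was applied
    return f"{color}[Turn {turn}] {name}: {message}{end}"


names = {0: "GPT-4o-Mini", 1: "Claude-3.5-Haiku"}
print(render_turn(1, 0, names, "I open with e4."))
print(render_turn(2, 1, names, "I reply with e5."))
```

Players without an entry in COLORS are printed uncolored with a generic name, so the function degrades gracefully for games with more than two players.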
Step 4: Game Loop
Run the main game loop with clear control flow:
    # Reset the environment
    env.reset()
    done = False

    # Continue until the game is complete
    while not done:
        # Get the current observation
        player_id, observation = env.get_observation()

        # Generate your model's action
        action = agents[player_id](observation)

        # Apply the action
        done, info = env.step(action=action)

    # Get final rewards when game is complete
    rewards = env.close()
The game loop handles:
- Turn management (which player goes when)
- Observation delivery to agents
- Action processing
- Game state updates
- Victory/defeat determination
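To see these responsibilities in isolation, the same loop can be run against a toy environment with no API calls. CountdownEnv is a hypothetical game written for this sketch; it only mimics the method shapes used in this guide (reset, get_observation, step, close) and says nothing about how real TextArena environments are implemented.

```python
class CountdownEnv:
    """Hypothetical toy game: players alternate subtracting 1 or 2 from a
    counter; whoever brings it to zero wins."""

    def __init__(self, start=5):
        self.start = start

    def reset(self):
        self.counter = self.start
        self.current_player = 0
        self.rewards = {0: 0, 1: 0}

    def get_observation(self):
        return self.current_player, f"Counter is {self.counter}. Subtract 1 or 2."

    def step(self, action):
        amount = 2 if action.strip() == "2" else 1  # anything else counts as 1
        self.counter -= min(amount, self.counter)
        done = self.counter == 0
        if done:
            winner = self.current_player
            self.rewards = {winner: 1, 1 - winner: -1}
        else:
            self.current_player = 1 - self.current_player  # pass the turn
        return done, {"counter": self.counter}

    def close(self):
        return self.rewards


def play(env, agents):
    """Run the loop from this step until the environment reports completion."""
    env.reset()
    done = False
    while not done:
        player_id, observation = env.get_observation()
        action = agents[player_id](observation)
        done, info = env.step(action=action)
    return env.close()


agents = {0: lambda obs: "2", 1: lambda obs: "1"}
rewards = play(CountdownEnv(start=5), agents)
print(rewards)  # {0: 1, 1: -1}
```

Here player 0 takes the counter from 5 to 3, player 1 takes it to 2, and player 0 finishes the game, so player 0 receives the positive reward.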
Full Code
    import textarena as ta

    # Initialize agents
    agents = {
        0: ta.agents.OpenRouterAgent(model_name="GPT-4o-mini"),
        1: ta.agents.OpenRouterAgent(model_name="anthropic/claude-3.5-haiku"),
    }

    # Initialize environment from subset
    env = ta.make(env_id="BalancedSubset-v0")
    env = ta.wrappers.LLMObservationWrapper(env=env)
    env = ta.wrappers.SimpleRenderWrapper(
        env=env,
        player_names={0: "GPT-4o-Mini", 1: "Claude-3.5-Haiku"}
    )

    env.reset()
    done = False
    while not done:
        player_id, observation = env.get_observation()
        action = agents[player_id](observation)
        done, info = env.step(action=action)
    rewards = env.close()