SETA: Scaling Environments for Terminal Agents
— Designing resilient toolkits and scalable RL environments for CAMEL terminal agents
Authors: Qijia Shen 1,2, Jay Rainton 3, Aznaur Aliev 4, Ahmed Awelkair 1,2,4,5, Boyuan Ma 3, Zhiqi (Julie) Huang 6,7, Yuzhen Mao 8, Wendong Fan 1,2, Philip Torr 9, Bernard Ghanem 4, Changran Hu 3, Urmish Thakker 3, Guohao Li 1,2
1. CAMEL-AI.org
2. Eigent.AI
3. SambaNova
4. KAUST
5. University of Malaya
6. Imperial College London
7. University College London
8. Stanford University
9. University of Oxford
TL;DR: In SETA, we first build robust toolkits that strengthen the agent’s planning and terminal capabilities, achieving SOTA performance among agents built on the same model families. We then perform extensive analysis of the results and build a scalable environment-synthesis pipeline that generates RL training environments to further improve models’ terminal capabilities.
Key contributions:
- SOTA terminal agent on Terminal-Bench
  - We achieved SOTA performance with a Claude Sonnet 4.5-based agent on Terminal-Bench 2.0 and a GPT-4.1-based agent on Terminal-Bench 1.0 (SOTA compared against agents using the same model).
- Scalable RL training with synthetic terminal environments
  - We release an initial synthetic dataset of 400 terminal tasks, which continues to scale; 260 of these have been used for RLVR fine-tuning of a Qwen3-8B model.
- A clean agent design that generalizes across training and evaluation frameworks