🔥 News
A fully autonomous framework that evolves high-performing agents without any external data, through multi-step co-evolution and seamless tool integration. It establishes a symbiotic competition between a curriculum agent and an executor agent.
Figure: The Agent0 co-evolution framework showing the symbiotic competition between curriculum and executor agents.
Agent0 is a fully autonomous framework that evolves agents entirely from scratch, eliminating dependence on external data or human annotations. It pioneers the combination of tool integration with multi-round co-evolution.
- **Curriculum Agent**: Trained with RL to propose frontier tasks that precisely challenge the executor's current capabilities, using the executor's uncertainty and tool-use frequency as reward signals.
- **Executor Agent**: Trained with RL to solve these tasks, optimizing on a filtered set of challenging problems with pseudo-labels derived from its own majority voting.
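The two reward signals above can be made concrete with a small sketch. Assuming the executor samples several answers per proposed task, majority voting yields both a pseudo-label for executor training and an uncertainty score for the curriculum reward. This is an illustrative formulation, not the paper's exact implementation; the function names are hypothetical.

```python
from collections import Counter

def majority_vote(answers):
    """Return the most common answer (the pseudo-label) and its vote fraction."""
    counts = Counter(answers)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(answers)

def uncertainty_reward(answers):
    """Reward a proposed task by how uncertain the executor is about it.

    Peaks when the executor's sampled answers split ~50/50, i.e. the task
    sits at the frontier of its capability (illustrative formulation).
    """
    _, p = majority_vote(answers)
    return 1.0 - abs(2.0 * p - 1.0)  # 1.0 at p = 0.5, 0.0 at full agreement

# An easy task (unanimous answers) earns the curriculum agent no reward,
# while a frontier task (split votes) earns the maximum.
print(uncertainty_reward(["42"] * 8))          # 0.0
print(uncertainty_reward(["42", "41"] * 4))    # 1.0
```

In practice such a reward would be combined with a tool-use frequency term, so the curriculum agent is pushed toward tasks that are both uncertain and tool-demanding.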
Empirical results show that Agent0 achieves substantial, model-agnostic capability gains; on Qwen3-8B-Base:
| Model | AVG | AMC | Minerva | MATH | GSM8K | Olympiad | AIME25 | AIME24 |
|---|---|---|---|---|---|---|---|---|
| **Qwen3-8B-Base** | | | | | | | | |
| Base Model | 49.2 | 52.0 | 50.0 | 78.0 | 89.1 | 44.7 | 16.7 | 13.9 |
| Base Model w/ tool | 53.2 | 60.3 | 54.9 | 79.2 | 90.7 | 47.9 | 18.7 | 20.9 |
| + Absolute Zero | 52.6 | 62.5 | 52.9 | 76.6 | 92.0 | 47.8 | 18.2 | 18.4 |
| + R-Zero | 54.7 | 61.7 | 60.7 | 82.0 | 94.1 | 48.9 | 19.2 | 16.4 |
| + Socratic-Zero | 56.1 | 63.7 | 52.4 | 81.2 | 87.3 | 55.1 | 24.5 | 28.4 |
| + Agent0 (Ours) | 58.2 | 62.4 | 61.3 | 82.4 | 94.5 | 54.0 | 24.8 | 28.0 |
Table: Comprehensive results on mathematical reasoning benchmarks. Agent0 achieves the highest average score.
Agent0-VL extends the self-evolution paradigm to multimodal reasoning by incorporating tool use not only into reasoning but also into self-evaluation and self-repair. Through a Self-Evolving Reasoning Cycle (SERC), it improves continually without any human annotation or external reward models.
Figure: The Agent0-VL framework showing the dual-role architecture with Solver and Verifier.
- **Solver**: Performs multi-turn, tool-integrated reasoning, dynamically invoking external tools for grounded computation and visual perception.
- **Verifier**: Generates structured feedback and fine-grained self-rewards through tool-grounded critique, enabling evidence-based self-evaluation and repair.
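The Solver/Verifier interplay can be sketched as a solve-verify-repair loop. This is a minimal, self-contained sketch under assumed interfaces (the `solve`, `critique`, and `repair` callables and the acceptance threshold are hypothetical, not the released API).

```python
from dataclasses import dataclass

@dataclass
class Critique:
    score: float      # fine-grained self-reward from the Verifier
    feedback: str     # structured, tool-grounded feedback

ACCEPT = 0.8  # assumed acceptance threshold

def serc_step(task, solve, critique, repair, max_repairs=2):
    """One illustrative SERC iteration: solve, verify, repair until accepted.

    The returned self-reward can then drive RL updates, so the cycle needs
    no external reward model.
    """
    trace = solve(task)                 # multi-turn, tool-integrated reasoning
    c = critique(task, trace)           # evidence-based self-evaluation
    for _ in range(max_repairs):
        if c.score >= ACCEPT:
            break
        trace = repair(task, trace, c)  # targeted self-repair from the critique
        c = critique(task, trace)
    return trace, c.score

# Toy roles: the solver first guesses wrong; the verifier checks against a
# tool-computed answer; repair adopts the tool's evidence.
tool_answer = 7
trace, score = serc_step(
    "3+4",
    solve=lambda t: 5,
    critique=lambda t, tr: Critique(1.0 if tr == tool_answer else 0.0, "mismatch"),
    repair=lambda t, tr, c: tool_answer,
)
print(trace, score)  # 7 1.0
```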
Agent0-VL demonstrates significant improvements on multimodal reasoning benchmarks:
| Model | MathVerse | MathVision | MathVista | WeMath | HallBench | ChartQA | MMMU | Avg. |
|---|---|---|---|---|---|---|---|---|
| **Open-Source General MLLMs** | | | | | | | | |
| InternVL-2.5-8B | 39.5 | 19.7 | 64.4 | 53.5 | 61.7 | 79.1 | 62.7 | 54.4 |
| InternVL-3-8B | 39.8 | 29.3 | 71.6 | 58.1 | 64.3 | 85.9 | 60.7 | 58.5 |
| Qwen2.5-VL-7B | 46.3 | 25.1 | 67.8 | 62.1 | 65.0 | 83.5 | 58.6 | 58.3 |
| Qwen3-VL-8B | 62.1 | 53.9 | 77.2 | 72.5 | 72.1 | 84.6 | 69.6 | 70.3 |
| **Open-Source Reasoning MLLMs** | | | | | | | | |
| Vision-R1-7B | 51.9 | 30.7 | 73.5 | 73.9 | 68.8 | 79.8 | 50.5 | 61.3 |
| OpenVLThinker-7B | 45.7 | 26.3 | 71.2 | 66.7 | 70.2 | 78.4 | - | - |
| MM-Eureka-Qwen-7B | 50.5 | 27.9 | 73.6 | 67.4 | 66.9 | 82.1 | 52.7 | 60.2 |
| ThinkLite-VL-7B | 52.1 | 32.9 | 75.1 | 69.3 | 70.9 | 84.8 | 55.5 | 62.9 |
| Agent0-VL-7B (Ours) | 53.1 | 37.3 | 75.6 | 71.7 | 72.9 | 87.3 | 61.1 | 65.6 |
| Agent0-VL-8B (Ours) | 65.5 | 56.2 | 83.7 | 79.6 | 74.3 | 89.7 | 73.4 | 74.6 |
Table: Comparison of model performance across visual reasoning benchmarks. Agent0-VL achieves state-of-the-art results among open-source models.
| Feature | Agent0 | Agent0-VL |
|---|---|---|
| Modality | Language | Vision + Language |
| Evolution Mechanism | Curriculum-Executor Co-Evolution | Solver-Verifier Self-Evolution |
| Tool Integration | ✓ In reasoning | ✓ In reasoning + evaluation + repair |
| External Data Required | Zero | Zero (for evolution) |
| Training Paradigm | Multi-round RL | Self-Evolving Reasoning Cycle (SERC) |
| Primary Task | Math & General Reasoning | Visual & Geometric Reasoning |
```bibtex
@article{xia2025agent0,
  title={Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning},
  author={Xia, Peng and Zeng, Kaide and Liu, Jiaqi and Qin, Can and Wu, Fang and Zhou, Yiyang and Xiong, Caiming and Yao, Huaxiu},
  journal={arXiv preprint arXiv:2511.16043},
  year={2025}
}

@article{liu2025agent0vl,
  title={Agent0-VL: Exploring Self-Evolving Agent for Tool-Integrated Vision-Language Reasoning},
  author={Liu, Jiaqi and Xiong, Kaiwen and Xia, Peng and Zhou, Yiyang and Ji, Haonian and Feng, Lu and Han, Siwei and Ding, Mingyu and Yao, Huaxiu},
  journal={arXiv preprint arXiv:2511.19900},
  year={2025}
}
```