Agent0 Series: Self-Evolving Agents from Zero Data

Unleashing Autonomous Agent Evolution via Tool-Integrated Reasoning

UNC-Chapel Hill · Salesforce Research · Stanford University

🔥 News

[2025/11/25] Agent0-VL paper was released on arXiv!
[2025/11/20] Agent0 paper was released on arXiv!

Our Papers

Agent0

Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning

Authors: Peng Xia¹, Kaide Zeng¹, Jiaqi Liu¹, Can Qin², Fang Wu³, Yiyang Zhou¹, Caiming Xiong², Huaxiu Yao¹
¹UNC-Chapel Hill, ²Salesforce Research, ³Stanford University

A fully autonomous framework that evolves high-performing agents without any external data, through multi-step co-evolution and seamless tool integration. It establishes a symbiotic competition between a curriculum agent and an executor agent.

+18% Math Reasoning
+24% General Reasoning

Agent0

Overview

Agent0 is a fully autonomous framework designed to evolve agents entirely from scratch, eliminating any dependence on external data or human annotations. It is the first framework to combine tool integration with multi-round co-evolution.

Figure: The Agent0 co-evolution framework showing the symbiotic competition between curriculum and executor agents.

Key Innovation: Symbiotic Co-Evolution

Curriculum Agent: Trained using RL to propose frontier tasks that precisely challenge the executor's current capabilities, using the executor's uncertainty and tool-use frequency as reward signals.
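As a rough illustration (not the paper's actual code), such a reward could be computed from the executor's self-consistency and tool-call rate; all names and weights below are hypothetical:

```python
from collections import Counter

def curriculum_reward(executor_answers, num_tool_calls, w_tool=0.1):
    """Hypothetical curriculum reward: highest for tasks where the
    executor's sampled answers split evenly (maximal uncertainty) and
    where solving them required frequent tool use."""
    counts = Counter(executor_answers)
    consistency = counts.most_common(1)[0][1] / len(executor_answers)
    # Uncertainty peaks at 50% self-consistency: such tasks sit right
    # on the frontier of the executor's current ability.
    uncertainty = 1.0 - 2.0 * abs(consistency - 0.5)
    tool_bonus = w_tool * num_tool_calls / len(executor_answers)
    return uncertainty + tool_bonus
```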

Executor Agent: Trained via RL to successfully solve these tasks, optimizing on a filtered set of challenging problems and using pseudo-labels derived from its own majority voting.
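The executor's self-labeling step can be sketched the same way: majority voting over its own samples yields a pseudo-label, and tasks outside a difficulty band are filtered out (the band values below are assumptions, not the paper's settings):

```python
from collections import Counter

def pseudo_label(task, executor_answers, keep_band=(0.3, 0.8)):
    """Hypothetical self-labeling step: the majority-vote answer becomes
    the training target; near-unanimous (too easy) and highly scattered
    (too hard or noisy) tasks are dropped before RL training."""
    answer, freq = Counter(executor_answers).most_common(1)[0]
    consistency = freq / len(executor_answers)
    if not keep_band[0] <= consistency <= keep_band[1]:
        return None  # filtered out: weak training signal
    return {"prompt": task, "label": answer}
```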

Results

Empirical results show that Agent0's capability gains are model-agnostic; applied to Qwen3-8B-Base, it achieves:

  • 18% improvement on mathematical reasoning benchmarks
  • 24% improvement on general reasoning benchmarks
  • Supports multi-turn interactions for context-rich, conversational tasks

Mathematical Reasoning Benchmarks

| Model | AVG | AMC | Minerva | MATH | GSM8K | Olympiad | AIME25 | AIME24 |
|---|---|---|---|---|---|---|---|---|
| Qwen3-8B-Base | | | | | | | | |
| Base Model | 49.2 | 52.0 | 50.0 | 78.0 | 89.1 | 44.7 | 16.7 | 13.9 |
| Base Model w/ tool | 53.2 | 60.3 | 54.9 | 79.2 | 90.7 | 47.9 | 18.7 | 20.9 |
| + Absolute Zero | 52.6 | 62.5 | 52.9 | 76.6 | 92.0 | 47.8 | 18.2 | 18.4 |
| + R-Zero | 54.7 | 61.7 | 60.7 | 82.0 | 94.1 | 48.9 | 19.2 | 16.4 |
| + Socratic-Zero | 56.1 | 63.7 | 52.4 | 81.2 | 87.3 | 55.1 | 24.5 | 28.4 |
| + Agent0 (Ours) | 58.2 | 62.4 | 61.3 | 82.4 | 94.5 | 54.0 | 24.8 | 28.0 |

Table: Comprehensive results on mathematical reasoning benchmarks. Agent0 achieves the highest average score.

Agent0-VL

Overview

Agent0-VL extends the self-evolution paradigm to multimodal reasoning tasks by incorporating tool usage not only into reasoning but also into self-evaluation and self-repair. It achieves continual self-improvement without any human annotation or external reward models through a Self-Evolving Reasoning Cycle (SERC).

Figure: The Agent0-VL framework showing the dual-role architecture with Solver and Verifier.

Key Innovation: Dual-Role Self-Evolution

Solver: Performs multi-turn tool-integrated reasoning, dynamically invoking external tools for grounded computation and visual perception.
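A minimal sketch of such a loop, with the model/tool interfaces and the tag format as assumptions rather than Agent0-VL's actual API:

```python
import re

def solve(task, model, tools, max_turns=8):
    """Hypothetical multi-turn loop: the model interleaves free-form
    reasoning with tool calls (e.g. a Python interpreter or an image
    crop) until it emits a final answer."""
    transcript = [task]
    for _ in range(max_turns):
        step = model.generate(transcript)  # assumed text-in/text-out API
        transcript.append(step)
        call = re.search(r"<tool name='(\w+)'>(.*?)</tool>", step, re.S)
        if call:
            # Tool output is appended so the next turn reasons over it.
            transcript.append(str(tools[call.group(1)](call.group(2))))
        elif "<answer>" in step:
            break
    return transcript
```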

Verifier: Generates structured feedback and fine-grained self-rewards through tool-grounded critique, enabling evidence-based self-evaluation and repair.
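And a comparable sketch of the verifier and repair steps, with the critique prompt and score format as assumptions:

```python
import re

def verify(transcript, model):
    """Hypothetical verifier pass: produce a structured critique of the
    solver's trace and parse a scalar self-reward from it."""
    critique = model.generate(
        ["Check every step; end with 'SCORE: <0-1>'.", *transcript]
    )
    match = re.search(r"SCORE:\s*([01](?:\.\d+)?)", critique)
    reward = float(match.group(1)) if match else 0.0
    return critique, reward

def repair(transcript, critique, model):
    """Low-reward traces are revised with the critique, not discarded."""
    return model.generate(["Revise the solution using this critique:",
                           critique, *transcript])
```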

Results

Agent0-VL demonstrates significant improvements on multimodal reasoning benchmarks:

  • 12.5% average improvement over the Qwen-VL base model on geometric reasoning and visual scientific analysis
  • 7.3% improvement in test-time scaling performance when used as a process reward model
  • Consistent performance gains across multiple iterations of self-evolution
  • State-of-the-art results among open-source vision-language models

Visual Reasoning Benchmarks

| Model | MathVerse | MathVision | MathVista | WeMath | HallBench | ChartQA | MMMU | Avg. |
|---|---|---|---|---|---|---|---|---|
| Open-Source General MLLMs | | | | | | | | |
| InternVL-2.5-8B | 39.5 | 19.7 | 64.4 | 53.5 | 61.7 | 79.1 | 62.7 | 54.4 |
| InternVL-3-8B | 39.8 | 29.3 | 71.6 | 58.1 | 64.3 | 85.9 | 60.7 | 58.5 |
| Qwen2.5-VL-7B | 46.3 | 25.1 | 67.8 | 62.1 | 65.0 | 83.5 | 58.6 | 58.3 |
| Qwen3-VL-8B | 62.1 | 53.9 | 77.2 | 72.5 | 72.1 | 84.6 | 69.6 | 70.3 |
| Open-Source Reasoning MLLMs | | | | | | | | |
| Vision-R1-7B | 51.9 | 30.7 | 73.5 | 73.9 | 68.8 | 79.8 | 50.5 | 61.3 |
| OpenVLThinker-7B | 45.7 | 26.3 | 71.2 | 66.7 | 70.2 | 78.4 | - | - |
| MM-Eureka-Qwen-7B | 50.5 | 27.9 | 73.6 | 67.4 | 66.9 | 82.1 | 52.7 | 60.2 |
| ThinkLite-VL-7B | 52.1 | 32.9 | 75.1 | 69.3 | 70.9 | 84.8 | 55.5 | 62.9 |
| Agent0-VL-7B (Ours) | 53.1 | 37.3 | 75.6 | 71.7 | 72.9 | 87.3 | 61.1 | 65.6 |
| Agent0-VL-8B (Ours) | 65.5 | 56.2 | 83.7 | 79.6 | 74.3 | 89.7 | 73.4 | 74.6 |

Table: Comparison of model performance across visual reasoning benchmarks. Agent0-VL achieves state-of-the-art results among open-source models.

Framework Comparison

| Feature | Agent0 | Agent0-VL |
|---|---|---|
| Modality | Language | Vision + Language |
| Evolution Mechanism | Curriculum-Executor Co-Evolution | Solver-Verifier Self-Evolution |
| Tool Integration | ✓ In reasoning | ✓ In reasoning + evaluation + repair |
| External Data Required | Zero | Zero (for evolution) |
| Training Paradigm | Multi-round RL | Self-Evolving Reasoning Cycle (SERC) |
| Primary Task | Math & General Reasoning | Visual & Geometric Reasoning |

Citation

Agent0

```bibtex
@article{xia2025agent0,
  title={Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning},
  author={Xia, Peng and Zeng, Kaide and Liu, Jiaqi and Qin, Can and Wu, Fang and Zhou, Yiyang and Xiong, Caiming and Yao, Huaxiu},
  journal={arXiv preprint arXiv:2511.16043},
  year={2025}
}
```

Agent0-VL

```bibtex
@article{liu2025agent0vl,
  title={Agent0-VL: Exploring Self-Evolving Agent for Tool-Integrated Vision-Language Reasoning},
  author={Liu, Jiaqi and Xiong, Kaiwen and Xia, Peng and Zhou, Yiyang and Ji, Haonian and Feng, Lu and Han, Siwei and Ding, Mingyu and Yao, Huaxiu},
  journal={arXiv preprint arXiv:2511.19900},
  year={2025}
}
```