
Agent-as-a-Judge

Evaluate agents with agents in open-ended settings.

Venue: ICML 2025
Area: Agent Evaluation
Type: Paper + Code

Summary

Agent-as-a-Judge proposes agentic evaluation pipelines in which capable models assess other agents on complex tasks. The goal is evaluation that is scalable, lower-cost, and behavior-aware, suited to fast-moving agent systems.

Key Links

Paper (arXiv)
Code (GitHub)
BibTeX mirror

Use the paper as the primary citation, and the repository for implementation details and updates.