Agent-as-a-Judge
Evaluate agents with agents in open-ended settings.
Summary
Agent-as-a-Judge proposes agentic evaluation pipelines where capable models assess other agents on complex tasks. The goal is scalable, lower-cost, and behavior-aware evaluation for fast-moving agent systems.
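As a rough illustration of the idea, the sketch below shows one way a judge could score another agent's recorded behavior against a list of task requirements. It is not the project's implementation: the names `Step`, `Verdict`, `keyword_judge`, `evaluate`, and `pass_rate` are hypothetical, and the keyword match is only a toy stand-in for a capable model acting as the judge.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One recorded step of the agent being evaluated (hypothetical shape)."""
    action: str
    output: str


@dataclass
class Verdict:
    """The judge's decision for a single requirement."""
    requirement: str
    satisfied: bool
    evidence: str  # action of the step that supports the decision, if any


# A judge is any callable that maps (requirement, trajectory) to a Verdict.
# In a real pipeline this would wrap a capable model; that part is assumed.
Judge = Callable[[str, List[Step]], Verdict]


def keyword_judge(requirement: str, trajectory: List[Step]) -> Verdict:
    """Toy judge: a requirement counts as met if some step output mentions it."""
    for step in trajectory:
        if requirement.lower() in step.output.lower():
            return Verdict(requirement, True, step.action)
    return Verdict(requirement, False, "")


def evaluate(requirements: List[str], trajectory: List[Step], judge: Judge) -> List[Verdict]:
    """Judge every requirement against the whole trajectory, not just the final answer."""
    return [judge(req, trajectory) for req in requirements]


def pass_rate(verdicts: List[Verdict]) -> float:
    """Fraction of requirements the judge marked as satisfied."""
    if not verdicts:
        return 0.0
    return sum(v.satisfied for v in verdicts) / len(verdicts)
```

Because the judge is passed in as a plain callable, the toy `keyword_judge` could be swapped for a model-backed judge without changing `evaluate` or `pass_rate`.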
Key Links
Use the paper as the primary citation, and the repository for implementation details and updates.