How Automating Agent Trajectory Analysis Transformed Our Development Workflow
In the world of AI research, analyzing the performance of coding agents is both critical and time-consuming. I recently found myself caught in a repetitive cycle of reviewing thousands of agent trajectories, each a JSON file documenting an agent's decision-making steps while solving a task. Using GitHub Copilot, I could surface patterns and reduce the workload, but the process still required manual investigation. Driven by a desire to eliminate this intellectual toil, I created eval-agents, a tool that automates the analysis and enables my entire team to collaborate more effectively.
The Impetus for Automation
My primary responsibility involves evaluating coding agent performance against standardized benchmarks like TerminalBench2 and SWEBench-Pro. This requires digging through massive collections of trajectories—detailed logs that capture the agent's thoughts and actions for each task.

Analyzing Agent Trajectories
Each task in a benchmark set produces its own trajectory file, often hundreds of lines of JSON. Multiply that by dozens of tasks per benchmark, and again by the numerous runs we conduct daily, and you end up with hundreds of thousands of lines of data to analyze. Manually reading through all of it is simply impossible.
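To give a feel for the scale, here is a toy trajectory in Python. The schema (a task ID plus a list of thought/action steps) is my illustration of the general shape, not the actual format our agents emit:

```python
import json

# Hypothetical trajectory schema -- illustrative only, not the real format.
trajectory = {
    "task_id": "example-task-001",
    "steps": [
        {"thought": "Inspect the failing test", "action": "cat test_app.py"},
        {"thought": "Patch the bug", "action": "apply_edit"},
    ],
}

# Even this two-step toy spans over a dozen lines when pretty-printed;
# real trajectories run to hundreds of lines per task.
serialized = json.dumps(trajectory, indent=2)
print(len(serialized.splitlines()))
```

Multiply that per-task file by dozens of tasks and many runs per day, and the line counts in the paragraph above follow quickly.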
The Repetitive Loop
My typical workflow involved using GitHub Copilot to identify patterns in the trajectories, then manually investigating those patterns to extract meaningful insights. While Copilot helped me reduce the lines I needed to read from hundreds of thousands to a few hundred, the loop itself remained repetitive. The engineer in me thought: I can automate this. That realization sparked the creation of eval-agents.
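The repetitive part of that loop is mechanical enough to sketch in code. Below is a minimal version of the idea, not eval-agents itself: scan every trajectory file in a run directory for a fixed list of failure patterns and tally the hits. The pattern strings and the one-JSON-file-per-task layout are my assumptions:

```python
import json
from collections import Counter
from pathlib import Path

# Hypothetical failure patterns worth flagging -- the real analysis is richer.
PATTERNS = ["command not found", "timeout", "permission denied"]

def scan_trajectories(run_dir: Path) -> Counter:
    """Count how many trajectory files in a run mention each pattern."""
    hits: Counter = Counter()
    for path in run_dir.glob("*.json"):
        text = path.read_text()
        for pattern in PATTERNS:
            if pattern in text:
                hits[pattern] += 1
    return hits
```

A scan like this turns "read a few hundred lines per run" into "read a three-line summary", which is exactly the step I wanted to stop doing by hand.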
Building Eval-Agents
The core idea was to build a system that could automate the intellectual work of analyzing agent trajectories, making it accessible and shareable across the team.
Design Goals
I approached the project with three guiding principles:
- Make agents easy to share and use – so that anyone on the team could leverage the automation.
- Make it easy to author new agents – empowering peers to create custom analysis tools.
- Make coding agents the primary vehicle for contributions – enabling a collaborative, agent-driven development workflow.
Sharing and Collaboration
These goals align closely with GitHub’s core values of collaboration and open source. My experience as an open-source maintainer for the GitHub CLI taught me the importance of making tools easy to adopt and extend. With eval-agents, I ensured that the agents could be version-controlled, shared via repositories, and run by anyone with minimal setup. Team members can now author their own agents to tackle specific analysis challenges, and the entire team benefits from a growing library of automation.
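I won't reproduce eval-agents' actual internals here, but a minimal sketch shows what "easy to author, easy to share" can look like in practice: an agent is a small named object, and contributing one is a few lines against a shared registry. The class and registry names are hypothetical:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class AnalysisAgent:
    """A named, shareable unit of trajectory analysis.

    Hypothetical sketch -- actual eval-agents internals may differ.
    """
    name: str
    description: str
    analyze: Callable[[dict], str]  # trajectory dict -> human-readable finding

# A shared registry that lives in version control alongside the agents.
REGISTRY: dict[str, AnalysisAgent] = {}

def register(agent: AnalysisAgent) -> None:
    REGISTRY[agent.name] = agent

# A teammate could contribute a new agent in a few lines:
register(AnalysisAgent(
    name="step-counter",
    description="Reports how many steps the agent took on a task.",
    analyze=lambda t: f"{len(t.get('steps', []))} steps",
))
```

Because agents reduce to small, declarative definitions like this, they are easy to review in pull requests, and the library grows the same way any open-source codebase does.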

Impact and Future
The results have been transformative. Instead of spending hours on manual pattern hunting, my colleagues and I can now run agents that automatically surface insights from benchmark runs. This has not only accelerated our research but also freed up time for more creative problem-solving.
Moreover, the agent-driven development approach has opened up new possibilities. We are no longer limited by individual capacity; the team collectively builds and maintains agents that continuously improve our analysis capabilities. As we expand the agent library, we anticipate even greater efficiency gains and deeper understanding of coding agent behavior.
This journey taught me that automation isn't just about removing drudgery—it's about enabling teams to collaborate at a higher level. By leveraging tools like GitHub Copilot and building upon them with our own agents, we have created a feedback loop where automation fuels innovation.