Automated Failure Attribution in Multi-Agent Systems: A New Benchmark and Methods
In the rapidly evolving field of LLM-driven multi-agent systems, diagnosing failures has become a major bottleneck. These systems often fail despite extensive back-and-forth among agents, leaving developers to comb through long logs by hand to figure out which agent caused the problem and when. To address this, researchers from Penn State University, Duke University, and collaborators (including Google DeepMind) have introduced automated failure attribution as a new research problem. They created the Who&When dataset and evaluated several automated attribution methods. Their work, accepted as a Spotlight at ICML 2025, is now open-source. Below, we answer key questions about this research.
What is automated failure attribution and why does it matter?
Automated failure attribution is the task of automatically identifying which agent in a multi-agent system caused a failure and at what point in the interaction it occurred. This matters because today's LLM multi-agent systems are fragile: a single miscommunication or mistake by one agent can derail the entire task. Developers currently rely on manual log inspection, which is time-consuming and requires deep expertise. Without automation, iterating on these systems becomes extremely slow. Automated attribution provides a scalable way to pinpoint root causes, accelerating debugging and enabling more reliable multi-agent collaboration.
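To make the task concrete, here is a minimal Python sketch of the input and output an attribution method works with. The `AgentTurn` and `FailureAttribution` structures are illustrative assumptions, not part of the released code.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class AgentTurn:
    """One step in a multi-agent interaction log."""
    step: int      # position in the conversation
    agent: str     # name of the agent that acted
    content: str   # the agent's message or action

@dataclass
class FailureAttribution:
    """The output an attribution method must produce."""
    failure_agent: str   # "who": the agent judged responsible
    failure_step: int    # "when": the step of the decisive mistake
    reason: str          # short natural-language justification

def attribute_failure(log: List[AgentTurn]) -> FailureAttribution:
    """Placeholder: any attribution method maps a failed log to (who, when)."""
    raise NotImplementedError
```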

What is the Who&When dataset?
Who&When is the first benchmark dataset specifically designed for automated failure attribution in LLM multi-agent systems. It contains thousands of failure instances from diverse tasks, each annotated with the ground-truth responsible agent and the failure time step. The dataset covers various types of failures, such as incorrect reasoning, miscommunication between agents, and incomplete information transmission. By providing this standardized evaluation, Who&When enables fair comparison of different attribution methods and drives progress in the field. The dataset is fully open-source on Hugging Face.
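A rough sketch of how one might load and inspect the benchmark with the Hugging Face `datasets` library is shown below. The dataset identifier and field names are placeholders, so check the project's Hugging Face page for the actual ID and schema.

```python
from datasets import load_dataset  # pip install datasets

# Hypothetical identifier -- substitute the real Who&When dataset ID.
ds = load_dataset("org-name/who-and-when", split="train")

for example in ds.select(range(3)):
    # Field names below are illustrative; the released schema may differ.
    print(example["task"])            # the original task prompt
    print(example["history"][:1])     # the multi-agent interaction log
    print(example["mistake_agent"])   # ground-truth "who"
    print(example["mistake_step"])    # ground-truth "when"
```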
What methods did the researchers develop and evaluate?
The team developed several automated attribution methods, ranging from simple heuristic baselines to more sophisticated approaches leveraging causal inference and attention analysis. They evaluated these methods on the Who&When dataset to measure accuracy in pinpointing the failing agent and failure time. Key approaches include log-based sequence matching, prompt-based attribution using LLMs, and a novel method that traces information flow across agents. The results showed that while some methods perform reasonably, the task remains challenging, highlighting room for further innovation. All code is available on GitHub.
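As a rough illustration of the prompt-based attribution idea, the sketch below asks a chat model to read an entire failed log and return a who/when judgment. The prompt wording, model choice, and JSON fields are assumptions for illustration, not the authors' implementation.

```python
import json
from openai import OpenAI  # any chat-completion client would work similarly

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def attribute_with_llm(log_text: str) -> dict:
    """Prompt-based judge: show the whole failed log, ask for who/when."""
    prompt = (
        "The following is the full log of a failed multi-agent run.\n"
        "Identify the agent most responsible for the failure and the step "
        "at which the decisive mistake occurred. Respond as JSON with keys "
        "'failure_agent', 'failure_step', and 'reason'.\n\n" + log_text
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)
```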
How do these methods compare to manual debugging?
Manual debugging of multi-agent failures is akin to finding a needle in a haystack. Developers must read through long interaction logs, relying on their understanding of the system and the task. This process is not only slow but also error-prone. The automated methods proposed by the researchers offer a much faster alternative. For example, the best-performing method can pinpoint the responsible agent with over 70% accuracy on the benchmark, drastically reducing human effort. However, manual inspection still outperforms on subtle edge cases, so the ideal workflow likely combines automation with human oversight.
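For context on how such accuracy figures are computed in principle, here is a minimal scoring sketch that compares predicted (agent, step) pairs against ground-truth labels; the exact evaluation protocol used in the paper may differ.

```python
def attribution_accuracy(predictions, ground_truth):
    """Fraction of failures where the predicted agent / step matches the label.

    `predictions` and `ground_truth` are parallel lists of (agent, step) pairs.
    """
    agent_hits = sum(p[0] == g[0] for p, g in zip(predictions, ground_truth))
    step_hits = sum(p[1] == g[1] for p, g in zip(predictions, ground_truth))
    n = len(ground_truth)
    return {"agent_accuracy": agent_hits / n, "step_accuracy": step_hits / n}

# Toy example: 3 of 4 agents correct, 2 of 4 steps correct.
preds = [("Planner", 3), ("Coder", 5), ("Planner", 2), ("Critic", 7)]
gold  = [("Planner", 3), ("Coder", 6), ("Planner", 2), ("Planner", 4)]
print(attribution_accuracy(preds, gold))
# {'agent_accuracy': 0.75, 'step_accuracy': 0.5}
```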

What types of failures are most common in multi-agent systems?
Based on the Who&When dataset and the researchers' analysis, the most common failures fall into three categories: individual agent errors (e.g., an agent misinterprets instructions), inter-agent miscommunication (e.g., two agents have inconsistent beliefs), and chain breakdowns (e.g., a failure early in the workflow cascades). Interestingly, the dataset reveals that failures often occur due to subtle misunderstandings rather than outright errors, making attribution particularly hard. This insight helps developers prioritize robustness improvements in communication protocols.
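Teams labeling their own failure logs can keep this taxonomy explicit with a small enum like the one below; the category names are illustrative rather than the dataset's official label set.

```python
from enum import Enum

class FailureCategory(Enum):
    """Coarse failure taxonomy described above; labels are illustrative."""
    AGENT_ERROR = "individual agent error"             # e.g. misread instructions
    MISCOMMUNICATION = "inter-agent miscommunication"  # e.g. inconsistent beliefs
    CHAIN_BREAKDOWN = "cascading chain breakdown"      # early failure propagates
```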
How does this research impact real-world applications?
Multi-agent systems are used in code generation, customer support, autonomous planning, and scientific research. When these systems fail in production, quick debugging is critical. Automated failure attribution can reduce downtime and improve user trust. For instance, in a multi-agent coding assistant, if a buggy code block is generated, attribution can tell developers which agent's reasoning led to the mistake. This targeted feedback loop speeds up system improvement and reduces reliance on trial-and-error. The open-source nature of the dataset and code also allows companies to build their own attribution pipelines.
What are the next steps for this line of research?
The researchers plan to extend automated failure attribution to more complex, dynamic, and long-horizon multi-agent scenarios. They also aim to integrate attribution into the training loop, so agents learn to self-correct based on failure signals. Another direction is real-time attribution, where failures are caught mid-execution rather than after the fact. The paper's acceptance at ICML 2025 as a Spotlight indicates the community's strong interest. With the open-source release of code and data, they expect many follow-up works exploring improved methods and broader applications.