Meta Unveils KernelEvolve: AI Agent Automates Chip Optimization, Boosts Model Performance by 60%
<h2>Breaking News: Meta's Autonomous Agent Revolutionizes AI Hardware Efficiency</h2>
<p>Meta has introduced <strong>KernelEvolve</strong>, an autonomous AI agent that writes and optimizes low-level kernel code for its diverse fleet of chips: NVIDIA GPUs, AMD GPUs, custom MTIA silicon, and CPUs. The system, now part of Meta's Ranking Engineer Agent, compresses weeks of expert work into hours, delivering <strong>over 60% inference throughput improvement</strong> for the Andromeda Ads model on NVIDIA GPUs and <strong>more than 25% training throughput gain</strong> on Meta's custom MTIA chips.</p><figure style="margin:20px 0"><img src="https://engineering.fb.com/wp-content/uploads/2026/04/Meta-KernalEvolve-REA-Hero.png" alt="Meta Unveils KernelEvolve: AI Agent Automates Chip Optimization, Boosts Model Performance by 60%" style="width:100%;height:auto;border-radius:8px" loading="lazy"><figcaption style="font-size:12px;color:#666;margin-top:5px">Source: engineering.fb.com</figcaption></figure>
<p>“KernelEvolve treats kernel optimization as a search problem,” a Meta spokesperson told reporters. “A job-harness evaluates each candidate kernel, feeds diagnostics back to the LLM, and drives a continuous search over hundreds of alternatives, often exceeding human-expert performance.” The system targets the multiplicative growth in hardware generations and model architectures that need support, a workload that had outpaced manual tuning.</p>
<h3 id='background'>Background: The Kernel Bottleneck</h3>
<p>Meta operates a massive, heterogeneous infrastructure. Each new chip generation and ML model requires hand-crafted code—called kernels—that translates high-level operations into chip-specific instructions. Standard operators like matrix multiplications are covered by vendor libraries, but production workloads need many custom operators.</p>
<p>“The number of models multiplied by the number of hardware types and generations created a scalability crisis,” explained the spokesperson. “Hand-tuning by kernel experts simply doesn't scale.” KernelEvolve automates this process, compressing weeks of profiling, optimizing, and cross-hardware debugging into hours of automated search and evaluation.</p>
<h3 id='what-this-means'>What This Means: Faster Innovation, Lower Costs</h3>
<p>By automating kernel authoring, Meta frees engineers to focus on higher-level model design rather than manual optimization. The system works across public and proprietary hardware, generating code in high-level DSLs such as Triton, CuTe DSL, and FlyDSL, as well as in low-level languages including CUDA, HIP, and MTIA C++.</p><figure style="margin:20px 0"><img src="https://engineering.fb.com/wp-content/uploads/2026/04/Meta-KernalEvolve-REA-image1.png" alt="Meta Unveils KernelEvolve: AI Agent Automates Chip Optimization, Boosts Model Performance by 60%" style="width:100%;height:auto;border-radius:8px" loading="lazy"><figcaption style="font-size:12px;color:#666;margin-top:5px">Source: engineering.fb.com</figcaption></figure>
<p>The performance gains translate directly into more efficient AI services, from personalized recommendations to generative AI assistants, served to billions of users every day. Full details appear in the paper “KernelEvolve: Scaling Agentic Kernel Coding for Heterogeneous AI Accelerators at Meta,” to be presented at the 53rd International Symposium on Computer Architecture (ISCA) 2026.</p>
<h3>How It Works: An Agentic Search</h3>
<p>KernelEvolve uses a large language model (LLM) as its core engine. The LLM proposes candidate kernel implementations, which are then compiled and run on actual hardware. Diagnostics and profiles from each run are fed back to the LLM, which iterates on the next candidate to improve performance.</p>
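<p>The propose, evaluate, and feed-back loop described above can be sketched in a few lines of Python. This is a toy illustration only, not Meta's implementation: <code>propose</code> is a hypothetical stand-in for the LLM, <code>evaluate</code> is a hypothetical stand-in for the harness that would compile and run a kernel on real hardware, and the "kernel" is reduced to a single tile size scored by a synthetic cost model.</p>

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Result:
    candidate: int       # stand-in for kernel source; here just a tile size
    correct: bool        # did the candidate pass the correctness check?
    latency_ms: float    # runtime (synthetic in this sketch)
    diagnostics: str     # feedback text handed back to the proposer


def evaluate(tile: int) -> Result:
    """Hypothetical harness; a real one would compile and run on hardware."""
    if tile <= 0 or tile > 256:
        return Result(tile, False, float("inf"), "invalid tile size")
    # Synthetic cost model (an assumption) with its minimum at tile == 64.
    latency = 1.0 + abs(tile - 64) / 64
    hint = "try larger" if tile < 64 else "try smaller" if tile > 64 else "ok"
    return Result(tile, True, latency, hint)


def propose(history: List[Result]) -> int:
    """Hypothetical proposer standing in for the LLM."""
    if not history:
        return 8  # naive starting point
    last = history[-1]
    if last.diagnostics == "try larger":
        return last.candidate * 2
    if last.diagnostics == "try smaller":
        return last.candidate // 2
    return last.candidate


def search(budget: int) -> Optional[Result]:
    """Run the loop for `budget` iterations; keep the fastest correct result."""
    history: List[Result] = []
    best: Optional[Result] = None
    for _ in range(budget):
        result = evaluate(propose(history))
        history.append(result)  # diagnostics become input to the next proposal
        if result.correct and (best is None or result.latency_ms < best.latency_ms):
            best = result
    return best


if __name__ == "__main__":
    print(search(budget=6))
```

<p>In this sketch the proposer reads only the most recent diagnostic. A production system would presumably give the model far richer context, such as compiler errors and profiler output, but the control flow stays the same: each candidate is generated, measured, recorded, and the best correct one is retained.</p>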
<p>This <a href='#background'>agentic approach</a> has already demonstrated results that rival or surpass human experts. For example, it delivered over 60% inference throughput improvement for the Andromeda Ads model on NVIDIA GPUs and over 25% training throughput improvement for an ads model on Meta's custom MTIA chips.</p>
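<p>To put the throughput figures in terms of machine time, the short calculation below converts a throughput gain into the share of time saved on a fixed workload. It assumes the workload is fixed and the gain applies uniformly, which is a simplification; the function name is illustrative.</p>

```python
def time_fraction(throughput_gain: float) -> float:
    """Fraction of the original machine time needed for the same work."""
    return 1.0 / (1.0 + throughput_gain)


# Gains quoted in the article, taken at their stated lower bounds.
for gain in (0.60, 0.25):
    saved = 1.0 - time_fraction(gain)
    print(f"+{gain:.0%} throughput -> {saved:.1%} less machine time")
```

<p>Under these assumptions, a 60% throughput gain corresponds to 37.5% less machine time for the same work, and a 25% gain to 20% less.</p>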
<p><em>For more on Meta's autonomous AI capabilities, see the <a href='https://example.com/ranking-engineer-agent'>Ranking Engineer Agent blog series</a>.</em></p>