JetStream 3: A New Era for Browser Benchmarking
JetStream 3 represents a significant milestone in browser performance testing, developed collaboratively by Google, Mozilla, and WebKit. This refresh addresses the evolving needs of modern web applications, particularly around WebAssembly (Wasm) and real-world scalability. Below, we explore the key questions about this benchmark suite and the engineering insights behind it.
- What is JetStream 3 and why was it created?
- How does JetStream 3 improve WebAssembly benchmarking?
- What was the 'infinity problem' in JetStream 2?
- How did browser engines outgrow JetStream 2's Wasm tests?
- What changes were made to the scoring formula in JetStream 3?
- How do modern web applications influence JetStream 3's design?
- What role did WebKit's JavaScriptCore team play in these improvements?
What is JetStream 3 and why was it created?
JetStream 3 is a major update to the cross-browser benchmark suite, developed jointly by Google, Mozilla, and WebKit. It was created because previous benchmarks—like JetStream 2—had become outdated as web technologies evolved. The original suite no longer reflected modern best practices or the complexity of today's web applications. Moreover, browser engines had optimized every accessible aspect of JetStream 2, leading to diminishing returns. The new suite represents a fundamental shift in performance measurement, focusing on WebAssembly and large-scale application patterns. By refreshing the workloads and scoring methodology, JetStream 3 provides a more accurate gauge of real-world browsing performance, helping developers drive meaningful optimizations across all major browser engines.

How does JetStream 3 improve WebAssembly benchmarking?
JetStream 3 overhauls how WebAssembly (Wasm) workloads are measured. In JetStream 2, Wasm scoring was split into Startup and Runtime phases, reflecting the early assumption that users tolerated long startup times for high throughput. However, modern Wasm usage is far more varied—it appears in libraries, image decoders, and UI frameworks where fast startup is critical. JetStream 3 eliminates this artificial separation, integrating Wasm benchmarks that mimic real-world scenarios. It also uses high-resolution timers to capture sub-millisecond startup times accurately. This ensures that optimizations benefiting both small and large Wasm modules are fairly rewarded, preventing edge cases like the infamous “infinity problem” from distorting scores.
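As a minimal sketch of the kind of measurement involved, the snippet below times Wasm compilation and instantiation with `performance.now()`. The eight-byte buffer is the smallest valid Wasm binary (magic number plus version) and merely stands in for a real benchmark module; variable names are ours.

```javascript
// Minimal sketch: timing Wasm startup with a high-resolution timer.
// The 8-byte buffer is the smallest valid Wasm binary (magic + version),
// standing in here for a real benchmark workload.
const bytes = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);

const t0 = performance.now();
const mod = new WebAssembly.Module(bytes);      // compile
const instance = new WebAssembly.Instance(mod); // instantiate
const t1 = performance.now();

// performance.now() returns fractional milliseconds, so even a startup
// far below 1 ms yields a nonzero, meaningful measurement.
console.log(`startup: ${(t1 - t0).toFixed(3)} ms`);
```

Because `performance.now()` reports fractional milliseconds, sub-millisecond instantiation produces a usable number instead of collapsing to zero.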
What was the 'infinity problem' in JetStream 2?
The “infinity problem” arose from JetStream 2’s scoring formula: Score = 5000 / Time. When a Wasm benchmark’s startup time dropped below 1 millisecond, Date.now() rounded it to 0 ms, causing a division by zero and an infinite score. This happened because browser engines optimized Wasm instantiation so aggressively that some workloads effectively finished instantly. While getting an infinite score sounds like a win, it rendered the metric meaningless—a single subtest could overwhelm all others. JetStream 2.2 had to patch the harness by clamping the score at 5000, but this was a stopgap. The problem highlighted that the benchmark had been outgrown: micro-optimizations on tiny workloads no longer reflected real browsing.
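A few lines of JavaScript reproduce the failure mode described above. The function name is ours; the formula and the 5000-point clamp come straight from the text.

```javascript
// JetStream 2's per-test formula, as described above: Score = 5000 / Time.
function jetstream2Score(elapsedMs) {
  return 5000 / elapsedMs;
}

// Date.now() only has millisecond resolution, so a startup that finishes
// in, say, 0.4 ms reads as 0 elapsed milliseconds:
const start = Date.now();
const end = start; // same integer millisecond
console.log(jetstream2Score(end - start)); // Infinity: the "infinity problem"

// JetStream 2.2's stopgap: clamp the score at 5000.
console.log(Math.min(jetstream2Score(end - start), 5000)); // 5000
```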
How did browser engines outgrow JetStream 2's Wasm tests?
Over the years, browser engines like WebKit’s JavaScriptCore dramatically reduced Wasm module instantiation times. In JetStream 2, the startup phase was initially designed for large C/C++ applications where a few hundred milliseconds were acceptable. But engines soon optimized even the cheapest workloads—shaving 0.1 ms from a 2 ms startup is a 5% gain, whereas the same saving on a 100 ms workload is noise. As a result, engines reached near-zero startup for small tests. This evolution meant that JetStream 2’s Wasm subtests no longer challenged modern engines; they became too easy. A benchmark must stress the system to drive improvements; when it doesn’t, optimizations become overly specific to the test rather than general. JetStream 3 replaces these outdated tests with more demanding and diverse scenarios.

What changes were made to the scoring formula in JetStream 3?
JetStream 3 abandons the simple 5000 / Time formula that caused the infinity problem. Instead, it adopts a geometric mean of normalized scores, similar to other modern benchmarks like Speedometer. Each subtest’s result is compared to a fixed reference machine, yielding a ratio. This prevents any single subtest from dominating the overall score—even if a benchmark runs in under a millisecond, its contribution remains bounded. Additionally, the suite uses high-resolution timers (e.g., performance.now()) to capture sub-millisecond differences accurately. The new scoring is more robust and fair, ensuring that incremental performance gains in any area are reflected proportionally. This change also aligns with how real users perceive performance: small improvements matter, but not at the expense of distorting the aggregate result.
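The bounded-contribution property can be sketched as follows. All timings below are hypothetical illustrations, not JetStream's actual reference values, and the helper name is ours.

```javascript
// Geometric-mean scoring sketch: each subtest's ratio is
// referenceTimeMs / measuredTimeMs, and the overall score is the
// geometric mean of those ratios (computed in log space for stability).
function geometricMeanScore(measuredMs, referenceMs) {
  const logSum = measuredMs.reduce(
    (acc, t, i) => acc + Math.log(referenceMs[i] / t), 0);
  return Math.exp(logSum / measuredMs.length);
}

// Hypothetical times: one subtest runs 100x faster than the reference,
// the other two exactly match it.
const score = geometricMeanScore([1, 50, 100], [100, 50, 100]);
console.log(score.toFixed(2)); // 4.64 — the outlier lifts, but cannot
                               // dominate, the aggregate (vs. Infinity
                               // under the old 5000 / Time formula)
```

Because the geometric mean multiplies ratios, speeding up any one subtest by a factor of k scales the aggregate by k^(1/n) regardless of that subtest's absolute running time, so no single workload can swamp the rest.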
How do modern web applications influence JetStream 3's design?
Modern web apps are far larger and more complex than those of the JetStream 2 era. They often rely on Wasm for critical tasks like image decoding, video processing, and UI rendering—functions where startup speed directly impacts user experience. JetStream 3 was designed with this scale in mind: its workloads mimic real applications with multiple modules, dynamic imports, and mixed JavaScript/Wasm code paths. The suite also increases the total number of subtests and their size, ensuring that benchmarks reflect the compound performance of an engine, not just isolated micro-tasks. By aligning with real-world usage patterns, JetStream 3 helps developers optimize for scenarios that genuinely matter to users—like loading a responsive photo editor or a collaborative document viewer.
What role did WebKit's JavaScriptCore team play in these improvements?
The WebKit team, particularly those working on JavaScriptCore, contributed deep engineering insights to JetStream 3. They identified the infinity problem and pushed for a fundamental rethinking of Wasm benchmarking. Their experience optimizing startup paths—achieving near-zero instantiation times for small modules—showed that JetStream 2’s tests were no longer discriminative. The team also helped design new Wasm workloads that stress modern compilation pipelines and garbage collection interactions. Their collaborative work with Google and Mozilla ensured that the suite is fair across all engines, balancing workloads to avoid favoring any single implementation. JavaScriptCore’s own performance improvements during JetStream 3 development (e.g., faster module validation and tiered compilation) are now more accurately measured, driving further refinements that benefit all WebKit users.