How to Evaluate Extreme Weather Forecasts: Why Traditional Models Still Outperform AI

Introduction

Extreme weather events—record-breaking heatwaves, cold snaps, and storms—cause hundreds of billions of dollars in damages annually. Early warning systems save lives, but choosing the right forecasting model is critical. While artificial intelligence (AI) models have revolutionized weather prediction for routine forecasts, a recent study in Science Advances reveals that they still fall short for extreme, record-breaking events. This guide explains step by step how to assess and compare physics-based (traditional) and AI-based weather models for forecasting the most severe weather. By the end, you'll understand why traditional models remain essential and how to avoid over-relying on AI for extremes.

Source: www.carbonbrief.org

What You Need

Step-by-Step Guide

Step 1: Understand the Two Model Types

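Before comparing, know the fundamental difference: physics-based (traditional) models simulate the atmosphere from physical laws, whereas AI models learn statistical patterns from a dataset of historical weather.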

This distinction explains why AI models can struggle with events that are rare or outside their training range. As study author Prof Sebastian Engelke notes, AI models are “relatively constrained to the range of this dataset.”

Step 2: Identify Record-Breaking Extreme Events

Select a set of record-breaking events from a chosen period (e.g., 2018 and 2020 as in the study). These should be extreme in temperature (hot and cold) and wind. Confirm that each event is a true record-breaker, meaning it exceeds the previous maximum or falls below the previous minimum in the historical record. This dataset will be your benchmark.
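One way to check the record-breaker condition is to compare each observation with the running maximum (or minimum) of everything that came before it. The sketch below assumes a plain chronological list of values for a single location; the function names and the sample numbers are illustrative and are not taken from the study.

```python
def find_record_highs(values):
    """Return indices where a value strictly exceeds every earlier value.

    The first value only sets the baseline and is never counted as a record.
    """
    records = []
    if not values:
        return records
    running_max = values[0]
    for i, v in enumerate(values[1:], start=1):
        if v > running_max:
            records.append(i)
            running_max = v
    return records


def find_record_lows(values):
    """Cold records: negate the series and reuse the high-record check."""
    return find_record_highs([-v for v in values])


# Hypothetical annual maximum temperatures (degrees C) at one station.
annual_max = [38.1, 37.5, 39.0, 38.7, 40.2]
print(find_record_highs(annual_max))  # [2, 4]
```

In practice you would run this over the full historical record, then keep only the records that fall inside your chosen evaluation period.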

Step 3: Obtain Forecasts from Both Model Types

Run or retrieve forecasts from at least one physics-based (traditional) model and at least one AI-based model.

Ensure the forecasts cover the exact same time window as the identified extreme events. Record the predicted values for temperature and wind speed at relevant locations.
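A simple way to enforce the matching time window is to key every value by location and date, then keep only the events for which both models supplied a forecast. This is a minimal sketch under that assumption; the dictionary layout, the station names, and the function name `align_forecasts` are hypothetical.

```python
def align_forecasts(events, physics_fc, ai_fc):
    """Keep only events for which both models have a forecast.

    events:      dict mapping (location, date) -> observed value
    physics_fc:  dict mapping (location, date) -> physics-based forecast
    ai_fc:       dict mapping (location, date) -> AI-based forecast
    Returns a list of (key, observed, physics, ai) tuples sorted by key.
    """
    rows = []
    for key in sorted(events):
        if key in physics_fc and key in ai_fc:
            rows.append((key, events[key], physics_fc[key], ai_fc[key]))
    return rows


# Hypothetical values for illustration only.
events = {("station_a", "2020-07-01"): 41.2, ("station_b", "2020-07-02"): 38.6}
physics_fc = {("station_a", "2020-07-01"): 40.8, ("station_b", "2020-07-02"): 37.9}
ai_fc = {("station_a", "2020-07-01"): 39.5}  # no AI forecast for station_b

for row in align_forecasts(events, physics_fc, ai_fc):
    print(row)
```

Dropping unmatched events keeps the comparison fair, since each model is then scored on exactly the same cases.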

Step 4: Compare Frequency of Predicted Extremes

Count how many of the record-breaking events each model correctly forecasts. For any given threshold (e.g., temperature above the previous record), determine how many events each model forecasts beyond the threshold and how many it misses.

The study found that AI models underestimate the frequency of record-breaking events—they miss many that physics models capture. If your analysis shows AI missing a larger proportion, that’s a red flag.
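The frequency comparison can be sketched as a hit rate: the share of benchmark events for which a model's forecast also exceeds the previous record. The function name `record_hit_rate` and the numbers are assumptions for illustration, and the comparison shown covers only high records (heat or wind); for cold records the inequality would be reversed.

```python
def record_hit_rate(forecasts, previous_records):
    """Fraction of events where the forecast exceeds the previous record.

    Every event in the benchmark is an observed record-breaker, so a
    forecast above the previous record counts as a hit.
    """
    if not forecasts or len(forecasts) != len(previous_records):
        raise ValueError("need equal-length, non-empty inputs")
    hits = sum(1 for f, r in zip(forecasts, previous_records) if f > r)
    return hits / len(forecasts)


# Hypothetical previous records and forecasts for four record-breaking events.
previous_records = [39.0, 41.5, 36.2, 44.0]
physics_forecasts = [39.6, 41.9, 36.0, 44.8]
ai_forecasts = [38.7, 41.6, 35.8, 43.5]

print("physics:", record_hit_rate(physics_forecasts, previous_records))  # 0.75
print("ai:     ", record_hit_rate(ai_forecasts, previous_records))       # 0.25
```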

Step 5: Compare Intensity of Predicted Extremes

For the events that both models do predict, compare the magnitude. Compute the difference between the forecast value and the actual observed value for each extreme event. Physics-based models tend to produce more accurate intensities (closer to the record). AI models often forecast values that are too low, underestimating the severity.
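The intensity comparison can be summarized with two numbers per model: the mean bias (forecast minus observed) and the mean absolute error. The following is a minimal sketch with made-up values; `intensity_errors` is a hypothetical helper name. For hot extremes, a negative bias means the model forecast values that were too low.

```python
def intensity_errors(observed, forecast):
    """Return (mean bias, mean absolute error) of forecast minus observed."""
    if not observed or len(observed) != len(forecast):
        raise ValueError("need equal-length, non-empty inputs")
    diffs = [f - o for f, o in zip(forecast, observed)]
    n = len(diffs)
    bias = sum(diffs) / n
    mae = sum(abs(d) for d in diffs) / n
    return bias, mae


# Hypothetical observed record values and the two models' forecasts.
observed = [40.0, 42.0, 37.0]
physics = [39.5, 42.5, 37.0]
ai = [38.0, 40.0, 36.0]

for name, fc in (("physics", physics), ("ai", ai)):
    bias, mae = intensity_errors(observed, fc)
    print(f"{name}: bias={bias:+.2f}, mae={mae:.2f}")
```

Reporting the bias alongside the absolute error matters here: errors that cancel out on average can hide large misses, while a consistently negative bias points to systematic underestimation.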

Step 6: Evaluate Dependence on Training Data

AI models are only as good as the data they learn from. If an extreme event is unprecedented, meaning nothing like it appears in the training set, the model has to extrapolate beyond the range it learned from and is likely to struggle. Check the historical training data period for the AI model. If it covers only recent decades, it may lack examples of rare extremes. Physics models, grounded in physical laws, can simulate novel combinations of conditions.
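A quick check for this step is to measure how far an observed extreme lies outside the minimum and maximum of the values the AI model was trained on. The sketch below assumes you can obtain the training values for the variable and location in question; `training_range_exceedance` is a hypothetical helper and the numbers are invented.

```python
def training_range_exceedance(observed, training_values):
    """How far `observed` lies outside the training range.

    Returns a positive number if above the training maximum, a negative
    number if below the training minimum, and 0.0 if inside the range.
    """
    if not training_values:
        raise ValueError("training_values must not be empty")
    low, high = min(training_values), max(training_values)
    if observed > high:
        return observed - high
    if observed < low:
        return observed - low
    return 0.0


# Hypothetical training-period temperatures (degrees C) at one location.
training = [12.0, 25.5, 31.0, 38.4]
for obs in (40.0, 20.0, 10.0):
    print(obs, "->", round(training_range_exceedance(obs, training), 2))
```

A nonzero result flags an event the model never saw the like of, which is where you would expect its forecast to deserve the most scrutiny.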

Step 7: Assess Overall Performance for Routine vs. Extreme Forecasts

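While AI may excel for everyday weather (e.g., temperature, precipitation patterns), this guide focuses on extremes. Create a performance matrix that sets each model type against routine and extreme conditions, so the two regimes are scored separately.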

The study labels this AI weakness a “warning shot” against replacing traditional models too quickly. In your analysis, weight the importance of extreme events based on your application (public safety, infrastructure planning).
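The performance matrix from this step can be sketched as a nested dictionary of mean absolute errors, one cell per model and regime. The case tuples, labels, and values below are hypothetical and serve only to show the bookkeeping.

```python
def performance_matrix(cases):
    """Build {model: {regime: mean absolute error}}.

    cases: iterable of (model, regime, observed, forecast) tuples,
           where regime is a label such as "routine" or "extreme".
    """
    sums = {}
    for model, regime, obs, fc in cases:
        total, count = sums.get((model, regime), (0.0, 0))
        sums[(model, regime)] = (total + abs(fc - obs), count + 1)

    matrix = {}
    for (model, regime), (total, count) in sums.items():
        matrix.setdefault(model, {})[regime] = total / count
    return matrix


# Hypothetical verification cases.
cases = [
    ("physics", "routine", 20.0, 21.0),
    ("physics", "extreme", 40.0, 39.0),
    ("ai", "routine", 20.0, 20.5),
    ("ai", "extreme", 40.0, 37.0),
    ("ai", "extreme", 42.0, 41.0),
]
for model, row in performance_matrix(cases).items():
    print(model, row)
```

Keeping routine and extreme cases in separate cells prevents a model's strong everyday performance from masking weak performance on the rare events that matter most for warnings.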

Step 8: Consider a Hybrid Approach

Given the complementary strengths, use both models together. For example, run an AI model for rapid, low-cost forecasts under typical conditions, but switch to a physics-based model for extreme event warnings. Some operational centers are already blending outputs. Test a hybrid ensemble and see if it improves prediction of both frequency and intensity of record-breakers.
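One very simple form of the switching idea is a rule that defers to the physics-based forecast whenever either model comes close to the previous record. This is an illustrative sketch, not how any operational center blends its outputs; the function name, the `margin` parameter, and its default value are assumptions, and the rule as written handles only high records.

```python
def hybrid_forecast(ai_value, physics_value, previous_record, margin=2.0):
    """Use the AI forecast in typical conditions; defer to physics near a record.

    Switches to the physics-based value when either forecast comes within
    `margin` of the previous record, or exceeds it.
    """
    near_record = max(ai_value, physics_value) >= previous_record - margin
    return physics_value if near_record else ai_value


# Hypothetical temperatures (degrees C) with a previous record of 40.0.
print(hybrid_forecast(30.0, 31.0, previous_record=40.0))  # 30.0, typical day
print(hybrid_forecast(37.5, 40.5, previous_record=40.0))  # 40.5, near record
```

Note that this sketch assumes both forecasts are already available. A setup that aims to save compute would need a cheaper trigger for deciding when to run the physics-based model.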

Tips for Success

Remember: The goal is not to declare AI useless for extreme weather, but to use it wisely. By following these steps, you can make informed decisions about which model to trust when lives and property are at stake.
