Calibration Metrics

How well do our predictions match reality?

Brier Score: 0.050 (lower is better; 0 = perfect)
Log Loss: 0.226 (cross-entropy measure)
AUC-ROC: 0.80 (discrimination; 1.0 = perfect)
Resolved: 480 (scored predictions)

By Domain

Domain              Brier   Log Loss   AUC-ROC    N
Caucasus            0.037   0.195      0.78      48
Iran                0.082   0.311      0.58      48
Israel_Lebanon      0.067   0.273      0.92      48
Korean_Peninsula    0.019   0.145      0.50      48
Red_Sea             0.015   0.127      0.50      48
Sahel               0.040   0.209      0.35      48
South_China_Sea     0.083   0.314      0.75      48
Taiwan_Strait       0.019   0.144      0.50      48
Ukraine             0.065   0.265      0.83      48
Venezuela           0.068   0.278      0.70      48

Understanding the Metrics

Brier Score

Measures the mean squared difference between predicted probabilities and actual outcomes. Ranges from 0 (perfect) to 1 (worst). A climatological baseline (always predicting the base rate p) scores p(1 - p), which is 0.25 at a 50% base rate. Scores below 0.1 indicate strong calibration.
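
As an illustration, the score is just a mean squared error over 0/1 outcomes. A minimal NumPy sketch (the array names probs and outcomes are placeholders, not part of the scoring pipeline described here) that also demonstrates the climatological baseline:

    import numpy as np

    def brier_score(probs: np.ndarray, outcomes: np.ndarray) -> float:
        # Mean squared difference between forecast probabilities and 0/1 outcomes.
        return float(np.mean((probs - outcomes) ** 2))

    # Always forecasting a 50% base rate scores exactly the 0.25 baseline:
    print(brier_score(np.full(100, 0.5), np.tile([0, 1], 50)))  # 0.25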

Log Loss (Cross-Entropy)

Penalizes confident wrong predictions more heavily than the Brier score: a prediction of 95% for an event that does not occur is punished severely. Lower values indicate better probabilistic accuracy.
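
A sketch of the computation, with clipping so that hard 0/1 forecasts stay finite (the array names are again illustrative):

    import numpy as np

    def log_loss(probs: np.ndarray, outcomes: np.ndarray, eps: float = 1e-15) -> float:
        # Clip so a forecast of exactly 0 or 1 never produces log(0).
        p = np.clip(probs, eps, 1 - eps)
        return float(-np.mean(outcomes * np.log(p) + (1 - outcomes) * np.log(1 - p)))

    # A confident miss: forecasting 95% for an event that did not occur
    # costs about 3.0 here, versus only 0.9025 under the Brier score.
    print(log_loss(np.array([0.95]), np.array([0])))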

AUC-ROC

Area Under the Receiver Operating Characteristic curve. Measures the model's ability to discriminate between positive and negative outcomes regardless of the chosen threshold. 0.5 = random guessing, 1.0 = perfect discrimination.
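
Without a dedicated library, AUC can be computed from the rank-based Mann-Whitney identity: it equals the probability that a randomly chosen positive outcome received a higher forecast than a randomly chosen negative one. A minimal sketch (quadratic in sample size, which is fine at N = 480):

    import numpy as np

    def auc_roc(probs: np.ndarray, outcomes: np.ndarray) -> float:
        # AUC = P(forecast for a random positive > forecast for a random negative),
        # with ties counted as one half (the Mann-Whitney U formulation).
        pos = probs[outcomes == 1]
        neg = probs[outcomes == 0]
        wins = (pos[:, None] > neg[None, :]).sum()
        ties = (pos[:, None] == neg[None, :]).sum()
        return float((wins + 0.5 * ties) / (len(pos) * len(neg)))

This pairwise form also makes the 0.5-equals-random reading concrete: a model whose forecasts carry no information wins half of the positive-negative comparisons by chance.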

Resolved Predictions

The number of forecasts that have reached their resolution date and been scored against ground truth. Calibration metrics are only meaningful with a sufficient sample size (N > 30).