How well do our predictions match reality? The table below reports calibration and discrimination metrics for each domain's resolved forecasts.
| Domain | Brier Score | Log Loss | AUC-ROC | N |
|---|---|---|---|---|
| Caucasus | 0.037 | 0.195 | 0.78 | 48 |
| Iran | 0.082 | 0.311 | 0.58 | 48 |
| Israel-Lebanon | 0.067 | 0.273 | 0.92 | 48 |
| Korean Peninsula | 0.019 | 0.145 | 0.50 | 48 |
| Red Sea | 0.015 | 0.127 | 0.50 | 48 |
| Sahel | 0.040 | 0.209 | 0.35 | 48 |
| South China Sea | 0.083 | 0.314 | 0.75 | 48 |
| Taiwan Strait | 0.019 | 0.144 | 0.50 | 48 |
| Ukraine | 0.065 | 0.265 | 0.83 | 48 |
| Venezuela | 0.068 | 0.278 | 0.70 | 48 |
**Brier score.** Measures the mean squared difference between predicted probabilities and binary outcomes, ranging from 0 (perfect) to 1 (worst). A climatological baseline that always predicts the base rate p scores p(1 − p), which is at most 0.25 (at p = 0.5). Scores below 0.1 generally indicate strong performance, though in domains where events are rare, a low base rate alone can drive the Brier score down.
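A minimal sketch of the computation (the function name and sample values are illustrative, not taken from the scoring pipeline):

```python
import numpy as np

def brier_score(probs, outcomes):
    """Mean squared difference between forecast probabilities
    and binary (0/1) resolved outcomes."""
    p = np.asarray(probs, dtype=float)
    y = np.asarray(outcomes, dtype=float)
    return float(np.mean((p - y) ** 2))

# Three hypothetical forecasts: 0.9 resolves yes, 0.2 resolves no,
# 0.6 resolves yes -> (0.01 + 0.04 + 0.16) / 3 = 0.07.
print(brier_score([0.9, 0.2, 0.6], [1, 0, 1]))  # ~0.07
```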
**Log loss.** Penalizes confident wrong predictions more heavily than the Brier score: a prediction of 95% for an event that does not occur is punished severely. Lower values indicate better probabilistic accuracy.
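A corresponding sketch. The clipping constant is an assumption for numerical safety, not a detail from the pipeline; it keeps a probability of exactly 0 or 1 from producing an infinite penalty:

```python
import numpy as np

def log_loss(probs, outcomes, eps=1e-15):
    """Negative mean log-likelihood of the resolved outcomes."""
    # Clip probabilities away from exactly 0 and 1 so a maximally
    # confident miss is penalized severely but finitely.
    p = np.clip(np.asarray(probs, dtype=float), eps, 1 - eps)
    y = np.asarray(outcomes, dtype=float)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

# A 95% forecast that resolves "no" contributes -ln(0.05) ≈ 3.00,
# versus -ln(0.5) ≈ 0.69 for a maximally uncertain 50% forecast.
```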
**AUC-ROC.** Area Under the Receiver Operating Characteristic curve. Measures the model's ability to rank positive outcomes above negative ones, independent of any probability threshold. 0.5 corresponds to random guessing and 1.0 to perfect discrimination; values below 0.5, such as the Sahel's 0.35, indicate rankings systematically worse than chance.
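AUC is equivalent to the probability that a randomly chosen positive outcome receives a higher forecast than a randomly chosen negative one, so it can be computed from pairwise comparisons without constructing the curve. A sketch (the O(n²) pairwise comparison is fine at 48 forecasts per domain):

```python
import numpy as np

def auc_roc(scores, outcomes):
    """AUC as the Mann-Whitney rank statistic:
    P(positive score > negative score), with ties counted as 0.5."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(outcomes, dtype=int)
    pos, neg = s[y == 1], s[y == 0]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("AUC is undefined without both outcome classes")
    wins = (pos[:, None] > neg[None, :]).sum()   # positive ranked higher
    ties = (pos[:, None] == neg[None, :]).sum()  # tied scores count half
    return float((wins + 0.5 * ties) / (len(pos) * len(neg)))
```

Note that AUC is undefined for a domain in which every forecast resolved the same way; some scoring setups fall back to reporting 0.5 in that case, which is one possible (unconfirmed) reading of the exact 0.50 entries in the table above.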
**N.** The number of forecasts that have reached their resolution date and been scored against ground truth. Calibration metrics are only meaningful with a sufficient sample size; N > 30 per domain is a common rule of thumb.