Methodology

How Logicon generates and validates predictions

1

Data Fusion

6+ heterogeneous sources

Logicon ingests structured data from multiple independent, authoritative sources spanning different domains. Each source covers a distinct signal type — conflict events, macroeconomic indicators, governance quality, sanctions, and media intensity — ensuring no single data provider can create blind spots.

ACLED

Armed Conflict Location & Event Data — real-time conflict events with sub-national geo-coding

UCDP

Uppsala Conflict Data Program — battle deaths, state-based and non-state conflicts since 1946

GDELT

Global Database of Events, Language, and Tone — media-derived event records and sentiment at 15-minute resolution

FRED

Federal Reserve Economic Data — 800,000+ macroeconomic and financial time series

OpenSanctions

Consolidated sanctions, PEP, and debarment lists from 60+ regulatory authorities

V-Dem / WGI

Varieties of Democracy and World Governance Indicators — institutional quality, rule of law, corruption indices

2

Feature Extraction

18 features across 4 domains

Raw data is transformed into a fixed-length feature vector that captures the essential dynamics relevant to each prediction question. Features are grouped into four domains to ensure coverage across all dimensions of geopolitical risk.

Conflict Dynamics

Event counts, fatality rates, intensity trends, geographic spread, actor fragmentation

Information Environment

GDELT tone, media volume, Goldstein scale, event diversity, narrative framing shifts

Financial Stress

VIX levels, yield curve slope, commodity price shocks, capital flow reversals, currency volatility

Structural Vulnerability

Governance indices, regime type, ethnic fractionalization, resource dependence, neighbourhood instability
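The four-domain grouping above can be sketched as a fixed-length vector builder. This is an illustrative Python sketch, not Logicon's implementation: the feature names are drawn from the domain summaries above, and since those summaries list more signals than the 18-dimensional vector holds, the subset and the zero-fill default chosen here are assumptions.

```python
import numpy as np

# Hypothetical feature names grouped by the four domains described above.
# The exact mapping of listed signals to the 18 features is not published;
# this subset is chosen so the vector is 18-dimensional.
FEATURES = {
    "conflict_dynamics": ["event_count", "fatality_rate", "intensity_trend",
                          "geographic_spread", "actor_fragmentation"],
    "information_environment": ["gdelt_tone", "media_volume",
                                "goldstein_mean", "event_diversity"],
    "financial_stress": ["vix_level", "yield_curve_slope", "commodity_shock",
                         "capital_flow_reversal", "fx_volatility"],
    "structural_vulnerability": ["governance_index", "regime_type",
                                 "resource_dependence",
                                 "neighbourhood_instability"],
}

def to_vector(raw: dict) -> np.ndarray:
    """Flatten raw per-domain signals into a fixed-length feature vector,
    defaulting missing values to 0.0 so the length never varies."""
    names = [n for domain in FEATURES.values() for n in domain]
    return np.array([float(raw.get(n, 0.0)) for n in names])

vec = to_vector({"event_count": 12, "vix_level": 19.4})
assert vec.shape == (18,)
```

The fixed length matters: every downstream model and every stored audit record assumes the same 18 slots in the same order.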

3

Ensemble Model

60% logistic regression + 40% decision stump forest

Predictions are generated by a weighted ensemble of two complementary model families. Logistic regression provides stable, interpretable baselines with well-understood uncertainty. The decision stump forest captures non-linear threshold effects — such as conflict intensity tipping points — that linear models miss. The 60/40 weighting balances robustness against expressiveness.

  • Logistic regression: 14 features, L2 regularisation, fitted via IRLS
  • Decision stump forest: 50 stumps, each splitting on a single feature threshold
  • Ensemble weight: 0.60 logistic + 0.40 stump forest
  • Output: raw probability estimate before calibration
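
The 60/40 combination above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names, the stump tuple layout, and any concrete numbers are hypothetical, and the real coefficients are fitted via IRLS rather than hand-set.

```python
import math

def logistic_score(x, weights, bias):
    # Linear model passed through the sigmoid; in production the weights
    # would be fitted via IRLS with L2 regularisation.
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def stump_forest_score(x, stumps):
    # Each stump: (feature_index, threshold, value_if_above, value_if_below).
    # Averaging the stump votes yields a probability-like score that can
    # capture sharp threshold effects a linear model misses.
    votes = [above if x[i] > t else below for (i, t, above, below) in stumps]
    return sum(votes) / len(votes)

def ensemble_raw_probability(x, weights, bias, stumps):
    # 60/40 weighting: robustness from the linear model,
    # non-linear tipping points from the stumps.
    return (0.60 * logistic_score(x, weights, bias)
            + 0.40 * stump_forest_score(x, stumps))
```

Note the output is the raw, pre-calibration probability; the calibration step below maps it to an empirically honest one.
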

4

Calibration

Isotonic regression (PAV algorithm)

Raw model outputs are not well-calibrated — a predicted 0.70 may correspond to a true event rate of 0.62 or 0.78. Isotonic calibration applies the Pool Adjacent Violators (PAV) algorithm to map raw scores to empirically calibrated probabilities. This ensures that when Logicon says P = 0.35, roughly 35% of such predictions resolve positively.

  • PAV: monotone non-decreasing step function fitted to historical (score, outcome) pairs
  • Confidence intervals via bootstrap resampling (1000 iterations)
  • Calibration quality measured by Brier score, log loss, and reliability diagrams
  • Recalibrated automatically when drift is detected
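
The PAV fit described above can be sketched as follows. This is an illustrative Python rendering of the standard Pool Adjacent Violators algorithm; the function names and any example data are hypothetical, not Logicon's code.

```python
def pav_calibrate(scores, outcomes):
    """Pool Adjacent Violators: fit a monotone non-decreasing step function
    mapping raw scores to empirical event rates.
    Returns (thresholds, values) defining the step function."""
    pairs = sorted(zip(scores, outcomes))   # historical (score, outcome) pairs
    blocks = []  # each block: [sum_of_outcomes, count, max_score_in_block]
    for s, y in pairs:
        blocks.append([float(y), 1, s])
        # Merge backwards while adjacent blocks violate monotonicity,
        # i.e. an earlier block has a higher empirical event rate.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s2, n2, t2 = blocks.pop()
            blocks[-1][0] += s2
            blocks[-1][1] += n2
            blocks[-1][2] = t2
    thresholds = [b[2] for b in blocks]
    values = [b[0] / b[1] for b in blocks]
    return thresholds, values

def calibrated(score, thresholds, values):
    # Step-function lookup: first block whose max score covers the raw score.
    for t, v in zip(thresholds, values):
        if score <= t:
            return v
    return values[-1]
```

Because the fitted function is monotone, calibration reorders nothing: a higher raw score never maps to a lower calibrated probability.
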

5

Self-Learning Pipeline

7 autonomous stages

Logicon does not require manual retraining. When sufficient new outcomes accumulate or calibration drift is detected, a fully autonomous 7-stage pipeline activates. Each stage has explicit pass/fail criteria — if any stage fails, the current production model continues unchanged.

1. Backfill: collect resolved outcomes and align them with historical feature snapshots
2. Retrain: fit new model weights on the expanded training set
3. Isotonic PAV: recalibrate the probability mapping with updated outcome data
4. Walk-Forward: time-series cross-validation to verify out-of-sample performance
5. Monte Carlo: bootstrap fitness testing; the new model must beat the current one on >95% of resampled test sets
6. Drift Detection: Page-Hinkley test monitoring the Brier score for statistically significant degradation
7. Activate: atomic swap of model weights; the old version is archived, the new version promoted
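
The drift check in the pipeline can be sketched with a textbook Page-Hinkley detector. This is an illustrative sketch: the class name is invented, and the `delta` and `lam` defaults are placeholder values, not Logicon's actual settings.

```python
class PageHinkley:
    """Page-Hinkley test on a stream of Brier scores: signals drift when the
    cumulative deviation above the running mean rises too far above its
    historical minimum."""
    def __init__(self, delta=0.005, lam=0.05):
        self.delta = delta    # tolerance for normal fluctuation (placeholder)
        self.lam = lam        # alarm threshold (placeholder)
        self.mean = 0.0
        self.n = 0
        self.cum = 0.0        # cumulative deviation statistic
        self.min_cum = 0.0    # historical minimum of the statistic

    def update(self, brier: float) -> bool:
        self.n += 1
        self.mean += (brier - self.mean) / self.n
        self.cum += brier - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        # Drift alarm: the statistic has climbed lam above its minimum,
        # meaning Brier scores have degraded beyond normal fluctuation.
        return self.cum - self.min_cum > self.lam
```

A steady stream of Brier scores keeps the statistic pinned near its minimum; a sustained rise in scores triggers the alarm and, in the pipeline above, a retraining run.
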

6

Audit Trail

Complete reproducibility chain

Every prediction is stored alongside its complete computational provenance. This enables any prediction to be independently verified, reproduced, or challenged — a critical requirement for decision support in high-stakes environments.

  • Input snapshot hash (SHA-256) — cryptographic fingerprint of all input data at prediction time
  • Feature vector — full 18-dimensional vector stored as JSON for exact reproducibility
  • Model version — parameter set ID linking to exact weights, thresholds, and calibration map
  • Evidence chain — ranked list of contributing data points with polarity and weight
  • Reasoning trace — natural language explanation of key drivers
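
The snapshot-hashing step above can be sketched as follows. This is an illustrative Python sketch; the record schema and field names are assumptions, since the source describes the contents of the provenance record but not its exact format.

```python
import hashlib
import json

def audit_record(inputs: dict, feature_vector: list,
                 model_version: str) -> dict:
    """Build a provenance record for one prediction (hypothetical schema)."""
    # Canonical JSON (sorted keys, fixed separators) so the same inputs
    # always serialise identically and hash to the same fingerprint.
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    return {
        "input_snapshot_hash": hashlib.sha256(canonical.encode()).hexdigest(),
        "feature_vector": feature_vector,  # stored as JSON for exact replay
        "model_version": model_version,    # links to weights + calibration map
    }
```

Canonical serialisation is the key design point: without sorted keys and fixed separators, two logically identical input snapshots could hash differently and break verification.
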

See the methodology in action:

  • Calibration Metrics
  • Live Forecasts
  • Audit Explorer