Methodology

How Logicon generates and validates predictions

1

Data Fusion

6+ heterogeneous sources

Logicon ingests structured data from multiple independent, authoritative sources spanning different domains. Each source covers a distinct signal type — conflict events, macroeconomic indicators, governance quality, sanctions, and media intensity — ensuring no single data provider can create blind spots.

ACLED

Armed Conflict Location & Event Data — real-time conflict events with sub-national geo-coding

UCDP

Uppsala Conflict Data Program — battle deaths, state-based and non-state conflicts since 1946

GDELT

Global Database of Events, Language, and Tone — media-derived event records and sentiment at 15-minute resolution

FRED

Federal Reserve Economic Data — 800,000+ macroeconomic and financial time series

OpenSanctions

Consolidated sanctions, PEP, and debarment lists from 60+ regulatory authorities

V-Dem / WGI

Varieties of Democracy and World Governance Indicators — institutional quality, rule of law, corruption indices

2

Feature Extraction

18 features across 4 domains

Raw data is transformed into a fixed-length feature vector that captures the essential dynamics relevant to each prediction question. Features are grouped into four domains to ensure coverage across all dimensions of geopolitical risk.

Conflict Dynamics

Event counts, fatality rates, intensity trends, geographic spread, actor fragmentation

Information Environment

GDELT tone, media volume, Goldstein scale, event diversity, narrative framing shifts

Financial Stress

VIX levels, yield curve slope, commodity price shocks, capital flow reversals, currency volatility

Structural Vulnerability

Governance indices, regime type, ethnic fractionalization, resource dependence, neighbourhood instability
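The four-domain grouping above can be sketched as a fixed-length vector builder. This is an illustrative Python sketch, not Logicon's implementation: the feature names are drawn from the domain summaries above, and since those summaries list more signals than the 18-dimensional vector holds, the subset and the zero-fill default chosen here are assumptions.

```python
import numpy as np

# Hypothetical feature names grouped by the four domains described above.
# The exact mapping of listed signals to the 18 features is not published;
# this subset is chosen so the vector is 18-dimensional.
FEATURES = {
    "conflict_dynamics": ["event_count", "fatality_rate", "intensity_trend",
                          "geographic_spread", "actor_fragmentation"],
    "information_environment": ["gdelt_tone", "media_volume",
                                "goldstein_mean", "event_diversity"],
    "financial_stress": ["vix_level", "yield_curve_slope", "commodity_shock",
                         "capital_flow_reversal", "fx_volatility"],
    "structural_vulnerability": ["governance_index", "regime_type",
                                 "resource_dependence",
                                 "neighbourhood_instability"],
}

def to_vector(raw: dict) -> np.ndarray:
    """Flatten raw per-domain signals into a fixed-length feature vector,
    defaulting missing values to 0.0 so the length never varies."""
    names = [n for domain in FEATURES.values() for n in domain]
    return np.array([float(raw.get(n, 0.0)) for n in names])

vec = to_vector({"event_count": 12, "vix_level": 19.4})
assert vec.shape == (18,)
```

The fixed length matters: every downstream model and every stored audit record assumes the same 18 slots in the same order.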

3

Ensemble Model

60% logistic regression + 40% decision stump forest

Predictions are generated by a weighted ensemble of two complementary model families. Logistic regression provides stable, interpretable baselines with well-understood uncertainty. The decision stump forest captures non-linear threshold effects — such as conflict intensity tipping points — that linear models miss. The 60/40 weighting balances robustness against expressiveness.

  • Logistic regression: 14 features, L2 regularisation, fitted via IRLS
  • Decision stump forest: 50 stumps, each splitting on a single feature threshold
  • Ensemble weight: 0.60 logistic + 0.40 stump forest
  • Output: raw probability estimate before calibration
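
The 60/40 combination above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names, the stump tuple layout, and any concrete numbers are hypothetical, and the real coefficients are fitted via IRLS rather than hand-set.

```python
import math

def logistic_score(x, weights, bias):
    # Linear model passed through the sigmoid; in production the weights
    # would be fitted via IRLS with L2 regularisation.
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def stump_forest_score(x, stumps):
    # Each stump: (feature_index, threshold, value_if_above, value_if_below).
    # Averaging the stump votes yields a probability-like score that can
    # capture sharp threshold effects a linear model misses.
    votes = [above if x[i] > t else below for (i, t, above, below) in stumps]
    return sum(votes) / len(votes)

def ensemble_raw_probability(x, weights, bias, stumps):
    # 60/40 weighting: robustness from the linear model,
    # non-linear tipping points from the stumps.
    return (0.60 * logistic_score(x, weights, bias)
            + 0.40 * stump_forest_score(x, stumps))
```

Note the output is the raw, pre-calibration probability; the calibration step below maps it to an empirically honest one.
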

4

Calibration

Isotonic regression (PAV algorithm)

Raw model outputs are not well-calibrated — a predicted 0.70 may correspond to a true event rate of 0.62 or 0.78. Isotonic calibration applies the Pool Adjacent Violators (PAV) algorithm to map raw scores to empirically calibrated probabilities. This ensures that when Logicon says P = 0.35, roughly 35% of such predictions resolve positively.

  • PAV: monotone non-decreasing step function fitted to historical (score, outcome) pairs
  • Confidence intervals via bootstrap resampling (1000 iterations)
  • Calibration quality measured by Brier score, log loss, and reliability diagrams
  • Recalibrated automatically when drift is detected
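
The PAV fit described above can be sketched as follows. This is an illustrative Python rendering of the standard Pool Adjacent Violators algorithm; the function names and any example data are hypothetical, not Logicon's code.

```python
def pav_calibrate(scores, outcomes):
    """Pool Adjacent Violators: fit a monotone non-decreasing step function
    mapping raw scores to empirical event rates.
    Returns (thresholds, values) defining the step function."""
    pairs = sorted(zip(scores, outcomes))   # historical (score, outcome) pairs
    blocks = []  # each block: [sum_of_outcomes, count, max_score_in_block]
    for s, y in pairs:
        blocks.append([float(y), 1, s])
        # Merge backwards while adjacent blocks violate monotonicity,
        # i.e. an earlier block has a higher empirical event rate.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s2, n2, t2 = blocks.pop()
            blocks[-1][0] += s2
            blocks[-1][1] += n2
            blocks[-1][2] = t2
    thresholds = [b[2] for b in blocks]
    values = [b[0] / b[1] for b in blocks]
    return thresholds, values

def calibrated(score, thresholds, values):
    # Step-function lookup: first block whose max score covers the raw score.
    for t, v in zip(thresholds, values):
        if score <= t:
            return v
    return values[-1]
```

Because the fitted function is monotone, calibration reorders nothing: a higher raw score never maps to a lower calibrated probability.
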

5

Self-Learning Pipeline

7 autonomous stages

Logicon does not require manual retraining. When sufficient new outcomes accumulate or calibration drift is detected, a fully autonomous 7-stage pipeline activates. Each stage has explicit pass/fail criteria — if any stage fails, the current production model continues unchanged.

1. Backfill: collect resolved outcomes and align them with historical feature snapshots
2. Retrain: fit new model weights on the expanded training set
3. Isotonic PAV: recalibrate the probability mapping with updated outcome data
4. Walk-Forward: time-series cross-validation to verify out-of-sample performance
5. Monte Carlo: bootstrap fitness testing; the new model must beat the current one on >95% of resampled test sets
6. Drift Detection: Page-Hinkley test monitoring the Brier score for statistically significant degradation
7. Activate: atomic swap of model weights; the old version is archived, the new version promoted
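
The drift check in the pipeline can be sketched with a textbook Page-Hinkley detector. This is an illustrative sketch: the class name is invented, and the `delta` and `lam` defaults are placeholder values, not Logicon's actual settings.

```python
class PageHinkley:
    """Page-Hinkley test on a stream of Brier scores: signals drift when the
    cumulative deviation above the running mean rises too far above its
    historical minimum."""
    def __init__(self, delta=0.005, lam=0.05):
        self.delta = delta    # tolerance for normal fluctuation (placeholder)
        self.lam = lam        # alarm threshold (placeholder)
        self.mean = 0.0
        self.n = 0
        self.cum = 0.0        # cumulative deviation statistic
        self.min_cum = 0.0    # historical minimum of the statistic

    def update(self, brier: float) -> bool:
        self.n += 1
        self.mean += (brier - self.mean) / self.n
        self.cum += brier - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        # Drift alarm: the statistic has climbed lam above its minimum,
        # meaning Brier scores have degraded beyond normal fluctuation.
        return self.cum - self.min_cum > self.lam
```

A steady stream of Brier scores keeps the statistic pinned near its minimum; a sustained rise in scores triggers the alarm and, in the pipeline above, a retraining run.
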

6

Audit Trail

Complete reproducibility chain

Every prediction is stored alongside its complete computational provenance. This enables any prediction to be independently verified, reproduced, or challenged — a critical requirement for decision support in high-stakes environments.

  • Input snapshot hash (SHA-256) — cryptographic fingerprint of all input data at prediction time
  • Feature vector — full 18-dimensional vector stored as JSON for exact reproducibility
  • Model version — parameter set ID linking to exact weights, thresholds, and calibration map
  • Evidence chain — ranked list of contributing data points with polarity and weight
  • Reasoning trace — natural language explanation of key drivers
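
The snapshot-hashing step above can be sketched as follows. This is an illustrative Python sketch; the record schema and field names are assumptions, since the source describes the contents of the provenance record but not its exact format.

```python
import hashlib
import json

def audit_record(inputs: dict, feature_vector: list,
                 model_version: str) -> dict:
    """Build a provenance record for one prediction (hypothetical schema)."""
    # Canonical JSON (sorted keys, fixed separators) so the same inputs
    # always serialise identically and hash to the same fingerprint.
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    return {
        "input_snapshot_hash": hashlib.sha256(canonical.encode()).hexdigest(),
        "feature_vector": feature_vector,  # stored as JSON for exact replay
        "model_version": model_version,    # links to weights + calibration map
    }
```

Canonical serialisation is the key design point: without sorted keys and fixed separators, two logically identical input snapshots could hash differently and break verification.
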

See the methodology in action:

  • Calibration Metrics
  • Live Forecasts
  • Audit Explorer