Data Fusion
6+ heterogeneous sources
Logicon ingests structured data from multiple independent, authoritative sources spanning different domains. Each source covers a distinct signal type — conflict events, macroeconomic indicators, governance quality, sanctions, and media intensity — ensuring no single data provider can create blind spots.
- Armed Conflict Location & Event Data (ACLED) — real-time conflict events with sub-national geo-coding
- Uppsala Conflict Data Program (UCDP) — battle deaths, state-based and non-state conflicts since 1946
- Global Database of Events, Language, and Tone (GDELT) — media-derived event records and sentiment at 15-minute resolution
- Federal Reserve Economic Data (FRED) — 800,000+ macroeconomic and financial time series
- Consolidated sanctions, politically exposed persons (PEP), and debarment lists from 60+ regulatory authorities
- Varieties of Democracy (V-Dem) and Worldwide Governance Indicators (WGI) — institutional quality, rule of law, corruption indices
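As an illustration of the fusion step, records from heterogeneous sources can be normalised onto one shared shape before any downstream processing. This is a minimal Python sketch; `SourceRecord` and the ACLED-style field names (`iso3`, `event_date`, `fatalities`) are illustrative assumptions, not Logicon's actual schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SourceRecord:
    """Hypothetical unified record shared by all ingested sources."""
    source: str    # e.g. "ACLED", "GDELT", "FRED"
    signal: str    # e.g. "conflict_event", "macro_series", "sanction"
    country: str   # ISO 3166-1 alpha-3 code
    observed: date
    value: float


def normalise_acled(row: dict) -> SourceRecord:
    """Map one raw ACLED-style row onto the unified schema (keys are illustrative)."""
    return SourceRecord(
        source="ACLED",
        signal="conflict_event",
        country=row["iso3"],
        observed=date.fromisoformat(row["event_date"]),
        value=float(row["fatalities"]),
    )
```

Each source gets its own small adapter like `normalise_acled`, so downstream stages never see provider-specific formats.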
Feature Extraction
18 features across 4 domains
Raw data is transformed into a fixed-length feature vector that captures the essential dynamics relevant to each prediction question. Features are grouped into four domains to ensure coverage across all dimensions of geopolitical risk.
- Event counts, fatality rates, intensity trends, geographic spread, actor fragmentation
- GDELT tone, media volume, Goldstein scale, event diversity, narrative framing shifts
- VIX levels, yield curve slope, commodity price shocks, capital flow reversals, currency volatility
- Governance indices, regime type, ethnic fractionalization, resource dependence, neighbourhood instability
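A sketch of how raw signals might map onto a fixed-length, fixed-order vector. The feature names and transforms below are illustrative assumptions (a small subset, not the full vector), chosen only to mirror the domains above.

```python
import math

# Illustrative subset of the feature vector; names and transforms are
# assumptions, not Logicon's published definitions.
FEATURE_NAMES = [
    "event_count_log", "fatality_rate", "intensity_trend",
    "gdelt_tone_mean", "media_volume_log",
    "vix_level", "yield_curve_slope",
    "governance_index",
]


def extract_features(raw: dict) -> list[float]:
    """Map raw signals onto a fixed-length, fixed-order vector."""
    return [
        math.log1p(raw["event_count"]),                      # tame heavy-tailed counts
        raw["fatalities"] / max(raw["event_count"], 1),      # fatalities per event
        raw["events_this_month"] - raw["events_last_month"], # crude intensity trend
        raw["gdelt_tone_mean"],
        math.log1p(raw["media_mentions"]),
        raw["vix"],
        raw["yield_10y"] - raw["yield_2y"],                  # yield curve slope
        raw["wgi_rule_of_law"],
    ]
```

The fixed ordering matters: every model downstream indexes the vector by position, so the extractor is the single place where feature semantics are defined.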
Ensemble Model
60% logistic regression + 40% decision stump forest
Predictions are generated by a weighted ensemble of two complementary model families. Logistic regression provides stable, interpretable baselines with well-understood uncertainty. The decision stump forest captures non-linear threshold effects — such as conflict intensity tipping points — that linear models miss. The 60/40 weighting balances robustness against expressiveness.
- Logistic regression: 14 features, L2 regularisation, fitted via iteratively reweighted least squares (IRLS)
- Decision stump forest: 50 stumps, each splitting on a single feature threshold
- Ensemble weight: 0.60 logistic + 0.40 stump forest
- Output: raw probability estimate before calibration
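The blend described above can be sketched directly. Only the 0.60/0.40 weighting comes from the methodology; the stump encoding, weights, and thresholds below are hypothetical placeholders.

```python
import math


def logistic_predict(x: list[float], weights: list[float], bias: float) -> float:
    """Logistic regression probability: sigmoid(w . x + b)."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def stump_forest_predict(x: list[float],
                         stumps: list[tuple[int, float, float, float]]) -> float:
    """Each stump is (feature_index, threshold, p_below, p_above);
    the forest prediction is the average over all stumps."""
    return sum(p_above if x[i] > t else p_below
               for i, t, p_below, p_above in stumps) / len(stumps)


def ensemble_predict(x, weights, bias, stumps, w_lr=0.60, w_sf=0.40):
    """Weighted 60/40 blend of the two model families (raw, pre-calibration)."""
    return w_lr * logistic_predict(x, weights, bias) + w_sf * stump_forest_predict(x, stumps)
```

Because each stump splits on a single feature, the forest contributes exactly the kind of threshold effect the linear model cannot represent, while the logistic term keeps the blend smooth elsewhere.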
Calibration
Isotonic regression (PAV algorithm)
Raw model outputs are not well-calibrated — a predicted 0.70 may correspond to a true event rate of 0.62 or 0.78. Isotonic calibration applies the Pool Adjacent Violators (PAV) algorithm to map raw scores to empirically calibrated probabilities. This ensures that when Logicon says P = 0.35, roughly 35% of such predictions resolve positively.
- PAV: monotone non-decreasing step function fitted to historical (score, outcome) pairs
- Confidence intervals via bootstrap resampling (1000 iterations)
- Calibration quality measured by Brier score, log loss, and reliability diagrams
- Recalibrated automatically when drift is detected
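The PAV step itself is compact enough to sketch in full. This is a textbook implementation of Pool Adjacent Violators plus a step-function lookup, not Logicon's code.

```python
import bisect


def pav_calibrate(scores, outcomes):
    """Pool Adjacent Violators: fit a monotone non-decreasing map from raw
    scores to empirical event rates. Returns (sorted_scores, fitted_values)."""
    pairs = sorted(zip(scores, outcomes))
    xs = [s for s, _ in pairs]
    merged = []  # each block is [sum_of_outcomes, count]
    for _, y in pairs:
        merged.append([y, 1])
        # Merge backwards while adjacent block means violate monotonicity.
        while len(merged) > 1 and merged[-2][0] / merged[-2][1] > merged[-1][0] / merged[-1][1]:
            s, n = merged.pop()
            merged[-1][0] += s
            merged[-1][1] += n
    fitted = []
    for s, n in merged:
        fitted.extend([s / n] * n)  # every point in a block gets the block mean
    return xs, fitted


def apply_calibration(raw: float, xs: list, fitted: list) -> float:
    """Step-function lookup: fitted value of the nearest score at or below raw."""
    i = bisect.bisect_right(xs, raw) - 1
    return fitted[max(i, 0)]
```

Pooling violating neighbours is what guarantees the fitted map never decreases, so a higher raw score can never produce a lower calibrated probability.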
Self-Learning Pipeline
7 autonomous stages
Logicon does not require manual retraining. When sufficient new outcomes accumulate or calibration drift is detected, a fully autonomous 7-stage pipeline activates. Each stage has explicit pass/fail criteria — if any stage fails, the current production model continues unchanged.
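The gate-per-stage pattern described above can be sketched generically. The stage names used in the test are placeholders, since the seven actual stages are not enumerated here; only the fail-closed behaviour (any failed gate keeps the production model) comes from the text.

```python
from typing import Callable


def run_pipeline(stages: list[tuple[str, Callable[[], bool]]]) -> bool:
    """Run stages in order with explicit pass/fail gates.

    Aborts on the first failed gate, so the current production model
    continues unchanged unless every stage passes.
    """
    for name, gate in stages:
        if not gate():
            print(f"stage '{name}' failed; keeping production model")
            return False
    return True
```

A promotion to production would only happen in the caller when `run_pipeline` returns `True`.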
Audit Trail
Complete reproducibility chain
Every prediction is stored alongside its complete computational provenance. This enables any prediction to be independently verified, reproduced, or challenged — a critical requirement for decision support in high-stakes environments.
- Input snapshot hash (SHA-256) — cryptographic fingerprint of all input data at prediction time
- Feature vector — full 18-dimensional vector stored as JSON for exact reproducibility
- Model version — parameter set ID linking to exact weights, thresholds, and calibration map
- Evidence chain — ranked list of contributing data points with polarity and weight
- Reasoning trace — natural language explanation of key drivers
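A minimal sketch of assembling such a provenance record, assuming JSON-serialisable inputs; the field names are illustrative rather than Logicon's storage schema.

```python
import hashlib
import json
from datetime import datetime, timezone


def audit_record(inputs: dict, features: list, model_version: str,
                 probability: float) -> dict:
    """Build an illustrative provenance record for one prediction.

    Canonical JSON (sorted keys) makes the SHA-256 fingerprint deterministic,
    so the same input snapshot always hashes to the same value.
    """
    snapshot = json.dumps(inputs, sort_keys=True).encode()
    return {
        "input_sha256": hashlib.sha256(snapshot).hexdigest(),
        "feature_vector": features,  # stored in full for exact replay
        "model_version": model_version,
        "probability": probability,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
```

Verifying a past prediction then reduces to re-hashing the archived snapshot and re-running the referenced model version on the stored feature vector.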