ACTIVE Research

EBM-Market

An energy-based / density-based risk modeling framework that challenges the anomaly=risk assumption in ML: GMM density achieves AUROC 0.931-0.954 vs. 0.802 for the best anomaly baseline in the finance and traffic domains, with a 1.96x Density Risk Ratio in financial markets. The relationship inverts in the multi-agent reinforcement learning (MARL) domain (0.59x). Submitted to ECML-PKDD 2026.

status ACTIVE
type Research
stack ['Python', 'scikit-learn (GMM)', 'CatBoost', 'FRED API', 'Yahoo Finance', 'Permutation Tests']

// DESCRIPTION

The Problem: Machine Learning Assumes Anomaly Equals Risk — But Does It?

Virtually every machine learning system for risk prediction — fraud detection, market crash forecasting, traffic incident prediction — is built on the same implicit assumption: unusual events are risky events. If a transaction looks different from normal transactions, flag it as fraud. If a market day looks different from normal market days, treat it as elevated risk. If a traffic pattern deviates from baseline, predict an incident. This anomaly-equals-risk assumption is so pervasive that it is rarely stated explicitly, let alone tested empirically.

EBM-Market challenges this assumption head-on. The motivating observation is this: in financial markets, the highest-density trading days — days that look most 'normal' — frequently precede major crashes. Herding behavior causes markets to appear extremely stable and 'normal' in the weeks before a collapse. Anomaly-based detectors would flag these pre-crash days as low risk precisely because they look like everything else. Yet empirically, these are the most dangerous days in the dataset. If this is true — if density and risk are positively correlated in some domains — then the entire anomaly-detection paradigm for risk prediction is backwards for those domains.

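The Problem: Machine Learning Assumes Anomaly Equals Risk, But Does It?
Nearly every ML system for risk prediction rests on the same implicit assumption: unusual events are risky events. The assumption is so widespread that it is rarely stated explicitly, and even more rarely tested empirically. EBM-Market challenges it directly. The core observation: in financial markets, the high-density trading days that look most 'normal' are often precursors of major crashes, because herding makes markets appear extremely stable in the weeks before a collapse. Anomaly detectors would mark these pre-crash days as low risk precisely because they look like every other day. If density and risk are positively correlated in some domains, then the anomaly-detection paradigm is backwards for those domains.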

Situation: Three Domains, Three Different Risk Topologies

EBM-Market tests the density-risk hypothesis across three empirically distinct domains:

  • Financial markets: daily OHLCV data from FRED and Yahoo Finance, covering equities, bonds, and cryptocurrency markets across multiple market regimes
  • Traffic flow: inD dataset (intersection drone footage), covering vehicle trajectory density and incident occurrence at monitored intersections
  • Multi-agent reinforcement learning: SMAC v2 (StarCraft Multi-Agent Challenge), measuring coordination breakdown risk as a function of agent state density

These three domains were chosen because they represent qualitatively different risk topologies: financial markets are hypothesized to show density=risk (herding), traffic is hypothesized to show a mixed signal, and MARL is hypothesized to invert the relationship.

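Situation: Three Domains, Three Different Risk Topologies
EBM-Market tests the density-risk hypothesis in three empirically distinct domains: financial markets (daily OHLCV data from FRED/Yahoo Finance, covering equities, bonds, and cryptocurrency), traffic flow (inD dataset, drone-recorded intersection trajectories), and multi-agent reinforcement learning (SMAC v2, coordination breakdown risk as a function of agent state density). The three domains represent qualitatively different risk topologies, hypothesized to show density=risk, a mixed signal, and an inverted relationship, respectively.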

Task: Formally Test Whether Density Predicts Risk Better Than Anomaly

The research question is formalized as: does a density-based risk score (derived from a fitted probability density model) outperform anomaly-based risk scores (isolation forest, one-class SVM, autoencoder reconstruction error) on the task of predicting adverse outcomes? The comparison must be statistically rigorous, accounting for temporal autocorrelation, regime changes, and the look-ahead bias that plagues financial ML evaluation.
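As a rough illustration of the comparison being set up, the sketch below scores the same points with a GMM log-density and with two anomaly detectors, then compares AUROC. Everything here is an assumption made for illustration: the toy data is constructed so that adverse outcomes sit in the dense core, the single-component GMM and default detector settings are arbitrary, the autoencoder baseline is omitted, and scoring is in-sample. It is not the project's evaluation code.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import roc_auc_score
from sklearn.mixture import GaussianMixture
from sklearn.svm import OneClassSVM

# Toy data (assumption): adverse outcomes are placed in the dense core,
# i.e. the data is built to satisfy the density=risk hypothesis.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
y = (np.linalg.norm(X, axis=1) < 0.8).astype(int)

# Density score: higher log-density means higher predicted risk.
gmm = GaussianMixture(n_components=1, random_state=0).fit(X)
density_score = gmm.score_samples(X)

# Anomaly scores: negate so that higher means more anomalous.
iforest_score = -IsolationForest(random_state=0).fit(X).score_samples(X)
ocsvm_score = -OneClassSVM(gamma="scale").fit(X).decision_function(X)

# In-sample AUROC for each score against the same labels.
auroc = {
    "gmm_density": roc_auc_score(y, density_score),
    "isolation_forest": roc_auc_score(y, iforest_score),
    "one_class_svm": roc_auc_score(y, ocsvm_score),
}
print(auroc)
```

On data built this way the anomaly scores rank the risky points last, which is the failure mode the hypothesis describes; on real data the direction is an empirical question.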

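Task: Formally Test Whether Density Predicts Risk Better Than Anomaly
The research question is formalized as: does a density risk score derived from a fitted probability density model outperform anomaly-based risk scores (isolation forest, one-class SVM, autoencoder reconstruction error) at predicting adverse outcomes? The comparison must be statistically rigorous, accounting for temporal autocorrelation, regime changes, and the look-ahead bias common in financial ML evaluation.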

Action: GMM Density + 10K Permutation Tests + Walk-Forward Evaluation

The density model uses a Gaussian Mixture Model (GMM) fitted to the feature representation of each domain. The number of mixture components is selected via BIC on held-out data. The fitted GMM density score is then used directly as a risk signal: high density = high risk (for domains where the hypothesis holds) or low density = high risk (for domains where it inverts).
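A minimal sketch of the density model described above, using scikit-learn's `GaussianMixture`. The helper names `fit_gmm_by_bic` and `density_risk_score`, the candidate range of 1 to 8 components, the full covariance type, and the synthetic data are all assumptions for illustration; the source only states that a GMM is fitted and that the component count is chosen by BIC on held-out data.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def fit_gmm_by_bic(X_fit, X_select, max_components=8, seed=0):
    """Fit GMMs with 1..max_components and keep the one with the lowest BIC.

    Each candidate is fitted on X_fit; BIC is evaluated on X_select,
    which is one reading of "BIC on held-out data" (an assumption).
    """
    best_model, best_bic = None, np.inf
    for k in range(1, max_components + 1):
        gmm = GaussianMixture(n_components=k, covariance_type="full",
                              random_state=seed)
        gmm.fit(X_fit)
        bic = gmm.bic(X_select)
        if bic < best_bic:
            best_model, best_bic = gmm, bic
    return best_model


def density_risk_score(gmm, X, density_is_risk=True):
    """Per-sample log-density, sign-flipped where density means safety."""
    log_density = gmm.score_samples(X)
    return log_density if density_is_risk else -log_density


# Synthetic two-cluster data standing in for a real feature matrix.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(300, 2)),
               rng.normal(6.0, 1.0, size=(300, 2))])
rng.shuffle(X)  # shuffles rows in place
X_fit, X_select = X[:400], X[400:]

gmm = fit_gmm_by_bic(X_fit, X_select)
scores = density_risk_score(gmm, X_select)
print(gmm.n_components, scores.shape)
```

The `density_is_risk` flag mirrors the two readings in the text: the raw log-density where high density is hypothesized to mean high risk, and its negation where the relationship inverts.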

Statistical validation uses 10,000 permutation tests on the density-risk correlation: shuffling breaks the temporal pairing between density and outcome, and the resampled correlations form the null distribution. Walk-forward cross-validation (expanding window, no data leakage) is used for financial evaluation to approximate real deployment conditions. The Density Risk Ratio (DRR) is introduced as a summary statistic: DRR > 1 indicates density-risk alignment (high density goes with high risk), DRR < 1 indicates inversion, and DRR = 1 is the null value (no relationship).
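The permutation step can be sketched as follows. This is an illustration under stated assumptions rather than the project's test: it uses the Pearson correlation (the source does not say which correlation is used), a one-sided alternative, a hypothetical helper name `permutation_pvalue`, and synthetic data. A plain shuffle also treats observations as exchangeable, which time-series data may not satisfy.

```python
import numpy as np


def permutation_pvalue(density, risk, n_permutations=10_000, seed=0):
    """One-sided permutation p-value for a positive density-risk correlation.

    Shuffling `risk` breaks its pairing with `density`, giving the null
    distribution of the correlation under "no relationship".
    """
    rng = np.random.default_rng(seed)
    observed = np.corrcoef(density, risk)[0, 1]
    null = np.empty(n_permutations)
    for i in range(n_permutations):
        null[i] = np.corrcoef(density, rng.permutation(risk))[0, 1]
    # Add-one correction keeps the p-value strictly above zero.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_permutations)
    return observed, p_value


# Synthetic example (assumption): risk is weakly driven by density.
rng = np.random.default_rng(1)
density = rng.normal(size=500)
risk = 0.5 * density + rng.normal(size=500)

observed, p_value = permutation_pvalue(density, risk)
print(f"correlation={observed:.3f}, p={p_value:.4f}")
```

With 10,000 permutations the smallest reportable p-value under this correction is 1/10,001, which is consistent with reporting results as p < 0.001.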

Density Risk Ratio (DRR) Framework

  Domain            |  DRR   |  Interpretation
  ------------------+--------+----------------------------------
  Finance (equities)|  1.96x |  High density -> 96% more risk
  Traffic (inD)     |  1.14x |  Mild density-risk alignment
  MARL (SMAC v2)    |  0.59x |  INVERTED: density = safety
  ------------------+--------+----------------------------------

  AUROC comparison:
  GMM Density:  0.931 - 0.954  (finance + traffic)
  Best Anomaly: 0.802          (isolation forest)
  Walk-forward: ~0.50 (near-random) -- honest degradation reported
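One way to compute a DRR of this kind, following the quartile comparison used in the results (highest density quartile vs. lowest), is sketched below. Treating "realized risk" as the mean of a binary adverse-outcome indicator is an assumption, as are the function name `density_risk_ratio` and the toy numbers.

```python
import numpy as np


def density_risk_ratio(density, adverse):
    """Adverse-outcome rate in the top density quartile over the bottom one."""
    density = np.asarray(density, dtype=float)
    adverse = np.asarray(adverse, dtype=float)
    q25, q75 = np.quantile(density, [0.25, 0.75])
    low_rate = adverse[density <= q25].mean()
    high_rate = adverse[density >= q75].mean()
    # Note: undefined when the bottom quartile has no adverse outcomes.
    return high_rate / low_rate


# Toy example: 8 observations, listed in order of increasing density.
density = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8])
adverse = np.array([0, 1, 0, 0, 1, 0, 1, 1])

drr = density_risk_ratio(density, adverse)
print(drr)  # 2.0: top quartile rate 1.0 over bottom quartile rate 0.5
```

Under this reading, a DRR of 1.96 means the adverse rate in the densest quartile is 96% higher than in the sparsest, and a DRR of 0.59 means it is 41% lower.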

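Action: GMM Density + 10,000 Permutation Tests + Walk-Forward Evaluation
The density model is a Gaussian Mixture Model (GMM) with the number of mixture components selected by BIC, and the fitted density score is used directly as the risk signal. Statistical validation uses 10,000 permutation tests to establish null distributions. Financial evaluation uses expanding-window walk-forward cross-validation (no data leakage). The Density Risk Ratio (DRR) is introduced as a summary statistic: DRR > 1 indicates density-risk alignment, DRR < 1 indicates inversion.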

Results: AUROC 0.931-0.954 vs. 0.802; Honest Walk-Forward Degradation Reported

  • AUROC 0.931-0.954 for GMM density-based risk in financial and traffic domains, vs. the best anomaly baseline at 0.802 (isolation forest)
  • Density Risk Ratio 1.96x in finance: days in the highest density quartile had 96% higher realized risk than days in the lowest density quartile
  • Density Risk Ratio 1.14x in traffic: mild but statistically significant density-risk alignment (p < 0.001 by permutation test)
  • INVERTED in MARL: DRR = 0.59x. High-density agent states are significantly safer, so risk concentrates in rare states; in this domain a density=risk score would point the wrong way, and the conventional anomaly=risk reading is the one that fits
  • Walk-forward evaluation degrades to near-random (AUROC ~0.50), a limitation reported transparently: the static density signal does not survive realistic deployment simulation, which calls for caution before any live-trading use
  • 10,000 permutation tests confirm all cross-domain results at p < 0.001
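The expanding-window walk-forward evaluation referred to in the results can be sketched with scikit-learn's `TimeSeriesSplit`, which always trains on observations that precede the test block. The helper name `walk_forward_auroc`, the fixed two-component GMM, the five splits, and the random labels are assumptions for illustration only.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import TimeSeriesSplit


def walk_forward_auroc(X, y, n_splits=5, n_components=2, seed=0):
    """Fit the density model on the past only, then score the next block."""
    aurocs = []
    for train_idx, test_idx in TimeSeriesSplit(n_splits=n_splits).split(X):
        gmm = GaussianMixture(n_components=n_components, random_state=seed)
        gmm.fit(X[train_idx])                    # past observations only
        scores = gmm.score_samples(X[test_idx])  # log-density as risk score
        if len(np.unique(y[test_idx])) < 2:
            continue  # AUROC is undefined when a block has a single class
        aurocs.append(roc_auc_score(y[test_idx], scores))
    return aurocs


# Synthetic, time-ordered stand-in data with labels unrelated to X.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 2))
y = rng.integers(0, 2, size=600)

aurocs = walk_forward_auroc(X, y)
print(np.round(aurocs, 3))
```

Because each model sees only earlier rows, a score that looks strong when fitted and evaluated on the full history can drop toward 0.50 here, which is the pattern the walk-forward result reports.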

This work has been submitted to ECML-PKDD 2026 (European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases), contributing a new hypothesis about risk topology that challenges the dominant anomaly-detection paradigm and provides domain-specific guidance for practitioners building production risk systems.

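Results: AUROC 0.931-0.954 vs. 0.802; Honest Walk-Forward Degradation
Key metrics: AUROC of 0.931-0.954 for GMM density risk in the finance and traffic domains (best anomaly baseline: 0.802); DRR = 1.96x in finance, 1.14x in traffic, and 0.59x in MARL (relationship inverted); walk-forward evaluation honestly reveals AUROC degrading to ~0.50 (near-random), so caution is needed for live trading; 10,000 permutation tests give p < 0.001 for all results. The work has been submitted to ECML-PKDD 2026; it challenges the dominant anomaly-detection paradigm and offers domain-specific guidance for production risk systems.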

// HIGHLIGHTS

  • Submitted to ECML-PKDD 2026 — challenges the anomaly=risk assumption in ML risk prediction
  • AUROC 0.931–0.954 (GMM density) vs. 0.802 (best anomaly baseline) in finance and traffic
  • Density Risk Ratio 1.96x in financial markets — highest-density days precede crashes (herding effect)
  • Density Risk Ratio 1.14x in traffic (inD dataset) — mild but significant density-risk alignment
  • INVERTED in MARL (DRR = 0.59x): density signals safety in agent coordination, so the density=risk hypothesis does not transfer to this domain
  • Honest limitation reported: walk-forward evaluation degrades to AUROC ~0.50, so the headline AUROC should not be read as deployment performance
  • 10,000 permutation tests, walk-forward cross-validation — rigorous against temporal autocorrelation and look-ahead bias
  • Multi-domain validation: FRED/Yahoo Finance equities+crypto, inD traffic, SMAC v2 MARL across 3 risk topologies