Fraud-Detection-Chain
Hybrid blockchain + ML fraud detection: LightGBM F1=0.911, PR-AUC=0.977, smart contract coverage >95%, real-time decisions under 100ms.
// DESCRIPTION
The Problem: Centralized Fraud Detection Has No Immutable Audit Trail
Enterprise fraud detection systems today are almost entirely centralized: a model scores transactions, a rules engine applies thresholds, and fraud analysts review flagged cases. The entire process exists in mutable databases that internal actors can edit, selectively delete from, or manipulate to cover up both fraud and false accusations. Regulatory compliance audits rely on transaction logs that can be retroactively altered. There is no cryptographic proof that a given transaction was actually evaluated by the model that produced the stated score, or that the score was not modified after the fact. This creates legal liability, enables internal collusion, and undermines the trustworthiness of fraud investigation outcomes.
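Background: Centralized Fraud Detection Lacks a Tamper-Proof Audit Trail
Today's enterprise fraud detection systems are almost entirely centralized: a model scores transactions, a rules engine applies thresholds, and fraud analysts review the flagged cases. The whole process lives in mutable databases that insiders can edit, selectively delete from, or tamper with. There is no cryptographic proof that a given transaction was actually evaluated by the model that produced the stated score.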
Approach: Solidity Smart Contracts + Off-Chain ML Ensemble
The architecture cleanly separates computation from verification. Off-chain ML handles compute-intensive classification; on-chain smart contracts provide the immutable audit record.
On-Chain Layer (Solidity 0.8.20, Hardhat): Four interconnected contracts implement the fraud investigation lifecycle. A Role-Based Access Control (RBAC) contract enforces permissioning: only authorized oracles can submit model scores, only designated analysts can adjudicate. A state machine contract enforces the legal workflow: transactions move through defined states (pending → under-review → resolved → appealed) with cryptographic event logs at each transition. A budget manager contract enforces spending controls and freeze actions. An audit log contract stores Merkle roots of batched decision records.
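The interplay between the RBAC contract and the state machine contract can be sketched off-chain in a few lines of Python. This is a minimal model under stated assumptions, not the project's Solidity code: the `CaseWorkflow` class, the role names, and the mapping of roles to transitions (in particular, who may file an appeal) are hypothetical and chosen only to illustrate the pattern.

```python
from enum import Enum


class State(Enum):
    PENDING = "pending"
    UNDER_REVIEW = "under-review"
    RESOLVED = "resolved"
    APPEALED = "appealed"


# Allowed transitions and the role that may trigger each one.
# ASSUMPTION: these role assignments are illustrative, not taken from the contracts.
TRANSITIONS = {
    (State.PENDING, State.UNDER_REVIEW): "oracle",    # oracle submits a model score
    (State.UNDER_REVIEW, State.RESOLVED): "analyst",  # analyst adjudicates
    (State.RESOLVED, State.APPEALED): "analyst",      # hypothetical: analyst files appeal
}


class CaseWorkflow:
    """In-memory stand-in for the RBAC + state machine contracts."""

    def __init__(self, roles):
        self.roles = dict(roles)  # caller address -> role name
        self.states = {}          # transaction id -> current State
        self.events = []          # append-only log of (tx_id, old, new, caller)

    def open_case(self, tx_id):
        if tx_id in self.states:
            raise ValueError(f"case {tx_id} already exists")
        self.states[tx_id] = State.PENDING
        self.events.append((tx_id, None, State.PENDING, None))

    def transition(self, caller, tx_id, new_state):
        current = self.states[tx_id]
        required_role = TRANSITIONS.get((current, new_state))
        if required_role is None:
            raise ValueError(f"illegal transition {current.value} -> {new_state.value}")
        if self.roles.get(caller) != required_role:
            raise PermissionError(f"{caller} lacks role {required_role!r}")
        self.states[tx_id] = new_state
        self.events.append((tx_id, current, new_state, caller))
```

In this sketch an out-of-order transition raises `ValueError` and a caller without the required role raises `PermissionError`, which mirrors how a contract would revert. On-chain, the `events` list would correspond to emitted event logs rather than mutable storage.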
Off-Chain ML Layer: Five classifiers are trained on Kaggle creditcard.csv (284,807 transactions, 492 fraud cases) augmented with synthetic enterprise data: Logistic Regression, Random Forest, XGBoost, LightGBM, and a shallow MLP. SHAP values are computed for every prediction for regulatory-grade explainability.
Frontend Layer: React/Next.js analyst dashboard for real-time case management plus a Streamlit dashboard for ML monitoring and SHAP visualization.
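Approach: Solidity Smart Contracts + Off-Chain ML Ensemble
The architecture cleanly separates computation from verification: off-chain ML handles the compute-intensive classification, and on-chain smart contracts provide the tamper-proof audit record.
Four interconnected contracts implement the fraud investigation lifecycle: an RBAC contract, a state machine contract, a budget manager contract, and an audit log contract. SHAP values are computed for every prediction to meet regulatory-grade explainability requirements.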
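To illustrate how an audit log can commit to a batch of decision records with a single Merkle root, here is a minimal Python sketch. The record fields (`tx_id`, `score`, `decision`), the JSON encoding, and the helper names are assumptions made for illustration. SHA-256 is used as a stand-in hash: a Solidity verifier would more likely use `keccak256`, which is not the same function as Python's `hashlib.sha3_256`.

```python
import hashlib
import json


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hash(record: dict) -> bytes:
    """Hash one decision record using a canonical JSON encoding."""
    encoded = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return _h(b"\x00" + encoded)  # 0x00 prefix marks a leaf node


def merkle_root(leaves: list[bytes]) -> bytes:
    """Fold leaf hashes pairwise until a single 32-byte root remains."""
    if not leaves:
        raise ValueError("cannot build a Merkle tree from an empty batch")
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        level = [
            _h(b"\x01" + level[i] + level[i + 1])  # 0x01 prefix marks an inner node
            for i in range(0, len(level), 2)
        ]
    return level[0]


# Hypothetical batch of decision records.
batch = [
    {"tx_id": "tx1", "score": 0.97, "decision": "flag"},
    {"tx_id": "tx2", "score": 0.02, "decision": "pass"},
    {"tx_id": "tx3", "score": 0.64, "decision": "review"},
]
root = merkle_root([leaf_hash(r) for r in batch])
```

Only `root` (32 bytes) would need to be stored on-chain. Changing any field of any record in the batch afterwards yields a different root, which is what makes retroactive edits to the off-chain records detectable.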
Results: LightGBM F1=0.911, Real-Time Under 100ms, Submitted to IEEE TIFS + COMPSAC 2026
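ML performance: LightGBM achieves F1 = 0.911 and PR-AUC = 0.977 on the held-out test set. PR-AUC is the appropriate headline metric because the task is severely imbalanced: fraud cases make up ~0.17% of the Kaggle transactions (492 of 284,807), so a classifier that labels every transaction legitimate would still reach about 99.83% accuracy.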
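The two reported metrics can be made concrete with a short pure-Python sketch. It uses average precision, one common step-wise estimator of PR-AUC. Whether the project used this estimator or a trapezoidal one is not stated, so treat that as an assumption. The toy labels and scores below are invented for illustration and are not project results.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive (fraud) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_precision(y_true, scores):
    """Mean of the precision values at the rank of each true positive.

    Assumes scores are distinct; tied scores are not handled specially.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positives")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / n_pos


# Why accuracy misleads here: predict "legitimate" for every Kaggle transaction.
N_TOTAL, N_FRAUD = 284_807, 492
baseline_accuracy = (N_TOTAL - N_FRAUD) / N_TOTAL  # very high, yet no fraud is caught

# Toy example (invented data): two fraud cases among ten transactions.
toy_labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05, 0.04, 0.03]
toy_ap = average_precision(toy_labels, toy_scores)
```

The always-legitimate baseline has an F1 of 0 despite its high accuracy, which is why the precision-recall view is more informative at this base rate.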
Smart contract coverage: The Hardhat test suite achieves >95% branch and statement coverage across all four contracts.
End-to-end latency: The path from transaction submission to on-chain audit record completes in under 100 ms in the test environment.
Submitted to IEEE Transactions on Information Forensics and Security (TIFS) and COMPSAC 2026.
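Results: LightGBM F1=0.911, Real-Time Response <100ms
LightGBM performs best among the models: F1 = 0.911, PR-AUC = 0.977.
The Hardhat test suite reaches >95% branch and statement coverage across all four contracts.
Submitted to IEEE TIFS and COMPSAC 2026.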
// HIGHLIGHTS
- Submitted to IEEE TIFS + COMPSAC 2026 — blockchain audit trail for ML fraud detection
- LightGBM achieves F1 = 0.911 and PR-AUC = 0.977 on Kaggle creditcard.csv
- 4-contract Solidity architecture: RBAC + state machine + budget manager + Merkle audit log
- Smart contract test coverage >95% branch and statement via Hardhat
- SHAP explanations for every prediction — regulatory-grade model interpretability
- End-to-end latency <100ms from transaction to on-chain audit record
- React/Next.js analyst dashboard + Streamlit ML monitoring and SHAP visualization
- Evaluated on 284,807 transactions (492 fraud cases) + synthetic enterprise data augmentation