Byzantine-FL Attack Arena Leaderboard¶
Final-round test accuracy of each aggregation strategy under the FLPoison canonical attack set, on real MNIST (mean over seeds). Higher is more robust. Data lineage: out/attack_arena/aggregated.csv (regenerate via scripts/dump_attack_arena.py).
Worst-case defender ranking¶
If you must pick one strategy without knowing the attack, this is the order — ranked by each strategy's weakest result across all attacks.
| Rank | Strategy | Worst-case accuracy | Weakest under |
|---|---|---|---|
| 1 | Bulyan | 95.7% | Gaussian (Krum-paper) |
| 2 | MultiKrum | 93.8% | Gaussian (Krum-paper) |
| 3 | Krum | 90.9% | Gaussian (Krum-paper) |
| 4 | FedAvg | 9.8% | Gaussian (Krum-paper) |
| 5 | ArKrum | 9.6% | Fang-Krum (Fang 2020) |
Final accuracy by attack¶
| Strategy | Gaussian (Krum-paper) | IPM (Fall of Empires) | Label flip (Tolpegin 2020) | Sign flip (Damaskinos 2018) | ALIE (Baruch 2019) | Fang-Krum (Fang 2020) |
|---|---|---|---|---|---|---|
| FedAvg | 9.8% | 89.3% | 93.3% | 86.2% | 96.5% | 13.4% |
| Krum | 90.9% | 90.9% | 90.9% | 90.9% | 96.3% | 90.9% |
| MultiKrum | 93.8% | 93.8% | 93.8% | 93.8% | 95.7% | 93.8% |
| Bulyan | 95.7% | 95.7% | 95.7% | 95.7% | 96.0% | 95.7% |
| ArKrum | 96.2% | 95.8% | 96.2% | 94.5% | 96.4% | 9.6% |