Configuration

Simulations are controlled entirely by a single JSON file. The file has two top-level keys:

Config structure
{
  "shared_settings": { ... },
  "simulation_strategies": [
    { ... },
    { ... }
  ]
}

Shared settings + per-strategy overrides

shared_settings provides defaults for every strategy. Each entry in simulation_strategies can override any field. This makes it easy to sweep a single parameter across multiple runs in one file.


Example

Minimal multi-strategy example
{
  "shared_settings": {
    "aggregation_strategy_keyword": "pid_standardized",
    "dataset_keyword": "femnist_iid",
    "num_of_rounds": 10,
    "num_of_clients": 5,
    "num_of_malicious_clients": 1,
    "training_device": "cpu",
    "cpus_per_client": 1,
    "gpus_per_client": 0.0,
    "batch_size": 20,
    "num_of_client_epochs": 1,
    "save_plots": true,
    "save_csv": true,
    "Kp": 1, "Ki": 0.05, "Kd": 0.05
  },
  "simulation_strategies": [
    { "num_std_dev": 2.33 },
    { "num_std_dev": 3.0 }
  ]
}

This runs two strategies back-to-back, differing only in num_std_dev. See config/simulation_strategies/example_strategy_config.json for a full example with all 11 attack types.
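
The override behaviour described above can be pictured as a dictionary merge. The sketch below is only an illustration: the helper name `expand_strategies` is hypothetical, and it assumes a shallow merge in which each strategy entry's keys replace the shared ones (how the framework handles nested objects is not stated here).

```python
def expand_strategies(config: dict) -> list[dict]:
    """Return one fully resolved settings dict per strategy entry.

    Hypothetical helper: assumes a shallow merge of shared_settings
    with each entry of simulation_strategies.
    """
    shared = config.get("shared_settings", {})
    # In a dict merge the later keys win, so per-strategy values
    # override the shared defaults.
    return [{**shared, **override} for override in config["simulation_strategies"]]


config = {
    "shared_settings": {
        "aggregation_strategy_keyword": "pid_standardized",
        "dataset_keyword": "femnist_iid",
        "num_of_rounds": 10,
        "num_of_clients": 5,
    },
    "simulation_strategies": [
        {"num_std_dev": 2.33},
        {"num_std_dev": 3.0},
    ],
}

resolved = expand_strategies(config)
for settings in resolved:
    print(settings["aggregation_strategy_keyword"], settings["num_std_dev"])
```

Each resolved dictionary carries every shared field plus its own `num_std_dev`, which is what makes a one-parameter sweep cheap to write.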


Research Integrity & Validation

InteFL enforces a Scientific Integrity First policy. To ensure experimental transparency and reproducibility (aligned with IEEE Std and NeurIPS/ICML checklists), the framework uses a "fail-fast" validation approach: it rejects incompatible or mathematically unsound configurations rather than attempting to silently auto-correct them.

Key Constraints

  • Participation vs. Removal: You cannot set remove_clients: true if your configuration requires 100% participation (e.g., min_fit_clients == num_of_clients). It is mathematically impossible to maintain full participation if clients are being permanently excluded.
  • Byzantine Tolerance Bounds: Robust aggregation algorithms have strict breakdown points.
    • Krum / Multi-Krum: Requires \(n > 2f + 2\) (where \(n\) is total clients and \(f\) is malicious clients).
    • Trimmed Mean: Requires trim_ratio < 0.5.
    • Violating these bounds removes the theoretical guarantees of the defense and leads to undefined behavior.
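
The constraints listed above can be expressed as a small fail-fast check. This is a minimal sketch, not the framework's actual validator: the function name `validate_config` and the strategy keywords `"krum"`, `"multi-krum"`, and `"trimmed_mean"` are assumptions made for illustration.

```python
def validate_config(cfg: dict) -> None:
    """Raise ValueError on the first violated constraint (fail-fast).

    Hypothetical sketch; strategy keywords below are assumed names.
    """
    n = cfg["num_of_clients"]
    f = cfg.get("num_of_malicious_clients", 0)
    strategy = cfg.get("aggregation_strategy_keyword", "")

    # Participation vs. removal: full participation cannot survive removals.
    if cfg.get("remove_clients", False) and cfg.get("min_fit_clients") == n:
        raise ValueError("remove_clients is incompatible with min_fit_clients == num_of_clients")

    # Krum / Multi-Krum breakdown point: n > 2f + 2.
    if "krum" in strategy and not n > 2 * f + 2:
        raise ValueError(f"Krum requires n > 2f + 2, got n={n}, f={f}")

    # Trimmed Mean: trim_ratio must stay below 0.5.
    if strategy == "trimmed_mean" and cfg.get("trim_ratio", 0.0) >= 0.5:
        raise ValueError("trim_ratio must be < 0.5")


# Passes: 7 > 2 * 2 + 2.
validate_config({
    "num_of_clients": 7,
    "num_of_malicious_clients": 2,
    "aggregation_strategy_keyword": "krum",
})
```

Raising instead of clamping the value mirrors the "reject rather than auto-correct" policy: the run never starts with a configuration the author did not write.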

Field reference

Core simulation

| Field | Type | Default | Description |
|---|---|---|---|
| display_name | string | null | Optional human-readable label shown in the UI. |
| dataset_keyword | string | | Which dataset to use. See Datasets. |
| aggregation_strategy_keyword | string | | Which strategy to use. See Strategies. |
| num_of_rounds | int | | Total FL rounds to run. |
| num_of_clients | int | | Total virtual clients. |
| num_of_malicious_clients | int | 1 | How many clients are treated as potentially malicious. |
| training_device | string | | "cpu", "cuda", or "gpu" (alias for "cuda"). |
| cpus_per_client | float | | CPU cores allocated to each Ray worker. |
| gpus_per_client | float | | GPU fraction allocated to each Ray worker (0.0–1.0). |
| model_type | string | "cnn" | "cnn" or "transformer". |
| use_llm | bool | false | Enable the transformer-based training path. |
| strict_mode | bool | null | Enable strict validation that rejects incompatible configs (e.g., full participation + client removal). |

Training

| Field | Type | Default | Description |
|---|---|---|---|
| num_of_client_epochs | int | | Local training epochs per round per client. |
| batch_size | int | | Mini-batch size for local training. |
| learning_rate | float | null | Override the default learning rate. |
| training_subset_fraction | float | | Fraction of each client's data used for training (0.0–1.0). |

Client selection

| Field | Type | Default | Description |
|---|---|---|---|
| min_fit_clients | int | | Minimum clients that must participate in training each round. |
| min_evaluate_clients | int | | Minimum clients for evaluation each round. |
| min_available_clients | int | | Minimum clients that must be available before a round starts. |
| evaluate_metrics_aggregation_fn | string | null | Set to "weighted_average" to aggregate evaluation metrics. |

Attack schedule

See the Attacks page for full documentation of all 11 attack types (data and model poisoning).

| Field | Type | Default | Description |
|---|---|---|---|
| attack_schedule | array | [] | List of attack entries. Each entry specifies start/end rounds, attack type, client selection strategy, and attack-specific parameters. Required in config (use [] for no attacks). |

Client removal

| Field | Type | Default | Description |
|---|---|---|---|
| remove_clients | bool | false | Enable permanent removal of detected malicious clients. |
| begin_removing_from_round | int | 0 | Only start removing from this round onwards. |
| termination_policy | string | "graceful" | "strict", "graceful", or "adaptive". Controls behaviour when too many clients have been removed. |
| min_clients_ratio | float | 0.3 | For the "adaptive" policy: stop removing if fewer than this fraction of clients remain. |
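
One way to read begin_removing_from_round and min_clients_ratio together is as a gate evaluated before each removal. The sketch below is an assumption about the semantics, not the framework's code: the function name `may_remove_client` is hypothetical, and it assumes the ratio is checked against the client count that would remain after the removal.

```python
def may_remove_client(
    current_round: int,
    remaining_clients: int,
    total_clients: int,
    begin_removing_from_round: int = 0,
    min_clients_ratio: float = 0.3,
) -> bool:
    """Decide whether one more client may be removed (assumed semantics)."""
    # No removals before the configured starting round.
    if current_round < begin_removing_from_round:
        return False
    # Assumption: the check applies to the count left after this removal.
    remaining_after = remaining_clients - 1
    return remaining_after / total_clients >= min_clients_ratio


# 10 clients, 5 still active: removing one leaves 40%, above the 30% floor.
print(may_remove_client(current_round=4, remaining_clients=5, total_clients=10))
# Only 3 still active: removing one would leave 20%, so removal stops.
print(may_remove_client(current_round=4, remaining_clients=3, total_clients=10))
```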

Output

| Field | Type | Default | Description |
|---|---|---|---|
| save_csv | bool | | Save per-round metrics to CSV. |
| save_plots | bool | | Save matplotlib figures to PDF. |
| show_plots | bool | | Display plots interactively (not suitable for headless/server runs). |
| preserve_dataset | bool | | Keep partitioned dataset files after the simulation. |
| save_attack_snapshots | bool | false | Save before/after data snapshots for attacked clients. |
| attack_snapshot_format | string | "pickle" | "pickle", "visual", or "pickle_and_visual". |
| snapshot_max_samples | int | 6 | Max samples included in each snapshot. |

Strategy-specific parameters

Trust-based

| Field | Type | Description |
|---|---|---|
| trust_threshold | float | Clients below this trust score are considered malicious. |
| beta_value | float | EMA decay for updating trust scores. |
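
To make the two fields concrete, here is a generic exponential-moving-average update followed by a threshold check. It is a sketch under stated assumptions: the function names are hypothetical, the per-round `observation` signal is a placeholder, and the convention that beta_value weights the previous score (rather than the new observation) is assumed, not confirmed by this page.

```python
def update_trust(previous: float, observation: float, beta_value: float) -> float:
    """EMA update; assumes beta_value is the weight on the previous score."""
    return beta_value * previous + (1.0 - beta_value) * observation


def is_malicious(trust: float, trust_threshold: float) -> bool:
    """Clients below the threshold are treated as malicious."""
    return trust < trust_threshold


trust = 1.0
# A client that keeps producing a low per-round observation (0.0 here)
# loses trust gradually rather than in a single round.
for _ in range(5):
    trust = update_trust(trust, observation=0.0, beta_value=0.5)

print(trust, is_malicious(trust, trust_threshold=0.1))
```

Under this convention, a larger beta_value makes trust decay more slowly, so a single bad round is less likely to push a client below the threshold.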

PID-based

| Field | Type | Description |
|---|---|---|
| Kp | float | Proportional gain. |
| Ki | float | Integral gain. |
| Kd | float | Derivative gain. |
| num_std_dev | float | Threshold (in std devs) for flagging outlier clients. |
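
The gains and the threshold combine roughly as follows: a textbook discrete PID term is computed per client from some per-round error signal, and clients whose score sits more than num_std_dev standard deviations above the mean are flagged. This is a sketch only: the error signal, the helper names, the one-sided cutoff, and the use of the population standard deviation are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def pid_score(errors: list[float], kp: float, ki: float, kd: float) -> float:
    """Discrete PID output for the latest entry of a per-round error history."""
    current = errors[-1]
    integral = sum(errors)
    derivative = errors[-1] - errors[-2] if len(errors) > 1 else 0.0
    return kp * current + ki * integral + kd * derivative


def flag_outliers(scores: dict[str, float], num_std_dev: float) -> set[str]:
    """Flag clients whose score exceeds mean + num_std_dev * std."""
    values = list(scores.values())
    cutoff = mean(values) + num_std_dev * pstdev(values)
    return {cid for cid, score in scores.items() if score > cutoff}


# Nine well-behaved clients and one with a much larger score.
scores = {f"c{i}": 1.0 for i in range(9)}
scores["c9"] = 10.0
print(flag_outliers(scores, num_std_dev=2.33))
```

A side effect of this kind of cutoff is worth keeping in mind: with the population standard deviation, a single outlier among n clients can sit at most sqrt(n - 1) standard deviations above the mean, so very small cohorts may never trip a high threshold.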

Krum / Multi-Krum / Bulyan

| Field | Type | Description |
|---|---|---|
| num_krum_selections | int | Number of clients selected by Krum per round. |
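
For orientation, the standard Krum score of a client is the sum of squared distances to its n - f - 2 nearest neighbours, and Multi-Krum keeps the lowest-scoring clients. The pure-Python sketch below follows that textbook definition; it is not the framework's implementation, the function name is hypothetical, and treating num_krum_selections as the number of kept clients is an assumption.

```python
def krum_select(updates: list[list[float]], f: int, num_krum_selections: int = 1) -> list[int]:
    """Return indices of the clients with the lowest Krum scores."""
    n = len(updates)
    if not n > 2 * f + 2:
        raise ValueError(f"Krum requires n > 2f + 2, got n={n}, f={f}")

    def sq_dist(a: list[float], b: list[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    scores = []
    for i in range(n):
        # Squared distances from client i to every other client, ascending.
        dists = sorted(sq_dist(updates[i], updates[j]) for j in range(n) if j != i)
        # Score: sum over the n - f - 2 closest neighbours.
        scores.append(sum(dists[: n - f - 2]))

    ranked = sorted(range(n), key=lambda i: scores[i])
    return ranked[:num_krum_selections]


updates = [[0.0, 0.1], [0.1, 0.0], [0.0, 0.0], [0.1, 0.1],
           [-0.1, 0.0], [0.0, -0.1], [100.0, 100.0]]
selected = krum_select(updates, f=1, num_krum_selections=3)
print(selected)
```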

Trimmed Mean

| Field | Type | Description |
|---|---|---|
| trim_ratio | float | Fraction of extreme updates trimmed from each end. |
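
A coordinate-wise trimmed mean shows what trim_ratio does: for each parameter, the values from all clients are sorted, the lowest and highest fraction are dropped, and the rest are averaged. The sketch below is a generic version of that technique with a hypothetical function name; rounding the trimmed count down is an assumption.

```python
def trimmed_mean(updates: list[list[float]], trim_ratio: float) -> list[float]:
    """Coordinate-wise trimmed mean over client updates."""
    if not 0.0 <= trim_ratio < 0.5:
        raise ValueError("trim_ratio must be in [0, 0.5)")
    n = len(updates)
    k = int(n * trim_ratio)  # values dropped from each end (rounded down)
    aggregated = []
    for coordinate in zip(*updates):
        kept = sorted(coordinate)[k : n - k]
        aggregated.append(sum(kept) / len(kept))
    return aggregated


# Five clients, one parameter each; the 100.0 update is trimmed away.
print(trimmed_mean([[1.0], [2.0], [3.0], [4.0], [100.0]], trim_ratio=0.2))
```

The `trim_ratio < 0.5` bound is visible in the slice: at 0.5 or above, both ends would meet and nothing would be left to average.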

General

| Field | Type | Default | Description |
|---|---|---|---|
| num_of_clusters | int | null | Reserved. Number of strategy clusters (currently capped at 1). |

HuggingFace / custom text datasets

| Field | Type | Default | Description |
|---|---|---|---|
| hf_dataset_name | string | null | HuggingFace dataset path to load dynamically (e.g. "ylecun/mnist"). |
| partitioning_strategy | string | null | Partitioning method: "iid", "dirichlet", or "pathological". See Datasets. |
| partitioning_params | object | null | Parameters for the chosen strategy (e.g. {"alpha": 0.5} for dirichlet, {"num_classes_per_partition": 2} for pathological). |
| text_column | string | null | Name of the text/input column in the dataset. |
| label_column | string | null | Name of the label column in the dataset. |

LLM / transformer options

| Field | Type | Default | Description |
|---|---|---|---|
| llm_model | string | | HuggingFace model ID, e.g. "microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext". |
| llm_task | string | | "mlm" (masked language modelling) or "classification". |
| llm_chunk_size | int | 512 | Token sequence length. |
| max_seq_length | int | null | Maximum sequence length for the tokeniser (overrides llm_chunk_size when set). |
| mlm_probability | float | 0.15 | Fraction of tokens masked during MLM. |
| llm_finetuning | string | null | "lora" to use LoRA adapters instead of full fine-tuning. |
| use_lora | bool | false | Alternative boolean flag to enable LoRA (equivalent to llm_finetuning: "lora"). |
| lora_rank | int | 8 | LoRA rank r. |
| lora_alpha | int | 16 | LoRA scaling factor (alpha / rank controls adaptation strength). |
| lora_dropout | float | 0.05 | Dropout applied to LoRA layers. |
| lora_target_modules | array | null | List of module names to apply LoRA to (e.g. ["query", "value"]). Defaults to model-specific modules. |
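
The LoRA fields can be summarised by resolving them into one set of adapter settings. In the sketch below, the helper name `resolve_lora_settings` is hypothetical, and the output keys (`r`, `lora_alpha`, `lora_dropout`, `target_modules`) follow the parameter names of the Hugging Face PEFT `LoraConfig`; whether the framework uses PEFT internally is an assumption.

```python
from typing import Optional


def resolve_lora_settings(cfg: dict) -> Optional[dict]:
    """Return adapter settings, or None when LoRA is not enabled."""
    # Either flag enables LoRA, per the field descriptions above.
    enabled = cfg.get("use_lora", False) or cfg.get("llm_finetuning") == "lora"
    if not enabled:
        return None
    return {
        "r": cfg.get("lora_rank", 8),
        "lora_alpha": cfg.get("lora_alpha", 16),
        "lora_dropout": cfg.get("lora_dropout", 0.05),
        # None means "use the model-specific default modules".
        "target_modules": cfg.get("lora_target_modules"),
    }


settings = resolve_lora_settings({"llm_finetuning": "lora", "lora_rank": 4})
# Effective scaling applied to the adapter output is alpha / rank.
scaling = settings["lora_alpha"] / settings["r"]
print(settings, scaling)
```

With the documented defaults (rank 8, alpha 16) the scaling is 2.0; lowering the rank while keeping alpha fixed raises it, which is why the two values are usually tuned together.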