Configuration
Simulations are controlled entirely by a single JSON file. The file has two top-level keys:
Config structure
{
  "shared_settings": { ... },
  "simulation_strategies": [
    { ... },
    { ... }
  ]
}
Shared settings + per-strategy overrides
shared_settings provides defaults for every strategy. Each entry in simulation_strategies can override any field. This makes it easy to sweep a single parameter across multiple runs in one file.
Example
Minimal multi-strategy example
{
  "shared_settings": {
    "aggregation_strategy_keyword": "pid_standardized",
    "dataset_keyword": "femnist_iid",
    "num_of_rounds": 10,
    "num_of_clients": 5,
    "num_of_malicious_clients": 1,
    "training_device": "cpu",
    "cpus_per_client": 1,
    "gpus_per_client": 0.0,
    "batch_size": 20,
    "num_of_client_epochs": 1,
    "save_plots": true,
    "save_csv": true,
    "Kp": 1, "Ki": 0.05, "Kd": 0.05
  },
  "simulation_strategies": [
    { "num_std_dev": 2.33 },
    { "num_std_dev": 3.0 }
  ]
}
This runs two strategies back-to-back, differing only in num_std_dev. See config/simulation_strategies/example_strategy_config.json for a full example with all 11 attack types.
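The override behaviour described above can be sketched in a few lines of Python. This is a minimal illustration, not InteFL's actual loader: the function name expand_strategies is hypothetical, and a shallow merge is assumed (a nested object in a strategy entry would replace the shared one wholesale).

```python
import json


def expand_strategies(config: dict) -> list[dict]:
    """Return one fully resolved settings dict per strategy entry.

    Assumption: a shallow merge, where per-strategy keys replace shared ones.
    """
    shared = config.get("shared_settings", {})
    resolved = []
    for overrides in config.get("simulation_strategies", []):
        # Later keys win, so per-strategy values replace the shared defaults.
        resolved.append({**shared, **overrides})
    return resolved


config = json.loads("""
{
  "shared_settings": {"num_of_rounds": 10, "num_of_clients": 5, "Kp": 1},
  "simulation_strategies": [
    {"num_std_dev": 2.33},
    {"num_std_dev": 3.0, "num_of_rounds": 20}
  ]
}
""")

runs = expand_strategies(config)
for run in runs:
    print(run["num_of_rounds"], run["num_std_dev"])
```

The second entry overrides num_of_rounds as well as num_std_dev, so the two resolved runs differ in both fields while still sharing num_of_clients and Kp.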
Research Integrity & Validation
InteFL enforces a Scientific Integrity First policy. To ensure experimental transparency and reproducibility (aligned with IEEE Std and NeurIPS/ICML checklists), the framework uses a "fail-fast" validation approach: it rejects incompatible or mathematically unsound configurations rather than attempting to silently auto-correct them.
Key Constraints
- Participation vs. Removal: You cannot set remove_clients: true if your configuration requires 100% participation (e.g., min_fit_clients == num_of_clients). It is mathematically impossible to maintain full participation if clients are being permanently excluded.
- Byzantine Tolerance Bounds: Robust aggregation algorithms have strict breakdown points.
  - Krum / Multi-Krum: Requires \(n > 2f + 2\), where \(n\) is the total number of clients and \(f\) is the number of malicious clients.
  - Trimmed Mean: Requires trim_ratio < 0.5.
Violating these bounds removes the theoretical guarantees of the defense and leads to undefined behavior.
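The constraints above can be expressed as a small fail-fast check. The sketch below is illustrative only and is not the framework's validator: the function name validate_config and the strategy keywords "krum" and "trimmed_mean" are assumptions, since this page does not list the accepted keyword values.

```python
def validate_config(cfg: dict) -> None:
    """Raise ValueError for the constraint violations listed above.

    Assumptions: strategy keywords containing "krum" or "trimmed_mean"
    identify the relevant strategies; the real keyword values may differ.
    """
    n = cfg["num_of_clients"]
    f = cfg.get("num_of_malicious_clients", 1)  # documented default is 1
    strategy = cfg.get("aggregation_strategy_keyword", "")

    # Participation vs. removal: full participation cannot survive removals.
    if cfg.get("remove_clients", False) and cfg.get("min_fit_clients") == n:
        raise ValueError("remove_clients conflicts with 100% participation")

    # Krum / Multi-Krum breakdown point: n > 2f + 2.
    if "krum" in strategy and not n > 2 * f + 2:
        raise ValueError(f"Krum requires n > 2f + 2, got n={n}, f={f}")

    # Trimmed Mean: trim_ratio must stay below 0.5.
    if "trimmed_mean" in strategy and not cfg.get("trim_ratio", 0.0) < 0.5:
        raise ValueError("trim_ratio must be < 0.5")


try:
    validate_config({
        "num_of_clients": 5,
        "num_of_malicious_clients": 2,
        "aggregation_strategy_keyword": "krum",
    })
except ValueError as err:
    print("rejected:", err)
```

With 5 clients and 2 malicious clients the Krum bound fails (5 is not greater than 6), so the configuration is rejected instead of being silently adjusted.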
Field reference
Core simulation
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| display_name | string | null | Optional human-readable label shown in the UI. |
| dataset_keyword | string | — | Which dataset to use. See Datasets. |
| aggregation_strategy_keyword | string | — | Which strategy. See Strategies. |
| num_of_rounds | int | — | Total FL rounds to run. |
| num_of_clients | int | — | Total virtual clients. |
| num_of_malicious_clients | int | 1 | How many clients are treated as potentially malicious. |
| training_device | string | — | "cpu", "cuda", or "gpu" (alias for cuda). |
| cpus_per_client | float | — | CPU cores allocated to each Ray worker. |
| gpus_per_client | float | — | GPU fraction allocated to each Ray worker (0.0–1.0). |
| model_type | string | "cnn" | "cnn" or "transformer". |
| use_llm | bool | false | Enable transformer-based training path. |
| strict_mode | bool | null | Enable strict validation that rejects incompatible configs (e.g., full participation + client removal). |
Training
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| num_of_client_epochs | int | — | Local training epochs per round per client. |
| batch_size | int | — | Mini-batch size for local training. |
| learning_rate | float | null | Override default learning rate. |
| training_subset_fraction | float | — | Fraction of each client's data used for training (0.0–1.0). |
Client selection
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| min_fit_clients | int | — | Minimum clients that must participate in training each round. |
| min_evaluate_clients | int | — | Minimum clients for evaluation each round. |
| min_available_clients | int | — | Minimum clients that must be available before a round starts. |
| evaluate_metrics_aggregation_fn | string | null | Set to "weighted_average" to aggregate evaluation metrics. |
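To illustrate what a "weighted_average" aggregation typically computes, here is a minimal sketch. It assumes each client reports a pair of (number of evaluation examples, metrics dict); that input shape and the function body are assumptions for illustration, not the framework's confirmed implementation.

```python
def weighted_average(results: list[tuple[int, dict[str, float]]]) -> dict[str, float]:
    """Average each metric across clients, weighted by example count.

    Assumed input: one (num_examples, metrics) pair per client, with every
    client reporting the same metric names.
    """
    total = sum(num_examples for num_examples, _ in results)
    if total == 0:
        return {}
    metric_names = results[0][1].keys()
    return {
        name: sum(n * metrics[name] for n, metrics in results) / total
        for name in metric_names
    }


aggregated = weighted_average([
    (100, {"accuracy": 0.5}),
    (300, {"accuracy": 0.75}),
])
print(aggregated)
```

A client with three times as many evaluation examples contributes three times the weight, so the result here lands closer to 0.75 than to 0.5.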
Attack schedule
See the Attacks page for full documentation of all 11 attack types (data and model poisoning).
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| attack_schedule | array | [] | List of attack entries. Each entry specifies start/end rounds, attack type, client selection strategy, and attack-specific parameters. Required in config (use [] for no attacks). |
Client removal
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| remove_clients | bool | false | Enable permanent removal of detected malicious clients. |
| begin_removing_from_round | int | 0 | Only start removing from this round onwards. |
| termination_policy | string | "graceful" | "strict", "graceful", or "adaptive". Controls behaviour when too many clients have been removed. |
| min_clients_ratio | float | 0.3 | For "adaptive" policy: stop removing if fewer than this fraction of clients remain. |
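One possible reading of how begin_removing_from_round and min_clients_ratio interact under the "adaptive" policy is sketched below. The function name may_remove is hypothetical, and the choice to test the ratio after the prospective removal is an assumption; the framework may apply the check differently.

```python
def may_remove(
    current_round: int,
    num_remaining: int,
    num_total: int,
    begin_removing_from_round: int = 0,
    min_clients_ratio: float = 0.3,
) -> bool:
    """Decide whether one more client may be removed this round.

    Assumption: removal is allowed only if the clients left afterwards
    still make up at least min_clients_ratio of the original total.
    """
    if current_round < begin_removing_from_round:
        return False  # removal has not been switched on yet
    remaining_after = num_remaining - 1
    return remaining_after / num_total >= min_clients_ratio


print(may_remove(current_round=3, num_remaining=5, num_total=10))
print(may_remove(current_round=3, num_remaining=3, num_total=10))
```

With 10 clients and the default ratio of 0.3, the first call allows the removal (4 of 10 would remain) and the second refuses it (only 2 of 10 would remain).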
Output
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| save_csv | bool | — | Save per-round metrics to CSV. |
| save_plots | bool | — | Save matplotlib figures to PDF. |
| show_plots | bool | — | Display plots interactively (not suitable for headless/server runs). |
| preserve_dataset | bool | — | Keep partitioned dataset files after the simulation. |
| save_attack_snapshots | bool | false | Save before/after data snapshots for attacked clients. |
| attack_snapshot_format | string | "pickle" | "pickle", "visual", or "pickle_and_visual". |
| snapshot_max_samples | int | 6 | Max samples included in each snapshot. |
Strategy-specific parameters
Trust-based
| Field | Type | Description |
| --- | --- | --- |
| trust_threshold | float | Clients below this trust score are considered malicious. |
| beta_value | float | EMA decay for updating trust scores. |
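The two trust parameters can be pictured with a generic exponential-moving-average update. This is a sketch under stated assumptions: the convention that beta_value weights the previous score, and the helper names update_trust and flag_malicious, are illustrative and not taken from the framework.

```python
def update_trust(previous: float, observed: float, beta_value: float) -> float:
    """EMA update: beta_value weights the old score, (1 - beta_value) the new one.

    Assumption: this weighting convention; the framework may use the reverse.
    """
    return beta_value * previous + (1.0 - beta_value) * observed


def flag_malicious(trust: dict[str, float], trust_threshold: float) -> set[str]:
    """Return the IDs of clients whose trust score is below the threshold."""
    return {cid for cid, score in trust.items() if score < trust_threshold}


trust = {"client_0": 1.0, "client_1": 1.0}
# Hypothetical per-round observations: client_1 behaves badly.
trust["client_0"] = update_trust(trust["client_0"], observed=1.0, beta_value=0.5)
trust["client_1"] = update_trust(trust["client_1"], observed=0.0, beta_value=0.5)
flagged = flag_malicious(trust, trust_threshold=0.7)
print(flagged)
```

Under this convention a larger beta_value makes trust scores change more slowly, so a single bad round has less effect on whether a client crosses trust_threshold.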
PID-based
| Field | Type | Description |
| --- | --- | --- |
| Kp | float | Proportional gain. |
| Ki | float | Integral gain. |
| Kd | float | Derivative gain. |
| num_std_dev | float | Threshold (in std devs) for flagging outlier clients. |
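To show how the three gains and num_std_dev could fit together, the sketch below uses a textbook discrete PID sum and a mean-plus-k-standard-deviations cutoff. Everything about it is an assumption for illustration: the per-client error history, the use of the population standard deviation, and the helper names pid_score and flag_outliers are not taken from the framework.

```python
from statistics import mean, pstdev


def pid_score(errors: list[float], Kp: float, Ki: float, Kd: float) -> float:
    """Textbook discrete PID output for one client's error history.

    errors holds one value per round, most recent last (assumed input).
    """
    current = errors[-1]
    integral = sum(errors)
    derivative = errors[-1] - errors[-2] if len(errors) > 1 else 0.0
    return Kp * current + Ki * integral + Kd * derivative


def flag_outliers(scores: dict[str, float], num_std_dev: float) -> set[str]:
    """Flag clients whose score exceeds mean + num_std_dev * std.

    Assumption: population standard deviation over all clients this round.
    """
    values = list(scores.values())
    threshold = mean(values) + num_std_dev * pstdev(values)
    return {cid for cid, score in scores.items() if score > threshold}


# Nine well-behaved clients and one with a much larger score.
scores = {f"client_{i}": 1.0 for i in range(9)}
scores["client_9"] = 10.0
flagged = flag_outliers(scores, num_std_dev=2.33)
print(flagged)
```

Raising num_std_dev (for example from 2.33 to 3.0, as in the example configuration) moves the cutoff further from the mean, so fewer clients are flagged.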
Krum / Multi-Krum / Bulyan
| Field | Type | Description |
| --- | --- | --- |
| num_krum_selections | int | Number of clients selected by Krum per round. |
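For readers unfamiliar with Krum, the following pure-Python sketch shows the scoring rule and how a selection count such as num_krum_selections could be applied. It is a simplified illustration on flat lists of floats, and the function name krum_select is hypothetical; it is not the framework's implementation.

```python
def krum_select(
    updates: list[list[float]],
    num_malicious: int,
    num_krum_selections: int = 1,
) -> list[int]:
    """Return indices of the updates with the lowest Krum scores.

    Each update's score is the sum of squared distances to its
    n - f - 2 nearest neighbours; lower scores are more central.
    """
    n = len(updates)
    neighbours = n - num_malicious - 2
    if neighbours < 1:
        raise ValueError("Krum requires n > 2f + 2")
    scores = []
    for i, update in enumerate(updates):
        distances = sorted(
            sum((a - b) ** 2 for a, b in zip(update, other))
            for j, other in enumerate(updates)
            if j != i
        )
        scores.append(sum(distances[:neighbours]))
    ranked = sorted(range(n), key=lambda idx: scores[idx])
    return ranked[:num_krum_selections]


updates = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [10.0, 10.0]]
selected = krum_select(updates, num_malicious=1, num_krum_selections=3)
print(selected)
```

The update at index 4 lies far from the others, so its score is large and it is not among the three selected indices.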
Trimmed Mean
| Field | Type | Description |
| --- | --- | --- |
| trim_ratio | float | Fraction of extreme updates trimmed from each end. |
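A coordinate-wise trimmed mean makes the role of trim_ratio concrete. The sketch below is a minimal pure-Python version on flat lists of floats; rounding the number of trimmed values down is an assumption, and the framework may round differently.

```python
def trimmed_mean(updates: list[list[float]], trim_ratio: float) -> list[float]:
    """Coordinate-wise mean after dropping the extremes at each end.

    Assumption: int(trim_ratio * n) values are dropped from each end.
    """
    if not 0.0 <= trim_ratio < 0.5:
        raise ValueError("trim_ratio must be in [0, 0.5)")
    n = len(updates)
    k = int(trim_ratio * n)  # values dropped from each end, per coordinate
    result = []
    for coordinate in zip(*updates):
        kept = sorted(coordinate)[k : n - k]
        result.append(sum(kept) / len(kept))
    return result


updates = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [100.0, -100.0]]
aggregated = trimmed_mean(updates, trim_ratio=0.25)
print(aggregated)  # [2.5, 15.0]
```

With four updates and trim_ratio 0.25, one value is dropped from each end of every coordinate, which removes the outlying 100.0 and -100.0 before averaging. A ratio of 0.5 or more would leave nothing to average, which is why the bound is strict.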
General
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| num_of_clusters | int | null | Reserved. Number of strategy clusters (currently capped at 1). |
HuggingFace / custom text datasets
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| hf_dataset_name | string | null | HuggingFace dataset path to load dynamically (e.g. "ylecun/mnist"). |
| partitioning_strategy | string | null | Partitioning method: "iid", "dirichlet", or "pathological". See Datasets. |
| partitioning_params | object | null | Parameters for the chosen strategy (e.g. {"alpha": 0.5} for dirichlet, {"num_classes_per_partition": 2} for pathological). |
| text_column | string | null | Name of the text/input column in the dataset. |
| label_column | string | null | Name of the label column in the dataset. |

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| llm_model | string | — | HuggingFace model ID, e.g. "microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext". |
| llm_task | string | — | "mlm" (masked language modelling) or "classification". |
| llm_chunk_size | int | 512 | Token sequence length. |
| max_seq_length | int | null | Maximum sequence length for the tokeniser (overrides llm_chunk_size when set). |
| mlm_probability | float | 0.15 | Fraction of tokens masked during MLM. |
| llm_finetuning | string | null | "lora" to use LoRA adapters instead of full fine-tuning. |
| use_lora | bool | false | Alternative boolean flag to enable LoRA (equivalent to llm_finetuning: "lora"). |
| lora_rank | int | 8 | LoRA rank r. |
| lora_alpha | int | 16 | LoRA scaling factor (alpha / rank controls adaptation strength). |
| lora_dropout | float | 0.05 | Dropout applied to LoRA layers. |
| lora_target_modules | array | null | List of module names to apply LoRA to (e.g. ["query", "value"]). Defaults to model-specific modules. |
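The interplay between the LLM fields above (the two ways of enabling LoRA, the alpha / rank scaling, and max_seq_length overriding llm_chunk_size) can be sketched as a small resolver. The function name resolve_llm_settings and the returned key names are hypothetical; only the defaults are taken from the table.

```python
def resolve_llm_settings(cfg: dict) -> dict:
    """Resolve the effective LLM settings from a config dict.

    Defaults follow the table above; output key names are illustrative.
    """
    rank = cfg.get("lora_rank", 8)
    alpha = cfg.get("lora_alpha", 16)
    # Either flag enables LoRA, as the table describes them as equivalent.
    lora_enabled = cfg.get("use_lora", False) or cfg.get("llm_finetuning") == "lora"
    # max_seq_length takes precedence over llm_chunk_size when it is set.
    max_seq_length = cfg.get("max_seq_length")
    seq_length = max_seq_length if max_seq_length is not None else cfg.get("llm_chunk_size", 512)
    return {
        "lora_enabled": lora_enabled,
        "lora_rank": rank,
        "lora_alpha": alpha,
        "lora_scaling": alpha / rank,
        "lora_dropout": cfg.get("lora_dropout", 0.05),
        "lora_target_modules": cfg.get("lora_target_modules"),
        "sequence_length": seq_length,
    }


settings = resolve_llm_settings({"llm_finetuning": "lora", "max_seq_length": 128})
print(settings["lora_enabled"], settings["lora_scaling"], settings["sequence_length"])
```

With the default rank of 8 and alpha of 16 the scaling factor is 2.0; doubling lora_rank without changing lora_alpha would halve it.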