Architecture

Phalanx is a standard Flower app-model application (flwr 1.31 Message API) plus an OpenTelemetry layer. Four modules, no framework of our own:

| Module | Role |
| --- | --- |
| `phalanx/task.py` | Model (HF transformer + PEFT/LoRA), data (`flwr-datasets`, IID or Dirichlet non-IID), `train_fn`/`test_fn`, and adapter-state helpers. |
| `phalanx/client_app.py` | `ClientApp` with `@app.train` / `@app.evaluate`. Loads the broadcast adapters into a frozen-backbone LoRA model, trains/evaluates on its partition, replies with adapters only. Wraps each pass in a client span. |
| `phalanx/server_app.py` | `ServerApp` with `@app.main`, plus `ObservableFedAvg`. Builds the initial adapter state, runs `strategy.start(...)`, and emits per-round telemetry. |
| `phalanx/telemetry.py` | OpenTelemetry tracer/meter providers, round/client span context managers, and FL metric instruments. Exporters are pluggable: OTLP, console, or in-memory (for tests). |

One federated round

strategy.start() runs the following loop num-server-rounds times:

ServerApp.main
  └─ ObservableFedAvg.start(initial_arrays = adapter state)
       for each round:
         configure_train   → broadcast adapters to sampled clients
         ClientApp.train    → set adapters, train locally, return adapters        [fl.client.train span]
         aggregate_train    → FedAvg over the returned adapters
         configure_evaluate → broadcast updated adapters
         ClientApp.evaluate → evaluate locally, return loss/accuracy              [fl.client.evaluate span]
         aggregate_evaluate → FedAvg over metrics  →  observe_round(...)          [fl.round span + metrics]
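
To make the ordering of the steps concrete, here is a toy simulation of the loop in plain Python. It does not use Flower at all: the "adapter" is a single float, each client is a hypothetical (local_target, num_examples) pair, and every function name is invented for illustration. It is a sketch of the control flow only, not the code in phalanx/server_app.py.

```python
from typing import List, Sequence, Tuple

# Toy stand-ins (assumptions): a client is (local_target, num_examples)
# and the federated "adapter" is one float.
Client = Tuple[float, int]


def client_train(adapter: float, client: Client, lr: float = 0.5) -> Tuple[float, int]:
    """One gradient step on the local loss (adapter - target) ** 2 / 2."""
    target, num_examples = client
    return adapter - lr * (adapter - target), num_examples


def client_evaluate(adapter: float, client: Client) -> Tuple[float, int]:
    """Local squared error of the broadcast adapter."""
    target, num_examples = client
    return (adapter - target) ** 2, num_examples


def weighted_mean(pairs: Sequence[Tuple[float, int]]) -> float:
    """Example-weighted mean, used for both adapters and metrics."""
    total = sum(n for _, n in pairs)
    return sum(value * n for value, n in pairs) / total


def run(clients: Sequence[Client], initial_adapter: float, num_rounds: int) -> Tuple[float, List[float]]:
    adapter = initial_adapter
    loss_history: List[float] = []
    for _ in range(num_rounds):
        # configure_train + ClientApp.train: broadcast, then train locally
        train_replies = [client_train(adapter, c) for c in clients]
        # aggregate_train: weighted average of the returned adapters
        adapter = weighted_mean(train_replies)
        # configure_evaluate + ClientApp.evaluate: broadcast the new adapter
        eval_replies = [client_evaluate(adapter, c) for c in clients]
        # aggregate_evaluate: weighted average of the metrics
        loss_history.append(weighted_mean(eval_replies))
    return adapter, loss_history
```

Calling run([(0.0, 1), (4.0, 3)], 0.0, 2) moves the shared adapter toward the example-weighted optimum and the aggregated loss falls from one round to the next.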

Adapter-only federation

The model is a Hugging Face sequence-classification transformer wrapped with a PEFT LoraConfig. Only the LoRA adapters and the newly-initialised classification head (PEFT modules_to_save) are trainable, and only those tensors are federated, through get_adapter_state / set_adapter_state. The frozen backbone never leaves a client, so each ArrayRecord on the wire is small (tens of KB, not the full model).
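
The filtering idea can be sketched without any ML dependencies. The example below uses a plain dict as a stand-in for a model state dict and selects entries by name. The substrings "lora_" and "modules_to_save" follow PEFT's usual parameter naming, but the function bodies, the toy key names, and the values are assumptions, not the helpers in phalanx/task.py.

```python
from typing import Dict, List

# Assumed markers: PEFT typically names LoRA weights "...lora_A..." and
# "...lora_B...", and wraps extra trainable modules under "modules_to_save".
ADAPTER_MARKERS = ("lora_", "modules_to_save")

State = Dict[str, List[float]]  # stand-in for a name -> tensor state dict


def get_adapter_state(full_state: State) -> State:
    """Return only the tensors to federate, in sorted key order."""
    return {
        name: full_state[name]
        for name in sorted(full_state)
        if any(marker in name for marker in ADAPTER_MARKERS)
    }


def set_adapter_state(full_state: State, adapter_state: State) -> None:
    """Overwrite adapter tensors in place; backbone entries stay untouched."""
    unknown = set(adapter_state) - set(full_state)
    if unknown:
        raise KeyError(f"unexpected adapter keys: {sorted(unknown)}")
    for name, values in adapter_state.items():
        full_state[name] = list(values)


# Toy state (hypothetical names): one frozen backbone weight, two LoRA
# matrices, and one classification-head weight.
model_state: State = {
    "encoder.layer.0.query.weight": [0.1, 0.2],
    "encoder.layer.0.query.lora_A.default.weight": [0.0, 0.0],
    "encoder.layer.0.query.lora_B.default.weight": [0.0, 0.0],
    "classifier.modules_to_save.default.weight": [0.5],
}

adapters = get_adapter_state(model_state)
```

Sorting the keys is one simple way to keep the key set and order identical on the server and on every client.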

ObservableFedAvg subclasses Flower's FedAvg and overrides aggregate_train (to count participating clients) and aggregate_evaluate (to read the aggregated loss/accuracy and call observe_round). FedAvg's key-matched aggregation works because get_adapter_state returns a stable set of keys across the server and all clients.
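
Why the stable key set matters is easiest to see in a stripped-down version of the aggregation. The sketch below is not Flower's implementation: it averages plain lists keyed by name, weights each client by its example count (the usual FedAvg rule), and rejects replies whose keys differ. The function name and the State alias are illustrative.

```python
from typing import Dict, List, Sequence, Tuple

State = Dict[str, List[float]]  # stand-in for an ArrayRecord: name -> flat tensor


def fedavg(replies: Sequence[Tuple[State, int]]) -> State:
    """Example-weighted average of client states, matched by key."""
    if not replies:
        raise ValueError("no client replies to aggregate")
    total = sum(num_examples for _, num_examples in replies)
    if total <= 0:
        raise ValueError("total number of examples must be positive")

    reference = replies[0][0]
    for state, _ in replies[1:]:
        # Key-matched aggregation only makes sense if every client
        # returns exactly the same set of tensor names.
        if set(state) != set(reference):
            raise ValueError("clients returned different key sets")

    aggregated: State = {}
    for key, ref_values in reference.items():
        summed = [0.0] * len(ref_values)
        for state, num_examples in replies:
            if len(state[key]) != len(ref_values):
                raise ValueError(f"shape mismatch for {key!r}")
            for i, value in enumerate(state[key]):
                summed[i] += value * num_examples
        aggregated[key] = [s / total for s in summed]
    return aggregated
```

For example, fedavg([({"lora_A": [1.0, 3.0]}, 1), ({"lora_A": [3.0, 7.0]}, 3)]) weights the second client three times as heavily as the first.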

The OpenTelemetry layer

telemetry.py keeps its tracer/meter providers module-local (off the OTel globals) so tests can re-initialise between cases. init_telemetry chooses an exporter:

an injected in-memory exporter (unit tests),
a console exporter when OTEL_TRACES_EXPORTER=console,
an OTLP exporter when OTEL_EXPORTER_OTLP_ENDPOINT is set,
otherwise telemetry is recorded but not exported.
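
The selection rules above can be sketched as a small pure function. This is an assumption-laden illustration, not the body of init_telemetry: the function name choose_exporter, its return strings, and the precedence (taken from the order of the list) are all hypothetical. Only the two environment variable names come from the text.

```python
import os
from typing import Mapping, Optional


def choose_exporter(
    injected: Optional[object] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return which exporter kind would be used, given the inputs."""
    env = os.environ if env is None else env
    # 1. An exporter passed in directly (unit tests) wins.
    if injected is not None:
        return "in-memory"
    # 2. Console output when explicitly requested.
    if env.get("OTEL_TRACES_EXPORTER", "").strip().lower() == "console":
        return "console"
    # 3. OTLP when an endpoint is configured.
    if env.get("OTEL_EXPORTER_OTLP_ENDPOINT"):
        return "otlp"
    # 4. Otherwise record telemetry without exporting it.
    return "none"
```

Passing env explicitly keeps the function testable without touching the process environment.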

Server-side, each round emits an fl.round span with the attributes fl.round, fl.loss, fl.accuracy, and fl.clients, and records the metrics fl.round.loss / fl.round.accuracy / fl.round.clients. Client-side, each pass emits an fl.client.train or fl.client.evaluate span and records the fl.client.examples / fl.client.loss metrics.

Because the simulation runs clients in separate Ray processes, client spans are independent traces in v1. Linking them as children of the server's round span via Message.metadata trace-context propagation is the v2 direction (see ROADMAP.md).
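
As a rough idea of what that propagation involves, the sketch below writes and reads a W3C traceparent value in a string-keyed carrier. A plain dict stands in for the carrier here; whether and how Message.metadata can hold such a field is not settled in this document, and the helper names are hypothetical. In real code the OpenTelemetry propagators (opentelemetry.propagate.inject / extract) would normally do this work.

```python
import re
from typing import Dict, Mapping, Optional, Tuple

# W3C Trace Context, version 00: "00-<32 hex trace id>-<16 hex span id>-<2 hex flags>"
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def inject(carrier: Dict[str, str], trace_id: int, span_id: int, sampled: bool = True) -> None:
    """Server side: store the round span's identity in the carrier."""
    flags = 1 if sampled else 0
    carrier["traceparent"] = f"00-{trace_id:032x}-{span_id:016x}-{flags:02x}"


def extract(carrier: Mapping[str, str]) -> Optional[Tuple[int, int, bool]]:
    """Client side: recover (trace_id, parent_span_id, sampled), or None."""
    match = _TRACEPARENT.match(carrier.get("traceparent", ""))
    if match is None:
        return None
    trace_id = int(match.group(1), 16)
    span_id = int(match.group(2), 16)
    # All-zero ids are invalid under the specification.
    if trace_id == 0 or span_id == 0:
        return None
    sampled = bool(int(match.group(3), 16) & 1)
    return trace_id, span_id, sampled
```

A client that extracts a valid parent can start its fl.client.train span as a child of the server's fl.round span instead of as a new trace.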