Crypto

spec v0.5.0

Goals

Confidentiality: trap status and command traffic cannot be read by a passive observer.
Integrity: a frame cannot be modified in flight without detection.
Authenticity: a valid frame proves that its sender possesses the group key; it does not prove which group member sent it.
Replay protection: a captured frame cannot be replayed later.
Bounded blast radius: compromise of a single node does not silently compromise the rest of the network forever.

Primitives

AES-128-CCM with a 7-byte nonce, 4-byte MIC, and the frame header as AAD.
Hardware-accelerated on both nRF52840 (CryptoCell) and ESP32-S3 (AES peripheral).
A single primitive across all roles. No custom crypto.

Key hierarchy

`K_group` (network key)

128-bit symmetric key.
Shared by every node in a deployment (or “group” in cross-deployment scenarios).
Used to encrypt and authenticate all normal traffic.
Provisioned via the deployment wizard over BLE (or USB during initial flash).
Rotated periodically (see Key rollover).

`K_admin` (privileged-command key)

128-bit symmetric key.
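Used for signing only by HUB and TECH nodes.
Authenticates privileged COMMAND frames with an AES-CMAC tag (admin_mic; see Privileged commands) that endpoints and routers check before acting.
An endpoint accepts a privileged COMMAND only if its admin_mic is valid.
Because AES-CMAC is symmetric, verifying nodes store a copy of K_admin as well. A node that holds only K_group cannot mint privileged COMMANDs, and this guarantee holds only while K_admin cannot be read out of a compromised endpoint.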

Per-deployment scoping

Each deployment has its own {K_group, K_admin} pair. Cross-country deployments are simply different groups with different keys. The hub holds the keys for whichever groups it manages.

The LoRa sync word is also derived per-deployment (e.g. from a hash of K_group) so that traffic from another deployment on the same band is filtered in hardware before reaching the MAC layer, saving power.
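One way the sync-word derivation could look is sketched below. The one-byte sync word, the SHA-256 hash, the `lora-sync-word` label, and the function name `derive_sync_word` are all assumptions made for illustration; the text above only says "a hash of K_group". The sketch skips 0x12 and 0x34, which are widely used default values on LoRa radios.

```python
import hashlib

# Assumption: a one-byte sync word register (SX127x-class radio).
# 0x12 (common private-network default) and 0x34 (LoRaWAN) are skipped so a
# deployment never lands on those widely used values.
AVOIDED_SYNC_WORDS = {0x12, 0x34}


def derive_sync_word(k_group: bytes) -> int:
    """Derive a per-deployment sync word from K_group (hypothetical scheme)."""
    if len(k_group) != 16:
        raise ValueError("K_group must be 128 bits")
    # A fixed label keeps this hash distinct from any other use of K_group.
    digest = hashlib.sha256(b"lora-sync-word" + k_group).digest()
    # Take the first digest byte that is not one of the avoided values.
    for candidate in digest:
        if candidate not in AVOIDED_SYNC_WORDS:
            return candidate
    raise RuntimeError("no usable byte in digest")  # practically unreachable
```

The sync word acts as a power-saving filter only; confidentiality and integrity still come from AES-CCM.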

Frame protection

For every frame:

ciphertext, MIC = AES-CCM-Encrypt(
    key   = K_group,
    nonce = src(4) || seq(2) || dir(1),
    aad   = header_clear,
    plaintext = type_specific_payload
)

dir = 0 for uplink (toward hub), dir = 1 for downlink.
Nonce uniqueness relies on the monotonic per-source seq. With a 2-byte seq this holds for at most 65,536 frames per source and direction under a given K_group.
The receiver rejects any frame where seq ≤ last_seen_seq[src] (replay).
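The nonce layout and the replay rule above can be sketched as follows. This is a minimal illustration, not the firmware: big-endian byte order and the names `build_nonce` and `ReplayFilter` are assumptions, and the AES-CCM call itself is left out because it is not available in the Python standard library.

```python
import struct

UPLINK, DOWNLINK = 0, 1


def build_nonce(src: int, seq: int, direction: int) -> bytes:
    """Pack src(4) || seq(2) || dir(1) into the 7-byte CCM nonce."""
    if direction not in (UPLINK, DOWNLINK):
        raise ValueError("dir must be 0 (uplink) or 1 (downlink)")
    # Big-endian is an assumption; the spec text does not state byte order.
    return struct.pack(">IHB", src, seq, direction)


class ReplayFilter:
    """Tracks the highest seq accepted from each source."""

    def __init__(self) -> None:
        self.last_seen_seq: dict[int, int] = {}

    def accept(self, src: int, seq: int) -> bool:
        """Return False for seq <= last_seen_seq[src], else record and accept.

        Intended to be called only after the frame's MIC has verified, so a
        forged frame cannot advance the counter.
        """
        last = self.last_seen_seq.get(src)
        if last is not None and seq <= last:
            return False
        self.last_seen_seq[src] = seq
        return True
```

For example, `build_nonce(0x01020304, 0x0506, DOWNLINK)` yields the seven bytes `01 02 03 04 05 06 01`.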

Privileged commands (K_admin)

A COMMAND frame from the hub or a TECH node carries an inner K_admin-signed blob:

plaintext_payload = COMMAND_OUTER {
    cmd_type,
    cmd_payload,
    admin_mic = AES-CMAC(K_admin, src||dst||cmd_type||cmd_payload||seq)
}

The whole thing is then encrypted under K_group as a normal frame.

An endpoint receiving a COMMAND verifies:

The outer K_group MIC (normal frame check).
The inner admin_mic using its copy of K_admin.

Only if both pass is the command applied.
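A sketch of the two-stage check follows. Two things here are assumptions and not part of the spec: the field widths used to serialise the admin_mic input (4, 4, 1, n, 2 bytes, big-endian), and the MAC itself. Because AES-CMAC is not in the Python standard library, the sketch substitutes HMAC-SHA-256 truncated to 8 bytes purely as a stand-in; real firmware would call AES-CMAC in its place.

```python
import hashlib
import hmac
import struct

ADMIN_MIC_LEN = 8  # admin_mic length in bytes


def admin_mic_input(src: int, dst: int, cmd_type: int,
                    cmd_payload: bytes, seq: int) -> bytes:
    """Serialise src || dst || cmd_type || cmd_payload || seq.

    Field widths and byte order are assumptions for this sketch.
    """
    return (struct.pack(">IIB", src, dst, cmd_type)
            + cmd_payload
            + struct.pack(">H", seq))


def standin_mac(key: bytes, message: bytes) -> bytes:
    """Stand-in for AES-CMAC (NOT the spec's primitive): truncated HMAC-SHA-256."""
    return hmac.new(key, message, hashlib.sha256).digest()[:ADMIN_MIC_LEN]


def verify_command(outer_mic_ok: bool, k_admin: bytes, src: int, dst: int,
                   cmd_type: int, cmd_payload: bytes, seq: int,
                   admin_mic: bytes, mac=standin_mac) -> bool:
    """Apply a COMMAND only if the outer K_group MIC and the inner admin_mic pass."""
    if not outer_mic_ok:
        return False
    expected = mac(k_admin, admin_mic_input(src, dst, cmd_type, cmd_payload, seq))
    # Constant-time comparison avoids leaking how many tag bytes matched.
    return hmac.compare_digest(expected, admin_mic)
```

Including seq in the MAC input ties each admin_mic to one frame, so a captured tag cannot be reattached to a later command.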

Commands that require K_admin

wake_ble(duration) — turn on BLE for N minutes
set_primary_router, set_secondary_router
set_check_in_interval
factory_reset_confirm (a plain reset doesn’t need K_admin, but a remote one does)
key_rollover — see below

Commands that do NOT require K_admin

STATUS_ACK with no payload
WHO_ARE_YOU
Routine ping / heartbeat

Application-layer ACL

Possessing K_group is not sufficient to command an endpoint. Endpoints additionally enforce:

Downlink source must be primary or secondary router for any non-COMMAND traffic.
COMMAND frames must additionally pass K_admin verification.
An endpoint will silently drop traffic from a router not in its {primary, secondary} set.

Routers enforce:

Uplinks from non-child endpoints are dropped. Each router maintains a list of child endpoint IDs, set at provisioning.
The exception is HELP frames, which any router will forward (so an orphan endpoint can be adopted).
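The endpoint and router rules above can be read as two small predicates. The sketch below is one interpretation: it assumes the router-set check also applies to COMMAND frames (reading "additionally" literally), and it does not model commands delivered directly by a TECH node. Function and parameter names are hypothetical.

```python
def endpoint_accepts_downlink(src: int, primary: int, secondary: int,
                              needs_k_admin: bool, admin_mic_ok: bool) -> bool:
    """Endpoint-side ACL for a frame that already passed the K_group MIC check."""
    # Traffic from a router outside {primary, secondary} is silently dropped.
    if src not in {primary, secondary}:
        return False
    # Privileged COMMANDs must additionally pass K_admin verification.
    if needs_k_admin and not admin_mic_ok:
        return False
    return True


def router_forwards_uplink(src: int, frame_type: str, children: set[int]) -> bool:
    """Router-side ACL: forward children's uplinks, plus HELP from anyone."""
    if frame_type == "HELP":
        return True  # lets an orphan endpoint be adopted
    return src in children
```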

Key rollover

Key rollover bounds the lifetime of a compromised K_group and provides operational hygiene.

Triggering

Key rollover is always hub-initiated and can occur at any time. There is no firmware-enforced minimum or maximum interval; the mechanism is deliberately decoupled from any specific schedule so that organisational policy (annual rotation, post-incident rotation, ad-hoc rotation on staff turnover) drives cadence without firmware changes.

Defaults documented elsewhere (e.g. annual) are policy suggestions, not technical constraints.

Common triggers

Calendar policy (e.g. annual rotation).
Known or suspected compromise of K_group (a node lost in the field for an extended period, theft of a hub, etc.).
Staff turnover where an operator with K_admin access has left.
Major firmware release boundary.

Procedure

Hub generates K_group_new.
Hub sends a rotate_key COMMAND (cmd_type 0x06, signed with K_admin) to all routers, carrying K_group_new and an activation epoch T_activate.
Routers begin accepting frames under either K_group or K_group_new, while continuing to TX under K_group until T_activate.
For each endpoint, the hub sets config_pending (and the advisory rekey_pending bit) on the next STATUS_ACK. The endpoint extends its RX window and receives the rotate_key COMMAND.
The endpoint stores K_group_new alongside K_group. It tries K_group_new first and falls back to K_group on decryption failure. It records last_key_rotation_at on successful application.
After T_activate, routers TX under K_group_new only. Old K_group remains accepted indefinitely (no hard expiry) so orphaned-by-rollover endpoints are not lost — they are simply re-keyed at next maintenance visit.
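The key-selection logic in the procedure above might be sketched like this. The `decrypt` callable stands in for AES-CCM decryption and is assumed to return None on MIC failure; the class and function names are hypothetical, and treating `now == T_activate` as already activated is an assumption.

```python
from typing import Callable, Optional

# decrypt(key, frame) returns the plaintext, or None when the MIC check fails.
Decrypt = Callable[[bytes, bytes], Optional[bytes]]


class EndpointKeys:
    """Endpoint key slots during and after a rollover."""

    def __init__(self, k_group: bytes) -> None:
        self.k_group = k_group
        self.k_group_new: Optional[bytes] = None
        self.last_key_rotation_at: Optional[int] = None

    def apply_rotate_key(self, k_group_new: bytes, now: int) -> None:
        # Store the new key alongside the old one and record the rotation time.
        self.k_group_new = k_group_new
        self.last_key_rotation_at = now

    def decrypt_frame(self, frame: bytes, decrypt: Decrypt) -> Optional[bytes]:
        # Try K_group_new first, then fall back to K_group.
        if self.k_group_new is not None:
            plaintext = decrypt(self.k_group_new, frame)
            if plaintext is not None:
                return plaintext
        return decrypt(self.k_group, frame)


def router_tx_key(k_group: bytes, k_group_new: bytes,
                  now: int, t_activate: int) -> bytes:
    """Routers transmit under K_group until T_activate, then K_group_new."""
    return k_group_new if now >= t_activate else k_group
```

Trying the new key first keeps the common post-rollover path to a single decryption attempt.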

The hub tracks per-endpoint last_key_rotation_at (returned in every ANNOUNCE) and exposes a “key age” view in the web UI so operators can detect nodes that missed a rollover window.
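As an illustration of the "key age" view, the hub could flag nodes whose reported last_key_rotation_at predates the most recent rollover. The function name, the dictionary shape, and the use of integer timestamps are assumptions for this sketch.

```python
def nodes_missing_rollover(last_key_rotation_at: dict[int, int],
                           rollover_started_at: int) -> list[int]:
    """Return node IDs whose last rotation is older than the latest rollover.

    last_key_rotation_at maps node ID to the timestamp reported in ANNOUNCE.
    """
    return sorted(
        node_id
        for node_id, rotated_at in last_key_rotation_at.items()
        if rotated_at < rollover_started_at
    )
```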

New nodes during rollover

The provisioning wizard always programs the current K_group at the time of provisioning.
During the rollover overlap window, the wizard programs K_group_new, so the node needs no further re-keying once T_activate passes.

Orphan recovery

If an endpoint is desynced (it holds the wrong key, or old hardware was reflashed later), a TECH node can re-key it in person via BLE provisioning.

K_admin rotation

K_admin rotation is more disruptive than K_group rotation: every endpoint and router needs the new K_admin to validate future privileged commands.

v1 — Option A: physical re-provisioning only (accepted)

v1 handles K_admin rotation by physical re-provisioning at maintenance time. Trade-off accepted: K_admin compromise is rare (requires hub theft), and the K_admin / K_field split (see below) limits the blast radius of the more likely scenario (tech-node loss).

Accepted risk: a lost hub exposes K_admin, requiring a field visit to every node in the affected network to re-provision. For a 200-trap deployment, this is roughly a week of field work. Tracked in the risk register as S-08.

v1 — K_admin / K_field split (accepted)

TECH nodes no longer carry K_admin. Instead, privileged commands are split across two keys:

K_admin (hub-only): rotate K_group, factory_reset_remote, router-list manipulation, rotate K_admin itself, KEY_ROLLOVER.
K_field (hub + TECH nodes): wake_ble, request_announce, set_check_in_interval, orphan adoption, servicing log.

Endpoints verify the appropriate key based on the COMMAND’s cmd_type. The COMMAND frame format does not change — only which key signs which command.
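Key selection by command could be a simple lookup, sketched below. The table covers only a subset of the commands listed above and is keyed by name because numeric cmd_type values are not given here (apart from rotate_key). Returning None for unknown commands, so that the caller drops them, is an assumed fail-closed choice.

```python
from enum import Enum
from typing import Optional


class AuthKey(Enum):
    K_ADMIN = "K_admin"
    K_FIELD = "K_field"


# Illustrative subset of the split; names follow the lists above.
REQUIRED_KEY = {
    "rotate_key": AuthKey.K_ADMIN,
    "factory_reset_remote": AuthKey.K_ADMIN,
    "wake_ble": AuthKey.K_FIELD,
    "request_announce": AuthKey.K_FIELD,
    "set_check_in_interval": AuthKey.K_FIELD,
}


def key_for_command(cmd_name: str, k_admin: bytes,
                    k_field: bytes) -> Optional[bytes]:
    """Return the key that must verify this command, or None if unknown."""
    required = REQUIRED_KEY.get(cmd_name)
    if required is AuthKey.K_ADMIN:
        return k_admin
    if required is AuthKey.K_FIELD:
        return k_field
    return None  # unknown command: caller should drop the frame
```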

Lost TECH node → rotate K_field (hub pushes to all nodes on next config-pull cycle; no field visit). Lost hub → rotate K_admin (Option A: field visit to every node).

Why OTA K_admin rotation is hard

The bootstrap problem: if any command signed with K_admin can rotate K_admin, then an attacker who already holds K_admin can immediately rotate it to a key only they know — permanently locking the legitimate operator out. A naive OTA rotation makes a bad day worse.

Upgrade path: Option B then Option C

The upgrade path from A → B → C is additive, not a rewrite. The COMMAND verification pipeline is designed as a pluggable step: “receive COMMAND → verify auth → apply if valid → ACK.” Swapping the verification step is a localised firmware change; the rest of the pipeline (frame parsing, command dispatch, config persistence, ACK) is untouched.
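The pluggable step can be pictured as a pipeline that takes the verifier as a parameter. This is a minimal sketch with hypothetical names; the command is modelled as a plain dict, and the boolean return value stands for "an ACK would be sent".

```python
from typing import Callable

Command = dict
Verifier = Callable[[Command], bool]


def handle_command(command: Command, verify: Verifier,
                   apply: Callable[[Command], None]) -> bool:
    """receive COMMAND -> verify auth -> apply if valid -> ACK."""
    if not verify(command):
        return False  # dropped, no ACK
    apply(command)
    return True  # caller sends the ACK
```

Moving from Option A to B or C would then mean passing a different `verify` callable, with `handle_command` itself unchanged.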

Option B — “Break-glass” pre-shared `K_admin_next` (planned for v1.1+)

At provisioning, every node receives both K_admin and a sealed, unused K_admin_next. OTA rotation must be signed by K_admin_next, which an attacker holding only the stolen K_admin does not possess (unless they also extracted a deployed node’s flash).

After one OTA rotation, the K_admin_next slot is spent — the node reverts to physical-only until a field visit installs a fresh K_admin_next.

What changes from Option A:

Provisioning writes one extra key to flash (~50 lines of firmware).
Rotation verification checks K_admin_next instead of K_admin.
No wire format change. No frame envelope change.

Risks (speculative, pending implementation):

Extra key on every node increases flash exposure surface.
“Spent” state must be tracked and visible to the operator; stale K_admin_next values are a silent vulnerability.
If an attacker extracts both K_admin and K_admin_next from a single node’s flash, Option B provides no protection for that network. Mitigation: APPROTECT + field-audit of node tamper state.
One-shot only; after use, back to Option A until a field visit replenishes the slot.
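The one-shot behaviour of the K_admin_next slot, including the "spent" state that must be visible to the operator, can be sketched as a small state holder. Everything here is illustrative: the class name, the choice to MAC only the new key bytes, and the truncated HMAC-SHA-256 used as a standard-library stand-in for AES-CMAC are all assumptions.

```python
import hashlib
import hmac
from typing import Optional


def standin_mac(key: bytes, message: bytes) -> bytes:
    """Stand-in for AES-CMAC (NOT the spec's primitive): truncated HMAC-SHA-256."""
    return hmac.new(key, message, hashlib.sha256).digest()[:8]


class AdminKeySlots:
    """Holds K_admin and the sealed one-shot K_admin_next."""

    def __init__(self, k_admin: bytes, k_admin_next: Optional[bytes]) -> None:
        self.k_admin = k_admin
        self.k_admin_next = k_admin_next  # None means the slot is spent

    @property
    def next_slot_spent(self) -> bool:
        # Intended to be reported to the hub so operators can see spent slots.
        return self.k_admin_next is None

    def rotate_ota(self, new_k_admin: bytes, rotation_mic: bytes) -> bool:
        """Accept an OTA rotation only if it verifies under K_admin_next."""
        if self.k_admin_next is None:
            return False  # slot spent: physical re-provisioning only
        expected = standin_mac(self.k_admin_next, new_k_admin)
        if not hmac.compare_digest(expected, rotation_mic):
            return False  # e.g. signed with the stolen K_admin
        self.k_admin = new_k_admin
        self.k_admin_next = None  # spent until a field visit refills it
        return True
```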

Option C — PKI / hub public-key signing (planned for v2.0+)

Hub holds a long-term ECDSA private key. Every node has the hub’s public key. K_admin rotation commands are signed with the hub’s private key. Rotations can then be repeated any number of times, because verification on the node consumes no pre-shared secret.

What changes from Option B:

Provisioning writes a hub public key to flash (instead of or alongside K_admin_next).
Rotation verification switches from AES-CMAC to ECDSA-verify. The nRF52840’s CryptoCell has hardware ECDSA support.
The COMMAND frame’s admin_mic field becomes an ECDSA signature (64 bytes vs 8 bytes). This is the only wire-format change, and it’s in a variable-length field inside the AES-CCM envelope, so the frame envelope does not change.
Hub-side: generate and protect a private key. Backup and recovery story needed for this new critical asset.