Roadmap

spec v0.5.0

A phased plan from “two boards on a bench” to “deployable system in DOC hands.” Each phase has a clear exit criterion before the next begins.

Version bands

The phases group into version bands so it’s clear what a given release contains:

Band	Phases	Theme
MVP	1–6	Working single-network deployment; bench then field pilot
v1.0	7–8b	Central service, headless hub, MQTT, Trap.NZ one-way
v1.5	9	Multi-tenant, public role, web flasher, Trap.NZ both ways
v2.0+	10	OIDC for orgs, autonomous endpoint reorder, OTA K_admin rotation

MVP is “we can run a pilot.” v1.0 is “we can hand this to a community group.” v1.5 is “anyone can buy a node and join a network.” v2.0+ is “DOC-grade enterprise features.”

Phase 0 — Specification (current)

Exit criterion: this docs/ tree is reviewed and agreed.

Architecture, protocol, crypto, power budget, provisioning, hardware all documented.
Open questions tracked in the issues / project board (when we set one up).

Phase 1 — Bring-up

Goal: prove the toolchain end-to-end with a single unencrypted frame.

Progress notes are in project-management/status.md (2026-05-29 entries). Summary of plan vs. reality:

Set up firmware repo structure. Landed as firmware/{validation,field,hub}/; the field firmware is a single binary (field/, role set at provisioning) rather than separate endpoint/router dirs, per the node-roles model in architecture.md. provisioning/ is deferred to Phase 4.
Choose firmware framework — closed as ADR-14 (Zephyr field / ESP-IDF hub / Arduino validation).
Bring up SX1262 TX on T114. Done via both the Arduino harness and Zephyr (firmware/field/), at SF9, 125 kHz bandwidth, +14 dBm. Payload is meshtrap-ping <seq> rather than literally "hello".
Bring up SX1262 RX. Done on the Heltec WiFi LoRa 32 V3.1, not the SenseCAP Indicator — the SenseCAP routes the SX1262 control lines through a PCA9535 IO expander that stock RadioLib can’t drive, so SenseCAP RX moves to Phase 2 ESP-IDF (see firmware/sensecap-indicator-hub.md).
Bench-measure deep-sleep current on T114, baseline. Blocked on a power analyser (PPK2 / Joulescope) not yet on hand.

Exit criterion: T114 → RX, plaintext, one direction, repeatable — met (originally written as “T114 → SenseCAP”; RX board substituted as above). Deep-sleep baseline is the one outstanding Phase 1 item, gated on test equipment.

Phase 2 — Single-cluster minimum viable network

Goal: one endpoint, one router, one hub, encrypted, with the protocol skeleton.

Progress (see project-management/status.md, 2026-05-29 → 05-31 entries):

Frame envelope (clear header + AES-CCM payload) — spec/frames.yaml v0.2.0, implemented on both field (PSA/CryptoCell) and hub (mbedTLS).
STATUS, STATUS_ACK, JOIN, JOIN_ACK, ANNOUNCE, COMMAND (partially done): STATUS, STATUS_ACK, and COMMAND/COMMAND_ACK are done (encrypted in both directions, with an inner AES-CMAC admin_mic for COMMAND auth, spec v0.4.0). JOIN/JOIN_ACK/ANNOUNCE remain.
Hardcoded keys (K_group + K_field) and node IDs (no provisioning wizard yet).
Replay protection + reboot-safe state — both at the frame layer (per-source seq window) and the command layer (cmd_seq); field and hub persist seq / cmd_seq / config to NVS. Endpoint ACK/retry → help-mode state machine also done. Not in the original bullet list, but landed here.
Endpoint sleeps between transmissions, wakes on reed switch, sends STATUS with trigger bit. (Currently a timed 5 s loop; no reed switch or sleep yet.)
Router receives, dedups, forwards to hub. (No router tier yet — field talks directly to the hub for now.)
Hub logs to SQLite on SD card.
Hub serves a minimal web page listing nodes and last-seen.
MQTT client built into hub firmware, default disabled, TX-only or TX/RX.
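The per-source seq window in the replay-protection item can be pictured with a minimal sketch like the one below. It assumes a sliding bitmap window in the style of IPsec anti-replay; the class names, the default window size of 32, and the integer node IDs are illustrative assumptions, not values taken from spec/frames.yaml.

```python
class ReplayWindow:
    """Sliding-window replay check for one source (hypothetical helper)."""

    def __init__(self, size=32):
        self.size = size      # assumed window width, not from the spec
        self.highest = None   # highest seq accepted so far
        self.bitmap = 0       # bit i set => seq (highest - i) already seen

    def accept(self, seq):
        """Return True if seq is fresh, False if it is a replay or too old."""
        if self.highest is None:
            self.highest, self.bitmap = seq, 1
            return True
        if seq > self.highest:
            # Slide the window forward and mark the new highest seq as seen.
            shift = seq - self.highest
            mask = (1 << self.size) - 1
            self.bitmap = ((self.bitmap << shift) | 1) & mask
            self.highest = seq
            return True
        offset = self.highest - seq
        if offset >= self.size:
            return False      # older than the window: reject
        if self.bitmap & (1 << offset):
            return False      # already seen: replay
        self.bitmap |= 1 << offset
        return True


class ReplayGuard:
    """One ReplayWindow per source node ID."""

    def __init__(self, size=32):
        self.size = size
        self.windows = {}

    def accept(self, src_id, seq):
        window = self.windows.setdefault(src_id, ReplayWindow(self.size))
        return window.accept(seq)
```

To be reboot-safe in the sense described above, a real node would also persist each window's state to NVS. Sequence-number wraparound is not handled in this sketch.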

Recommended next (as of 2026-05-31): finish the protocol layer first — land JOIN / JOIN_ACK / ANNOUNCE (the discovery/provisioning handshake) on top of the working STATUS/STATUS_ACK/COMMAND core — then move to the hub application and endpoint sensing work (web UI, SQLite, reed switch, deep sleep) that the exit criterion below actually measures. Rationale: the encrypted message core is the load-bearing dependency for the router tier (Phase 3) and provisioning (Phase 4), so completing it keeps those unblocked; the hub-app/sensing track can then be done as one coherent push against the exit criterion rather than interleaved.

Exit criterion: bench demo of “trip the reed switch, alert appears in the hub web UI within 30 seconds.” Bonus: MQTT publish observable on a local Mosquitto for inspection. Not yet met: the encrypted protocol core (envelope, STATUS/STATUS_ACK, COMMAND/COMMAND_ACK) is in place; JOIN/JOIN_ACK/ANNOUNCE, the web UI, SQLite storage, the MQTT client, the reed switch, deep sleep, and the router tier are the remaining Phase 2 work.

Phase 3 — Multi-cluster and router-tier routing

Goal: prove the router-to-router backbone with at least two routers.

Implement ROUTING_BEACON and the distance-vector parent selection.
Implement ROUTER_UPLINK forwarding with TTL.
Three-router test: hub ← R1 ← R2 ← R3, with one endpoint per router.
Pull power on R2 to test failover (R3 should re-parent to R1 if R1 is in range; otherwise it stays parented to R2 and queues its uplinks).
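As a rough illustration of distance-vector parent selection, the sketch below picks the neighbour that advertises the fewest hops to the hub and breaks ties on signal strength. The Beacon fields, the RSSI floor, and the function name are hypothetical; the actual ROUTING_BEACON contents are not defined in this roadmap.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Beacon:
    """Assumed contents of a received ROUTING_BEACON (illustrative only)."""
    router_id: str
    hops_to_hub: int   # hop count the neighbour advertises towards the hub
    rssi_dbm: float    # signal strength at which we heard the beacon


def select_parent(beacons: Iterable[Beacon],
                  min_rssi_dbm: float = -120.0) -> Optional[Beacon]:
    """Pick the usable neighbour closest to the hub; None if nobody is usable."""
    usable = [b for b in beacons if b.rssi_dbm >= min_rssi_dbm]
    if not usable:
        return None
    # Fewest hops wins; among equals, the strongest signal wins.
    return min(usable, key=lambda b: (b.hops_to_hub, -b.rssi_dbm))
```

In the three-router test, R3 normally hears only R2 and selects it. With R2 powered off, R3 either hears R1 and re-parents, or gets None back and keeps queueing. Loop avoidance and beacon ageing are left out of this sketch.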

Exit criterion: an endpoint → router → router → hub path (two router hops) round-trips reliably, and the network recovers from one router going dark.
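ROUTER_UPLINK forwarding with a TTL, combined with the dedup a router is expected to do, might look roughly like the following. This is a sketch under assumptions: frames are identified by a (source ID, seq) pair, a frame arriving with TTL 0 is dropped, and the dedup cache is a small bounded FIFO. None of these details come from the spec.

```python
from collections import OrderedDict


class UplinkForwarder:
    """Decides whether a router should forward an uplink frame (illustrative)."""

    def __init__(self, dedup_capacity=64):
        self.dedup_capacity = dedup_capacity   # assumed cache size
        self.seen = OrderedDict()              # (src_id, seq) keys, oldest first

    def handle(self, src_id, seq, ttl):
        """Return the TTL to forward with, or None to drop the frame."""
        key = (src_id, seq)
        if key in self.seen:
            return None                        # duplicate: already forwarded
        self.seen[key] = None
        if len(self.seen) > self.dedup_capacity:
            self.seen.popitem(last=False)      # evict the oldest entry
        if ttl <= 0:
            return None                        # TTL exhausted: drop
        return ttl - 1
```

With these rules, a frame sent with TTL 3 survives three router hops, which is enough for the hub ← R1 ← R2 ← R3 bench chain.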

Phase 4 — Provisioning and ops

Goal: a non-technical person can deploy a node.

BLE provisioning characteristic and the JSON / CBOR schema for config writes.
Deployment wizard (start as a web app served from the hub over BLE, runnable on a phone browser).
Factory reset and BLE auto-wake on reset.
Hub-issued wake_ble COMMAND flow end-to-end.
Hub web UI: dashboard, inventory map, history, command queue, audit log.
Group QR code workflow.

Exit criterion: deploy a fresh node end-to-end using only the wizard and the hub web UI.

Phase 5 — Crypto hardening

Goal: key management is solid enough for real deployments.

K_admin separation for privileged commands.
KEY_ROLLOVER end-to-end (hub issues rollover, routers and endpoints accept both keys during overlap, eventually settle on new key).
Application-layer ACL enforcement on routers and endpoints.
Secure key storage on nRF52840 (APPROTECT) and ESP32-S3 (eFuse + NVS encryption).
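The overlap behaviour in KEY_ROLLOVER (accept both keys, then settle on the new one) can be sketched as below. To keep the example standard-library only, a truncated HMAC-SHA256 tag stands in for the AES-CCM authentication the real frames use; the class, method names, and 8-byte tag length are hypothetical.

```python
import hashlib
import hmac

TAG_LEN = 8  # assumed tag length for this sketch only


def make_tag(key: bytes, frame: bytes) -> bytes:
    """Stand-in authenticator (HMAC-SHA256, truncated), not AES-CCM."""
    return hmac.new(key, frame, hashlib.sha256).digest()[:TAG_LEN]


class GroupKeyRing:
    """Holds the current K_group plus an optional pending key during rollover."""

    def __init__(self, current: bytes):
        self.current = current
        self.pending = None

    def begin_rollover(self, new_key: bytes) -> None:
        self.pending = new_key

    def verify(self, frame: bytes, tag: bytes):
        """Return 'pending' or 'current' for the key that matched, else None."""
        if self.pending is not None and hmac.compare_digest(
                make_tag(self.pending, frame), tag):
            return "pending"
        if hmac.compare_digest(make_tag(self.current, frame), tag):
            return "current"
        return None

    def commit(self) -> None:
        """End the overlap: the pending key becomes the only accepted key."""
        if self.pending is not None:
            self.current, self.pending = self.pending, None
```

Because verify reports which key matched, a hub could watch for the point where every node authenticates with the pending key before committing. That supports the "without losing a single check-in" exit criterion, though how the real protocol decides when to commit is not specified here.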

Exit criterion: rotate K_group across the whole bench cluster without losing a single check-in.

Phase 6 — Field pilot and Trap.NZ integration

Goal: real deployment, small scale, somewhere accessible, reporting into the existing community ecosystem.

Weather-resistant enclosures (3D-printed acceptable for pilot).
3–5 trap nodes, 1 router, 1 hub at a co-operative trapping group’s site.
Trap.NZ integration — the hub pushes trap-state changes into Trap.NZ via its open API. This is treated as a hard requirement, not optional; any deployment with a NZ community group will depend on it.
Two-week observation: power, range, missed check-ins, false alerts.
Iteration based on what breaks.

Exit criterion: 2 weeks of operation with no firmware patches needed, ≥ 95% of expected check-ins received, and trigger events visible in Trap.NZ within 5 minutes.
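The ≥ 95% check-in target is simple arithmetic once a check-in interval is fixed. The sketch below assumes hourly check-ins (24 per node per day), which is an illustrative figure and not a value from the power budget; the function names are hypothetical.

```python
def expected_checkins(nodes: int, days: int, checkins_per_day: int) -> int:
    """Total check-ins the pilot should produce if nothing is missed."""
    return nodes * days * checkins_per_day


def meets_checkin_target(received: int, expected: int,
                         target_percent: int = 95) -> bool:
    """Integer comparison avoids floating-point edge cases at exactly 95%."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return received * 100 >= expected * target_percent
```

For a 5-node, 14-day pilot with hourly check-ins, expected_checkins(5, 14, 24) gives 1680, so at least 1596 check-ins must arrive to meet the criterion.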

Phase 7 — Headless hub and outdoor variants

Goal: deployments that don’t need a ranger station.

Bare ESP32-S3 + SX1262 hub in an IP66 enclosure with its own panel/battery.
MQTT outbound over LAN (existing pattern, just packaged differently).
Power budget validation for the outdoor hub.

Exit criterion: a deployment with no SenseCAP, no mains power, no ranger station — just routers, traps, and the headless hub talking outbound.

Phase 8 — Cellular hub (future, scoped)

Goal: fully off-grid deployments with remote-server reporting.

Cellular modem add-on (LTE-M / NB-IoT for NZ Spark/2degrees coverage where feasible; fallback to 4G).
MQTT over cellular to a central server (small, ~KB/day data plan suffices).
SIM management and roaming policy.
Provisioning flow that includes APN and SIM PIN.

Architectural prep already done: the hub’s transport layer is decoupled from its application layer, so the cellular hub is a transport swap, not a re-architecture. The MQTT client is planned into hub firmware from Phase 2, so it will already be in place when this phase starts.

Phase 8b — Central service (v1.0 — first central-service release)

Goal: an open-source, containerised central service that hubs can optionally report into. Single-org / single-network support to begin with; multi-tenant features defer to v1.5.

Containerised deployment (Docker Compose for self-hosters, Kubernetes for the public default instance): MQTT broker (Mosquitto), Postgres, API server, web UI, reverse proxy.
Two operating modes:
- Status monitoring (TX-only from hubs): central service consumes telemetry, presents dashboards. No commands flow downward.
- Command-and-control (TX/RX): central service can issue COMMANDs that hubs relay to nodes, subject to the same K_admin signing rules.
The central service’s web UI is built to match the hub’s web UI so operators see one consistent experience.
Default off at the hub. A deployment that opts in chooses TX-only or TX/RX at hub configuration time.
Trap.NZ integration: one-way push (our central service → Trap.NZ) at this phase. Bidirectional pull (servicing events back from Trap.NZ) in v1.5.
Not yet included: web flasher, multi-network per org, public role, OIDC. Those land in v1.5.
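The default-off, TX-only, and TX/RX settings described above amount to a three-state hub option. A minimal sketch of how that gate could be expressed follows; the enum and function names are hypothetical, and admin_mic_valid stands for whatever K_admin verification the hub already performs on a COMMAND.

```python
from enum import Enum


class CentralMode(Enum):
    """Hub-side central-service setting (illustrative names)."""
    DISABLED = "disabled"   # default: hub does not talk to the central service
    TX_ONLY = "tx_only"     # status monitoring: telemetry up, nothing down
    TX_RX = "tx_rx"         # command-and-control: telemetry up, COMMANDs down


def hub_publishes_telemetry(mode: CentralMode) -> bool:
    return mode is not CentralMode.DISABLED


def hub_relays_central_command(mode: CentralMode, admin_mic_valid: bool) -> bool:
    """A COMMAND from the central service is relayed only in TX/RX mode,
    and only if it passes the same K_admin check as a local COMMAND."""
    return mode is CentralMode.TX_RX and admin_mic_valid
```

Keeping the mode check and the signature check in one predicate makes it harder for a TX-only hub to relay a command by accident.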

This phase intentionally lives next to Phase 8 (Cellular hub) because they solve adjacent problems.

Phase 9 — Multi-tenant central service (v1.5)

Goal: open the central service to public registration.

Multi-tenant: orgs, multiple networks per org, regions, RBAC.
Public role (operator-controlled per-network).
Web flasher (browser-based, WebSerial / Web Bluetooth) — gated on the firmware having shipped a DFU bootloader since Phase 1.
Trap.NZ bidirectional integration: pull servicing events back to our records.
Operator-approved enrolment flow for public buyers: register hardware, approve, receive onboarding token, BLE-provision into a network.
SNR/RSSI-based router-list reorder suggestions (“rules-based AI” — start simple, only reach for real ML if rules don’t cut it).
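A "start simple" rule for router-list reorder suggestions could be as small as the sketch below: rank routers by mean SNR and only suggest a change when the best candidate beats the current first choice by a margin. It uses SNR only for brevity, and the 3 dB margin, the data shapes, and the function name are assumptions for illustration.

```python
from statistics import mean

NO_DATA = float("-inf")  # routers with no samples sort last


def suggest_router_order(current_order, snr_samples_db, min_gain_db=3.0):
    """Return a suggested router order, or None if no change is worth making.

    current_order: list of router IDs, preferred first.
    snr_samples_db: dict of router ID -> list of recent SNR readings in dB.
    """
    avg = {r: mean(s) for r, s in snr_samples_db.items() if s}
    ranked = sorted(current_order,
                    key=lambda r: avg.get(r, NO_DATA), reverse=True)
    if ranked == list(current_order):
        return None
    # Hysteresis: ignore reorders that do not improve the first choice enough.
    gain = avg.get(ranked[0], NO_DATA) - avg.get(current_order[0], NO_DATA)
    if gain < min_gain_db:
        return None
    return ranked
```

The margin keeps the suggestion list from flapping when two routers are within measurement noise of each other. This sketch deliberately ignores reorders that only affect the lower-priority entries.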

Exit criterion: a public buyer can purchase a fresh node, register it with an operator, BLE-provision via the web flasher, and have it talking on the network within 10 minutes.

Phase 10 — Hardening and DOC adoption (v2.0+)

Goal: enterprise-grade features for the largest operators.

OIDC support for orgs (external auth / authorisation).
Autonomous endpoint reordering with self-recovery.
Optional ML-assisted optimisation suggestions.
OTA K_admin rotation (Option B or C per D-02 outcome).
Audit and compliance: detailed access logs, retention controls.

Phase 11 — Open-source release and DOC adoption

Goal: hand it off.

Open the repo under a permissive licence (likely Apache-2.0 or MIT).
Reproducible build instructions for all firmware variants.
Bill of materials and assembly guide for at least one PCB revision.
Operator manual.
Engagement with DOC, Predator Free 2050, and community trapping groups.

Firmware-side prerequisites for the central service

Some central-service features require firmware-side investment in earlier phases so they’re ready when the central service catches up:

DFU-capable USB bootloader on every node from Phase 1. Required so the v1.5 web flasher can write firmware over USB without an external programmer.
BLE provisioning GATT service with a documented characteristic for config writes, from Phase 4. Required for the web flasher’s post-flash provisioning leg.
MQTT client in hub firmware with TX-only and TX/RX modes, from Phase 2. Required for any central-service relay.

If any of these slip, the corresponding central-service phase slips with them. This is worth flagging during firmware code review.

Cross-cutting concerns

These aren’t phases, but they’re constraints to remember at every step:

Battery life is a feature. Any change that increases endpoint sleep current by >1 µA needs justification.
Airtime is shared. Test bandwidth usage as the deployment scales (e.g. what if a stoat trips 5 traps in 30 seconds and they all retransmit 3×?).
Field staff are the user. If the wizard or hub UI takes more than 60 seconds to deploy a node, we’ve failed the brief.
Offline by default. Nothing in the system should depend on internet connectivity to function. Internet is a delivery mechanism for reports, not a requirement.
Honest propagation claims. This project is not differentiated on raw radio range in dense NZ bush — proprietary narrowband VHF (e.g. Celium) outperforms 868 MHz LoRa in those conditions. Our advantage is mesh topology filling LoRaWAN gateway gaps in gully terrain, plus open hardware and firmware. Marketing and docs should reflect this.
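The airtime question above (5 traps tripping within 30 seconds) can be roughed out with the standard Semtech LoRa time-on-air formula. The sketch uses the Phase 1 radio settings (SF9, 125 kHz) and assumes a 32-byte frame, coding rate 4/5, an 8-symbol preamble, explicit header and CRC on, and 3 transmissions per trap. Those values are placeholders, not figures from the frame spec.

```python
import math


def lora_time_on_air_ms(payload_bytes, sf=9, bw_hz=125_000, coding_rate=1,
                        preamble_symbols=8, explicit_header=True, crc=True,
                        low_data_rate_opt=False):
    """Time on air for one LoRa frame, in milliseconds.

    coding_rate is 1..4, meaning 4/5..4/8.
    """
    t_sym_ms = (2 ** sf) / bw_hz * 1000.0
    ih = 0 if explicit_header else 1
    de = 1 if low_data_rate_opt else 0
    numerator = 8 * payload_bytes - 4 * sf + 28 + (16 if crc else 0) - 20 * ih
    payload_symbols = 8 + max(
        math.ceil(numerator / (4 * (sf - 2 * de))) * (coding_rate + 4), 0)
    return (preamble_symbols + 4.25 + payload_symbols) * t_sym_ms


def burst_occupancy(frames, frame_ms, window_s):
    """Fraction of the window spent transmitting if frames never overlap."""
    return frames * frame_ms / (window_s * 1000.0)


frame_ms = lora_time_on_air_ms(32)                # about 246.8 ms per frame
occupancy = burst_occupancy(5 * 3, frame_ms, 30)  # about 0.12 of the window
```

Under these assumptions the burst occupies roughly 12% of the 30-second window before any collisions or ACK traffic are counted, so the actual frame size and retry policy are worth plugging in once they are fixed.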

Stretch ideas (not committed)

Audio classification on the endpoint (e.g. on-device ML for possum calls). Almost certainly not power-feasible on nRF52840 — out of scope unless we change silicon.
Camera trap integration via a separate gateway. Out of scope but worth noting that adjacent conservation tech could share the LoRa backbone.
Lure-monitoring (weight sensor on bait stations). Same hardware, different application of the STATUS frame’s flag bits.