Firmware Architecture Patterns for IoT Products
Event-driven, hierarchical state machines, hardware abstraction layers — the firmware architecture patterns that survive past v1 and don't require a rewrite at v3.
Firmware that needs a rewrite at v3 was almost always badly architected at v1. Not in the toolchain, not in the language — in the structural patterns. Five patterns, used together, give you firmware that grows over years instead of collapsing under its own weight.
1. Hardware abstraction layer (HAL)
A clean HAL separates “this device’s firmware” from “this device’s silicon”. The application talks to abstract interfaces — read_temperature(), start_advertising(), flash_write() — and the HAL implements them per silicon family.
Why it matters:
- Porting between MCU families (ESP32 → STM32, nRF52 → nRF54) becomes a HAL replacement, not an application rewrite
- Unit testing the application becomes possible — you stub the HAL on a host machine
- Firmware bring-up on a new hardware revision (different sensors, different connector pinout) is a HAL swap
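Here is a minimal sketch of the unit-testing point. It assumes the HAL is exposed to the application as a table of function pointers. The names hal_t and app_check_temperature, the stub functions, and the 40 °C alert rule are hypothetical. read_temperature() and start_advertising() echo the example interfaces above. The decision logic builds and runs on a host against the stub, with no silicon involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Abstract HAL interface: the application only ever sees these signatures.
 * Each silicon family (or the host test build) supplies its own table. */
typedef struct {
    int32_t (*read_temperature)(void);   /* milli-degrees Celsius */
    void    (*start_advertising)(void);
} hal_t;

/* Hypothetical application rule: advertise an alert when it gets hot.
 * The decision lives here, not inside the temperature driver. */
#define ALERT_THRESHOLD_MC 40000

bool app_check_temperature(const hal_t *hal)
{
    if (hal->read_temperature() > ALERT_THRESHOLD_MC) {
        hal->start_advertising();
        return true;
    }
    return false;
}

/* ---- Host-side stub HAL, used only in unit tests ---- */
static int32_t stub_temp_mc;     /* value the "sensor" will report */
static int     stub_adv_calls;   /* how often advertising was started */

static int32_t stub_read_temperature(void) { return stub_temp_mc; }
static void    stub_start_advertising(void) { stub_adv_calls++; }

static const hal_t stub_hal = {
    .read_temperature  = stub_read_temperature,
    .start_advertising = stub_start_advertising,
};
```

On target, a second hal_t instance would point at the real drivers, and app_check_temperature() would compile unchanged. Link-time substitution is an equally common alternative to function pointers: one .c file per platform sits behind a shared header.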
What goes in the HAL:
- Peripheral access (I²C, SPI, UART, ADC)
- Sensor drivers (with a uniform interface across sensor models)
- Radio operations (advertise, connect, send, receive)
- Storage (read, write, erase, with wear levelling abstracted)
- Time and timers
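The next sketch shows a uniform sensor-driver interface. It assumes every temperature driver reports milli-degrees Celsius through the same function pointer. Both sensor models and their raw formats are hypothetical, and a plain variable stands in for the I²C register read. The point is that each model's conversion quirks stay inside its own driver.

```c
#include <assert.h>
#include <stdint.h>

/* One interface for every temperature sensor model: callers get
 * milli-degrees Celsius and never see raw register formats. */
typedef struct temp_sensor {
    const char *model;
    int (*read_mc)(const struct temp_sensor *s, int32_t *out_mc);
    void *ctx;   /* driver-private state; here it points at a fake raw value */
} temp_sensor_t;

/* Hypothetical model A: signed 16-bit raw value, 0.01 degC per LSB. */
static int model_a_read(const temp_sensor_t *s, int32_t *out_mc)
{
    int16_t raw = *(const int16_t *)s->ctx;   /* stands in for an I2C read */
    *out_mc = (int32_t)raw * 10;
    return 0;
}

/* Hypothetical model B: 12-bit two's-complement raw value, 0.0625 degC per LSB. */
static int model_b_read(const temp_sensor_t *s, int32_t *out_mc)
{
    uint16_t raw = *(const uint16_t *)s->ctx & 0x0FFF;
    /* Sign-extend from 12 bits */
    int32_t value = (raw & 0x0800) ? (int32_t)raw - 0x1000 : (int32_t)raw;
    *out_mc = value * 625 / 10;   /* 62.5 mC per LSB, truncated toward zero */
    return 0;
}
```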
What does not go in the HAL:
- Business logic (do not put “what to do when temperature is high” in the temperature driver)
- Protocol-specific framing
- Application state
Most teams discover the HAL boundary too late. Drawing it on day one saves quarters of rework.
2. Event-driven core
The application is a state machine that consumes events. Events come from interrupts (sensor ready, button press), timers (periodic sample, watchdog), the radio (BLE connection, MQTT message), and itself (state transition queued).
The core loop is roughly:
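while (true) {
    /* Block (letting the MCU sleep) until an ISR, timer, or the radio queues an event */
    Event ev = wait_for_event();
    /* Single dispatcher: every state transition goes through here */
    state_machine_dispatch(ev);
}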
Why event-driven beats spaghetti:
- All state transitions go through one dispatcher — easy to log, easy to test, easy to reason about
- Adding a new feature is “add an event type and a transition” — no mystery functions getting called from interrupt handlers
- The system can be deeply asleep between events, which directly maps to battery life (see our BLE battery post)
What to use:
- Zephyr’s k_msgq on Zephyr/NCS
- FreeRTOS queues on FreeRTOS-based RTOSs
- A custom ring buffer with a critical section if running bare-metal
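For the bare-metal option, here is a minimal sketch of an event queue built as a ring buffer and guarded by a critical section. CRITICAL_ENTER/CRITICAL_EXIT and the event types are hypothetical placeholders. They are defined as no-ops so the sketch builds on a host. One slot is always left empty so that a full queue can be told apart from an empty one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bare-metal critical section. On a Cortex-M target these would typically
 * save PRIMASK into a local and disable interrupts (CMSIS __get_PRIMASK() /
 * __disable_irq()), then restore PRIMASK on exit. No-ops here for the host. */
#define CRITICAL_ENTER()  do { } while (0)
#define CRITICAL_EXIT()   do { } while (0)

typedef enum { EV_NONE, EV_SENSOR_READY, EV_BUTTON, EV_TIMER } event_type_t;

typedef struct {
    event_type_t type;
    uint32_t     arg;
} event_t;

#define EVQ_CAPACITY 8u   /* holds EVQ_CAPACITY - 1 events */

typedef struct {
    event_t  buf[EVQ_CAPACITY];
    uint32_t head;   /* next slot to write */
    uint32_t tail;   /* next slot to read */
} event_queue_t;

/* Callable from ISRs or the main loop (given a real critical section).
 * Returns false if the queue is full; the caller decides how to handle it. */
bool evq_post(event_queue_t *q, event_t ev)
{
    bool ok = false;
    CRITICAL_ENTER();
    uint32_t next = (q->head + 1u) % EVQ_CAPACITY;
    if (next != q->tail) {
        q->buf[q->head] = ev;
        q->head = next;
        ok = true;
    }
    CRITICAL_EXIT();
    return ok;
}

/* Called from the main loop. Returns false if the queue is empty. */
bool evq_get(event_queue_t *q, event_t *out)
{
    bool ok = false;
    CRITICAL_ENTER();
    if (q->tail != q->head) {
        *out = q->buf[q->tail];
        q->tail = (q->tail + 1u) % EVQ_CAPACITY;
        ok = true;
    }
    CRITICAL_EXIT();
    return ok;
}
```

wait_for_event() in the core loop can then be built on evq_get(): if the queue is empty, sleep (for example with WFI) and retry. On Cortex-M, an event posted between the empty check and the sleep could be missed. A common way to avoid this is to do the check with interrupts masked, because a pending interrupt still wakes WFI while PRIMASK is set.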
3. Hierarchical state machines
Flat state machines work for simple products. As soon as the product has 10+ states with shared behaviour (idle, advertising, connected, paired, OTA, error), flat state machines become unmaintainable.
The pattern that scales:
Top
├── Booting
├── Operating
│ ├── Idle
│ ├── Advertising
│ ├── Connected
│ │ ├── Authenticated
│ │ └── Unauthenticated
│ └── OTA
└── Error
Each state has entry, exit, and event handlers. Events not handled at a leaf state bubble up to the parent. Parent states implement shared behaviour once (e.g., the watchdog kick logic in Operating).
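Here is a hand-rolled sketch of the bubble-up rule. It assumes each state is a struct with a parent pointer and a handler that reports whether it consumed the event. The states mirror part of the tree above, and the event names and counters are hypothetical. Entry/exit actions and transitions, which walk up to the common ancestor of source and target, are deliberately left out. Frameworks such as QP/C handle that bookkeeping.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { EV_TICK, EV_BUTTON, EV_FAULT } hsm_event_t;
typedef enum { HANDLED, UNHANDLED } hsm_result_t;

typedef struct state state_t;
struct state {
    const char    *name;
    const state_t *parent;                   /* NULL for Top */
    hsm_result_t (*handle)(hsm_event_t ev);  /* NULL = handles nothing */
};

static int watchdog_kicks;   /* observable side effects for the example */
static int button_presses;

/* Operating implements shared behaviour once; every child inherits it. */
static hsm_result_t operating_handle(hsm_event_t ev)
{
    if (ev == EV_TICK) { watchdog_kicks++; return HANDLED; }
    return UNHANDLED;
}

static hsm_result_t idle_handle(hsm_event_t ev)
{
    if (ev == EV_BUTTON) { button_presses++; return HANDLED; }
    return UNHANDLED;
}

static const state_t top       = { "Top",       NULL,       NULL };
static const state_t operating = { "Operating", &top,       operating_handle };
static const state_t idle      = { "Idle",      &operating, idle_handle };
static const state_t connected = { "Connected", &operating, NULL };

/* Offer the event to the current leaf, then to each ancestor in turn.
 * Returns the state that handled it, or NULL if nothing did. */
const state_t *hsm_dispatch(const state_t *leaf, hsm_event_t ev)
{
    for (const state_t *s = leaf; s != NULL; s = s->parent) {
        if (s->handle != NULL && s->handle(ev) == HANDLED)
            return s;
    }
    return NULL;
}
```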
Implementations:
- QP/C or QP/Nano by Quantum Leaps — mature C implementation of UML statecharts
- Hand-rolled with a state-table — fine for moderate complexity
- State-machine generators (like cuTk): when you want to model in a tool and generate code
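For the hand-rolled state-table option, here is a minimal flat sketch. Each row is (from, event, to, action), and pairs that are not in the table are ignored. The states, events, and start_adv action are hypothetical. With this structure, adding a feature means adding a row.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ST_IDLE, ST_ADVERTISING, ST_CONNECTED } st_t;
typedef enum { EV_START_ADV, EV_CONNECTED, EV_DISCONNECTED, EV_ADV_TIMEOUT } ev_t;

typedef struct {
    st_t from;
    ev_t event;
    st_t to;
    void (*action)(void);   /* optional, may be NULL */
} transition_t;

static int adv_starts;                        /* counts calls, for the example */
static void start_adv(void) { adv_starts++; }

/* The whole behaviour of the machine is this table. */
static const transition_t table[] = {
    { ST_IDLE,        EV_START_ADV,    ST_ADVERTISING, start_adv },
    { ST_ADVERTISING, EV_CONNECTED,    ST_CONNECTED,   NULL      },
    { ST_ADVERTISING, EV_ADV_TIMEOUT,  ST_IDLE,        NULL      },
    { ST_CONNECTED,   EV_DISCONNECTED, ST_ADVERTISING, start_adv },
};

/* Look up (state, event), run the action, return the next state.
 * Unknown pairs leave the state unchanged. */
st_t sm_step(st_t current, ev_t ev)
{
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (table[i].from == current && table[i].event == ev) {
            if (table[i].action != NULL)
                table[i].action();
            return table[i].to;
        }
    }
    return current;
}
```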
The investment pays off as soon as the product spans more than 10 states. For very simple products (a switch, a beacon), flat is fine.
4. The driver / service / application three-layer split
A pattern that keeps responsibilities clear:
Drivers (in the HAL): talk to silicon. Stateless except for hardware state. No knowledge of the application.
Services: stateful, mid-level functionality. The “BLE connection service,” the “OTA service,” the “battery monitor service.” Each service owns its state, exposes an event-and-API interface to the application, and uses one or more drivers underneath.
Application: the top-level state machine that orchestrates services. It does not talk to drivers directly; it talks to services.
Why three layers, not two: services can be reused across products. Application code is product-specific. Drivers are silicon-specific. Mixing the three creates the spaghetti you wanted to avoid.
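Here is a sketch of a battery monitor service that follows these rules. It assumes the driver is a single read_mv function and the application receives events through a callback. All names, the thresholds, and the hysteresis band are hypothetical choices for illustration. The service owns its state, exposes a small API, and emits an event only when the battery crosses a threshold. The application never touches the ADC driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Driver layer: stateless, silicon-specific (hypothetical signature). */
typedef uint32_t (*adc_read_mv_fn)(void);

/* Events the service emits to the application. */
typedef enum { BATT_EV_LOW, BATT_EV_OK } batt_event_t;
typedef void (*batt_event_cb)(batt_event_t ev);

/* Service layer: owns its state, uses the driver, reports via events. */
typedef struct {
    adc_read_mv_fn read_mv;   /* driver underneath */
    batt_event_cb  notify;    /* application's event sink */
    uint32_t       low_mv;    /* enter "low" below this */
    uint32_t       ok_mv;     /* leave "low" above this (hysteresis) */
    uint32_t       last_mv;
    bool           is_low;
} battery_service_t;

/* Called periodically, e.g. on a timer event. Emits an event only on a
 * state change, so repeated polls while low produce no duplicates. */
void battery_service_poll(battery_service_t *svc)
{
    svc->last_mv = svc->read_mv();
    if (!svc->is_low && svc->last_mv < svc->low_mv) {
        svc->is_low = true;
        svc->notify(BATT_EV_LOW);
    } else if (svc->is_low && svc->last_mv > svc->ok_mv) {
        svc->is_low = false;
        svc->notify(BATT_EV_OK);
    }
}

uint32_t battery_service_millivolts(const battery_service_t *svc)
{
    return svc->last_mv;
}

/* ---- Host-side test doubles ---- */
static uint32_t fake_mv = 3000;
static int low_events, ok_events;
static uint32_t fake_adc_read_mv(void) { return fake_mv; }
static void app_on_battery(batt_event_t ev)
{
    if (ev == BATT_EV_LOW) low_events++; else ok_events++;
}
```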
5. Logging, asserts, and the post-mortem
Firmware fails in places that are hard to reach (a customer’s home, a remote farm, a factory floor). You need post-mortem capability.
Logging:
- Compile-time log levels per module
- Log events to a ring buffer in RAM (and to UART for development)
- Persist critical errors to flash (in a wear-levelled region) so they survive reboot
- Upload logs on connect to your fleet-management platform
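Here is a minimal sketch of per-module compile-time levels plus a RAM ring buffer. It assumes each module defines LOG_MODULE_LEVEL before using the macros. The macro names, sizes, and fixed-length entry format are hypothetical. Zephyr's own logging subsystem offers the same idea through a per-module level in LOG_MODULE_REGISTER; this is a bare-metal stand-in.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

enum { LOG_LVL_ERR = 1, LOG_LVL_WRN = 2, LOG_LVL_INF = 3, LOG_LVL_DBG = 4 };

/* Per-module compile-time level: each .c file would define this before
 * including the logging header. This module keeps INF and above. */
#define LOG_MODULE_LEVEL LOG_LVL_INF

#define LOG_ENTRIES 16u
#define LOG_MSG_LEN 48u

typedef struct {
    unsigned level;
    char     msg[LOG_MSG_LEN];
} log_entry_t;

static log_entry_t log_ring[LOG_ENTRIES];
static unsigned    log_next;   /* total entries ever written */

/* Append to the RAM ring, overwriting the oldest entry when full.
 * A development build could also mirror the line to UART here. */
void log_write(unsigned level, const char *fmt, ...)
{
    log_entry_t *e = &log_ring[log_next % LOG_ENTRIES];
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(e->msg, sizeof e->msg, fmt, ap);   /* truncates long lines */
    va_end(ap);
    e->level = level;
    log_next++;
}

/* The level check is a compile-time constant, so an optimizing compiler
 * typically drops calls below the module's level entirely. */
#define LOG_AT(lvl, ...) \
    do { if ((lvl) <= LOG_MODULE_LEVEL) log_write((lvl), __VA_ARGS__); } while (0)
#define LOG_ERR(...) LOG_AT(LOG_LVL_ERR, __VA_ARGS__)
#define LOG_INF(...) LOG_AT(LOG_LVL_INF, __VA_ARGS__)
#define LOG_DBG(...) LOG_AT(LOG_LVL_DBG, __VA_ARGS__)
```

As written, log_write() is not safe to call from an ISR; it would need the same kind of critical section as an interrupt-fed event queue. Persisting or uploading the log would walk the ring starting at log_next % LOG_ENTRIES, which is the oldest surviving entry once the ring has wrapped.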
Asserts:
- Liberal use of __ASSERT(condition, "msg") during development
- In production, asserts capture file/line/state and reset the device cleanly, which is better than silent corruption
- Reset reason recorded in non-volatile storage so the boot path can read why we last died
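Here is a sketch of a production assert that records file, line, and application state before resetting, together with the matching boot-path check. FW_ASSERT, assert_failed, crash_record_t, and the magic value are hypothetical. On target, the record would sit either in non-volatile storage or in a RAM section the startup code does not clear, which survives a warm reset but not power loss. system_reset() would be a real reset such as CMSIS NVIC_SystemReset(). Here it only sets a flag so the flow can run on a host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CRASH_MAGIC 0xDEADC0DEu

/* Record that survives a reset. A plain global stands in for a
 * no-init RAM section or reserved flash page. */
typedef struct {
    uint32_t magic;       /* record is valid only if equal to CRASH_MAGIC */
    uint32_t line;
    uint32_t app_state;   /* top-level state id at the time of failure */
    char     file[32];    /* truncated copy of __FILE__ */
} crash_record_t;

static crash_record_t crash_record;
static uint32_t       current_app_state;   /* kept up to date by the dispatcher */

/* Placeholder for a real reset; on target this would not return. */
static int reset_requested;
static void system_reset(void) { reset_requested = 1; }

void assert_failed(const char *file, uint32_t line)
{
    crash_record.line = line;
    crash_record.app_state = current_app_state;
    strncpy(crash_record.file, file, sizeof crash_record.file - 1);
    crash_record.file[sizeof crash_record.file - 1] = '\0';
    crash_record.magic = CRASH_MAGIC;   /* written last: marks the record complete */
    system_reset();
}

#define FW_ASSERT(cond) \
    do { if (!(cond)) assert_failed(__FILE__, (uint32_t)__LINE__); } while (0)

/* Boot path: report why we last died, then clear so it is reported once. */
int boot_read_crash(crash_record_t *out)
{
    if (crash_record.magic != CRASH_MAGIC)
        return 0;
    *out = crash_record;
    crash_record.magic = 0;
    return 1;
}
```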
Coredumps:
- Some MCUs (Nordic nRF53, nRF54, certain STM32) support saving register state on hard fault
- Memfault, Percepio, or hand-rolled solutions can extract these on next connect for analysis
A field device that crashed silently is a field device you cannot debug. Build the post-mortem capability before you need it.
What we hand over
For every IoT firmware engagement we ship a FIRMWARE_ARCHITECTURE.md in the repo with:
- A diagram of the HAL → services → application layering
- The state-machine hierarchy as a tree, with each state’s entry/exit semantics documented
- The event types and their producers/consumers
- Logging and assertion conventions
- The boot path, including reset-reason handling and post-mortem capture
The document is part of the deliverable. It outlives the engineer who first wrote it; it onboards every new engineer who joins; it survives the inevitable rewrite that comes when the product expands into a new market.
If you are looking at a firmware codebase that is one rewrite away from a v3 trainwreck — or starting a new product and wanting to do this right from day one — we have shipped this pattern across many engagements.