Edge AI on Microcontrollers: TinyML in 2026
What works, what is still painful, and how to decide whether your IoT product should run a model on the device or in the cloud.
A few years ago, “AI on a microcontroller” meant a 20-line keyword spotter and a great deal of hand-waving. In 2026 it is a serious option for a meaningful slice of IoT problems — but the failure modes are still subtle, and the right call is more often “no” than the marketing material suggests.
When edge AI is the right call
Run the model on the device when:
- Latency must be measured in milliseconds, not seconds. A fall detector cannot wait for a round-trip to the cloud.
- Connectivity is unreliable or absent. A vibration sensor on a remote pump cannot ship 100 Hz raw data over LoRa.
- Privacy or regulation requires that data never leave the device. Healthcare wearables, in-cabin audio, certain industrial settings.
- Cloud cost would dominate the unit economics. Streaming raw sensor data from a million devices into a cloud inference service adds up faster than running the model on the device ever would.
If none of these apply, run the model in the cloud. Edge constraints slow development down; the cloud gives you cheaper iteration on faster hardware.
What you can realistically run on a microcontroller
Approximate ranges for an ESP32-S3 or an STM32H7-class chip running a quantized model (a minimal runtime sketch follows the list):
- Wake-word and keyword spotting: routine. Sub-100 KB models, sub-100 ms latency.
- Activity recognition from accelerometer data: routine.
- Anomaly detection on time-series sensor data with a small autoencoder or one-class SVM-equivalent: yes.
- Person detection from very low-resolution images (96×96 grayscale): yes, with care.
- General object detection at usable resolution: not on a stock MCU. Move to an MCU with a neural accelerator (an NXP i.MX RT crossover MCU, or an STM32N6 with its Neural-ART NPU) or a Linux-class SBC.
- LLM inference: no. Even a small quantized 1B-parameter model exceeds typical MCU memory by orders of magnitude: roughly 1 GB of weights at int8 (500 MB at 4-bit) against a few hundred KB of SRAM, or a few MB of external PSRAM at best.
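For a sense of scale, here is roughly what invoking one of these quantized models looks like with the tflite-micro runtime, a common choice on these chips. This is a minimal sketch only: the model symbol g_model_data, the arena size, and the op list are placeholders for your own, and API details shift between tflite-micro versions.

```cpp
#include <cstdint>

#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/schema/schema_generated.h"

// Your quantized flatbuffer model, compiled into flash (name is a placeholder).
extern const unsigned char g_model_data[];

// Static arena: all tensor memory comes from here, sized by measurement.
constexpr int kArenaSize = 40 * 1024;
static uint8_t tensor_arena[kArenaSize];

int run_once(const int8_t* features, int n_features) {
  const tflite::Model* model = tflite::GetModel(g_model_data);

  // Register only the ops your model uses; the full op set wastes flash.
  static tflite::MicroMutableOpResolver<3> resolver;
  resolver.AddFullyConnected();
  resolver.AddRelu();
  resolver.AddSoftmax();

  static tflite::MicroInterpreter interpreter(
      model, resolver, tensor_arena, kArenaSize);
  if (interpreter.AllocateTensors() != kTfLiteOk) return -1;

  // Copy quantized input features into the model's input tensor.
  TfLiteTensor* input = interpreter.input(0);
  for (int i = 0; i < n_features; ++i) input->data.int8[i] = features[i];

  if (interpreter.Invoke() != kTfLiteOk) return -1;
  return interpreter.output(0)->data.int8[0];  // quantized class score
}
```

The static arena is the detail to notice: all tensor memory is budgeted up front, which is what makes the memory questions below answerable at all.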
The model is the easy part
The hardest engineering on a TinyML project is rarely model accuracy. It is:
- Sensor pipeline reliability. The model’s accuracy on your validation set means nothing if the production sensor signal looks different from what you trained on. Calibration drift, sample-rate jitter, and environmental coupling all show up.
- Quantization stability. Float-trained models that quantize cleanly to int8 are not the default. Expect to need quantization-aware training, and a calibration dataset that genuinely represents production conditions.
- Memory budget. Inference uses the activation buffers you account for, plus the working memory you forgot. Profile both, on the actual chip, with the actual model (a profiling sketch follows this list).
- Battery accounting. A model that takes 80 ms per inference at 100 mA is fine when it runs once a second. Run it ten times a second and the chip is awake 80% of the time, and battery life quietly drops by roughly an order of magnitude (the arithmetic is worked after this list).
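On the memory point: tflite-micro can report actual arena usage after allocation, so you size the arena from measurement rather than guesswork. This continues the skeleton above; arena_used_bytes() is the accessor in recent tflite-micro versions.

```cpp
#include <cstdio>

// After AllocateTensors() succeeds, ask the interpreter how much of the
// arena it actually used, instead of trusting an offline estimate.
void report_arena_usage(tflite::MicroInterpreter& interpreter) {
  size_t used = interpreter.arena_used_bytes();
  printf("arena: %u of %u bytes used\n", (unsigned)used, (unsigned)kArenaSize);
  // Leave headroom beyond this number: stack depth during Invoke(),
  // sensor DMA buffers, and any RTOS heap all share the same SRAM.
}
```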
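And the battery arithmetic is worth writing down explicitly. The numbers below are assumptions for illustration, not measurements from any particular board:

```cpp
// Back-of-envelope battery math with assumed numbers: 100 mA active,
// 1 mA sleep, 80 ms per inference, 1000 mAh cell.
constexpr double kActiveMilliamps  = 100.0;
constexpr double kSleepMilliamps   = 1.0;
constexpr double kInferenceSeconds = 0.080;

constexpr double average_milliamps(double inferences_per_second) {
  double duty = inferences_per_second * kInferenceSeconds;  // awake fraction
  return duty * kActiveMilliamps + (1.0 - duty) * kSleepMilliamps;
}

// average_milliamps(1.0)  ~  8.9 mA -> roughly 4.7 days on 1000 mAh
// average_milliamps(10.0) ~ 80.2 mA -> roughly 12 hours: 9x worse, same model
```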
What good edge-AI deployment looks like
The systems that work in the field share a few patterns.
- The model is a service, not a snowflake. It has a versioned binary, a known interface, and a process for shipping updates over OTA (one possible blob layout is sketched after this list). New models do not require new firmware releases.
- Inference output is logged: a small ring buffer of recent predictions and the input features that produced them (also sketched below). When something looks weird in the field, you have evidence to debug with.
- Drift detection runs alongside the model. A simple statistical check that the input distribution today resembles the training distribution (a minimal check is sketched below). When it does not, the system flags it before the model silently degrades.
- There is a fallback. If the model crashes or returns nonsense, the device defaults to a sensible deterministic behavior. The model is an enhancement, not a single point of failure.
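A minimal sketch of the versioned-binary idea, assuming a custom blob format; the field names and magic value are invented for illustration:

```cpp
#include <cstdint>

// One way to make the model "a service": ship it as a standalone versioned
// blob with its own header, validated before the interpreter ever sees it.
struct ModelBlobHeader {
  uint32_t magic;           // e.g. 0x4D4F444C ("MODL"): reject random flash
  uint16_t schema_version;  // input/output contract the firmware expects
  uint16_t model_version;   // increments with every OTA model push
  uint32_t payload_bytes;   // flatbuffer length
  uint32_t payload_crc32;   // integrity check on the payload
};

// Firmware accepts any model whose schema it understands, so a new model
// is a small OTA payload, not a full firmware release.
bool model_blob_ok(const ModelBlobHeader& h, uint32_t computed_crc) {
  return h.magic == 0x4D4F444CU
      && h.schema_version == 1  // the contract this firmware implements
      && h.payload_crc32 == computed_crc;
}
```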
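The prediction log can be as simple as a fixed array in RAM. Sizes and field names below are illustrative:

```cpp
#include <cstdint>
#include <cstring>

constexpr int kLogDepth    = 32;  // decisions kept
constexpr int kNumFeatures = 16;  // model input width

struct PredictionRecord {
  uint32_t timestamp_ms;
  int8_t   features[kNumFeatures];  // quantized model input
  int8_t   score;                   // model output
};

static PredictionRecord g_log[kLogDepth];
static uint32_t g_log_head = 0;

// Overwrite the oldest entry; cheap enough to call on every inference.
void log_prediction(uint32_t now_ms, const int8_t* features, int8_t score) {
  PredictionRecord& rec = g_log[g_log_head % kLogDepth];
  rec.timestamp_ms = now_ms;
  memcpy(rec.features, features, sizeof(rec.features));
  rec.score = score;
  ++g_log_head;
}
// On a support request or periodic uplink, dump g_log: you get the last
// kLogDepth decisions and their inputs, not just "it misbehaved".
```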
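And a drift check does not need to be statistically sophisticated to be useful. A sketch, with the baseline statistics and threshold as placeholder assumptions:

```cpp
#include <cmath>
#include <cstdint>

// Crude drift check: compare the running mean of one input feature against
// the mean/stddev recorded from the training set.
struct DriftMonitor {
  double train_mean = 0.12;  // baked in at build time (placeholder value)
  double train_std  = 0.03;  // placeholder value
  double run_sum    = 0.0;
  uint32_t n        = 0;

  void observe(double feature_mean) { run_sum += feature_mean; ++n; }

  // Call periodically (e.g. daily). Returns true if the input distribution
  // has shifted more than z_max standard deviations from training.
  bool drifted(double z_max = 3.0) {
    if (n == 0) return false;
    double z = std::fabs(run_sum / n - train_mean) / train_std;
    run_sum = 0.0; n = 0;  // reset the window
    return z > z_max;
  }
};
// When drifted() fires, raise a telemetry flag; do not keep silently
// trusting the model on data it was never trained on.
```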
The boring path that wins
For a first edge-AI product:
1. Define the metric (accuracy, latency, false-positive cost) before picking a model architecture.
2. Collect a real dataset on the actual sensor in the actual environment. Months, not days.
3. Pick the smallest model that hits the metric. A logistic regression on hand-engineered features beats a neural network you cannot debug (a sketch follows this list).
4. Ship it with full telemetry, drift monitoring, and OTA. Improve it post-launch with real data.
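To make step 3 concrete, here is the kind of model that often wins: a handful of hand-engineered features and a logistic regression. The feature set and weights are illustrative placeholders, not a recommendation for any specific sensor:

```cpp
#include <cmath>
#include <cstddef>

// Three readable features from a raw sensor window.
struct Features { float rms; float zero_cross_rate; float peak; };

Features extract(const float* window, size_t n) {
  float sum_sq = 0, peak = 0;
  size_t crossings = 0;
  for (size_t i = 0; i < n; ++i) {
    sum_sq += window[i] * window[i];
    if (std::fabs(window[i]) > peak) peak = std::fabs(window[i]);
    if (i > 0 && (window[i] > 0) != (window[i - 1] > 0)) ++crossings;
  }
  return { std::sqrt(sum_sq / n), (float)crossings / n, peak };
}

// P(event) = sigmoid(w . x + b): three weights you can read, plot, and debug.
float predict(const Features& f) {
  const float w[3] = {2.1f, -0.7f, 1.3f};  // fit offline; illustrative values
  const float b = -1.5f;
  float z = w[0] * f.rms + w[1] * f.zero_cross_rate + w[2] * f.peak + b;
  return 1.0f / (1.0f + std::exp(-z));
}
```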
Skipping step 2 is the most common reason edge-AI projects ship and then quietly get turned off.
If you have a TinyML project that has been “almost ready” for six months, we have probably seen the failure mode.
Keep reading
- ESP32 vs STM32: When to Pick Each for Your IoT Product (Embedded). A side-by-side look at when ESP32 wins, when STM32 wins, and the small set of cases where neither is the right answer.
- Designing OTA Firmware Updates That Don't Brick Devices (Embedded). The patterns we use to ship firmware over the air to devices in the field: A/B partitions, rollback, signed images, staged rollouts, and the failure modes that bite if you skip them.
- Industrial IoT: Predictive Maintenance with Vibration Sensors (Industrial). How to design a predictive maintenance program that actually catches failures before they happen: sensors, edge processing, baselines, and the operational practices that make it stick.