AI on the Edge vs AI in the Cloud: Where to Run the Model
A practical decision framework for whether your IoT product's AI feature should run on the device or in the cloud, and the hybrid pattern that often wins.
The question “edge or cloud?” sounds like an architecture choice. In practice it is a constraint problem — and the answer is usually some specific blend of both, decided by latency, connectivity, privacy, and cost.
The four constraints that decide
For any IoT product with an AI feature, four pressures determine where the model runs:
- Latency. How quickly must the system respond to an input? Sub-100 ms responses to user input or safety events live on the device. Multi-second responses fit either side.
- Connectivity. Is the device always online, sometimes online, or rarely online? An always-on device can lean on the cloud. A field sensor cannot.
- Privacy / regulation. Does the data need to stay on the device, in a region, or can it travel freely? Healthcare, in-cabin audio, and certain industrial settings strongly bias toward edge.
- Cost at scale. What does each cloud inference cost, and how often does it run? At fleet scale, even fractions of a cent per inference add up to real money: 100,000 devices running ten inferences a day at $0.001 each is $1,000 a day, roughly $365,000 a year.
A product with tight latency, sometimes-offline operation, sensitive data, and high inference frequency is an edge AI candidate. A product with looser latency, always-on operation, low-sensitivity data, and infrequent inference is a cloud candidate. Most real products land somewhere in between.
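As a rough illustration of that framework, the sketch below encodes the four pressures as a constraint profile and maps it to a placement. The thresholds (100 ms for reflex latency, 1,000 inferences per device per day for cost pressure) and the placement logic are assumptions for illustration, not a drop-in rule:

```python
from dataclasses import dataclass

@dataclass
class FeatureConstraints:
    """Constraint profile for one AI feature. All thresholds illustrative."""
    max_latency_ms: int      # how fast must the feature respond?
    offline_required: bool   # must it work with no connectivity?
    data_sensitive: bool     # must raw data stay on the device?
    inferences_per_day: int  # how often does the model run, per device?

def suggest_placement(f: FeatureConstraints) -> str:
    """Map a constraint profile to edge / cloud / hybrid."""
    binding_edge = [
        f.max_latency_ms < 100,        # reflex-speed response needed
        f.offline_required,            # must survive connectivity loss
        f.data_sensitive,              # raw data cannot leave the device
        f.inferences_per_day > 1_000,  # per-call cloud cost would dominate
    ]
    if not any(binding_edge):
        return "cloud"
    if all(binding_edge):
        return "edge"
    return "hybrid"  # the common case: triage on-device, heavy work central

# Example: a wake-word detector is latency-bound, offline, and private.
wake_word = FeatureConstraints(
    max_latency_ms=50, offline_required=True,
    data_sensitive=True, inferences_per_day=50_000,
)
print(suggest_placement(wake_word))  # -> edge
```

The point is not the thresholds but the shape of the logic: every binding constraint removes options, and the in-between cases fall naturally into the hybrid pattern described below.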
Edge wins for these patterns
- Reflex actions. Wake-word detection, fall detection, gesture recognition — actions that need to fire in milliseconds with no round-trip.
- Privacy-critical inference. Audio that never leaves the home, video kept on-device, biometric data processed locally.
- Bandwidth-bound deployments. Industrial vibration sensing where raw data cannot be shipped to the cloud at full rate: a 10 kHz accelerometer at 16-bit resolution produces roughly 20 KB/s per axis, far more than a LoRaWAN or NB-IoT uplink can sustain.
- Offline-tolerant features. A camera that detects motion when the internet is out is a camera that earns its place.
Cloud wins for these patterns
- Heavy models. Anything beyond what fits in 1-2 MB of quantized weights: image generation, natural-language understanding at scale, large transformer models. (A rough fit check is sketched after this list.)
- Frequent retraining. Models that learn from aggregate fleet data and update centrally are simpler to operate in the cloud.
- Cross-device reasoning. “Tell me when any device in this customer’s fleet shows anomaly X” requires cross-device context that single devices cannot provide.
- Low-frequency, high-value inference. A daily summary report does not justify the engineering cost of running on-device.
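The "does it fit" question from the first bullet is often answerable on the back of an envelope: parameter count times bits per weight gives a floor on model footprint, before activations and runtime overhead. A minimal sketch, assuming a device with 2 MB of flash available for weights:

```python
def model_size_bytes(param_count: int, bits_per_weight: int = 8) -> int:
    """Floor on model footprint: parameters x bits per weight.

    Ignores activation memory, operator code, and runtime overhead,
    so treat the result as a lower bound, not a budget.
    """
    return param_count * bits_per_weight // 8

FLASH_BUDGET = 2 * 1024 * 1024  # assumed 2 MB of flash for the model

for params in (250_000, 5_000_000, 100_000_000):
    size = model_size_bytes(params, bits_per_weight=8)  # int8 quantized
    verdict = "fits" if size <= FLASH_BUDGET else "does not fit"
    print(f"{params:>11,} params -> {size / 1024:>9,.0f} KB ({verdict})")
```

A keyword-spotting model at a quarter-million int8 parameters fits comfortably; a 100M-parameter transformer does not, no matter how aggressively it is quantized.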
The hybrid pattern that often wins
Most production IoT AI systems are not purely edge or cloud. They split work along a clean line:
- The device runs a fast, narrow model for low-latency triage. Detect that something interesting is happening; do not classify it precisely.
- The cloud runs a heavier model on the data the device flagged as interesting. Full classification, context across the fleet, integration with business logic.
- The cloud’s results flow back to the device as over-the-air (OTA) updates: new thresholds, new model versions, configuration changes.
This pattern works because it respects the constraints: the latency-critical path runs locally; the compute-heavy path runs centrally; the data movement is bounded by what was actually interesting.
A concrete example: a security camera that runs a small motion-and-person detector on-device. Anything that triggers it is uploaded to the cloud, where a richer model classifies the scene and decides whether to alert. The bandwidth used is a fraction of always-streaming, the latency on real events is human-fast, and the cloud cost scales with events, not with hours of footage.
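The device side of that camera reduces to a small loop. This is a minimal sketch with stubbed-out hardware and cloud calls; capture_frame, run_tiny_detector, upload_clip, and the 0.6 threshold are all hypothetical stand-ins for your camera driver, inference runtime, and cloud client:

```python
import random
import time

MOTION_THRESHOLD = 0.6  # assumed confidence cutoff for "interesting"

def capture_frame() -> bytes:
    return b""  # stub: read a frame from the camera driver

def run_tiny_detector(frame: bytes) -> float:
    return random.random()  # stub: score from the on-device model

def upload_clip(seconds_before: int, seconds_after: int) -> None:
    # stub: ship the flagged clip to the cloud for full classification
    print(f"uploading clip [-{seconds_before}s, +{seconds_after}s]")

def triage_loop() -> None:
    while True:
        frame = capture_frame()
        # The narrow on-device model answers one question:
        # is something happening? It does not classify precisely.
        score = run_tiny_detector(frame)
        if score >= MOTION_THRESHOLD:
            # Only flagged events travel to the cloud, where a richer
            # model classifies the scene and decides whether to alert.
            upload_clip(seconds_before=2, seconds_after=8)
        time.sleep(0.1)  # ~10 fps triage cadence keeps the CPU budget low
```

Note that MOTION_THRESHOLD is exactly the kind of value the cloud should be able to tune over the air: the feedback path in the hybrid pattern is what keeps the device's triage calibrated.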
What you give up by going pure edge
- Easy iteration. Cloud models update with a deploy. Edge models update with an OTA pipeline that needs to be reliable enough to ship to thousands of devices.
- Aggregation. Cross-device patterns are harder to detect when each device sees only itself.
- Complex reasoning. Multi-step reasoning, retrieval-augmented generation, and large-context tasks do not fit on the device today.
What you give up by going pure cloud
- Privacy story. Even with good encryption in transit, “your data goes to our servers” is a different conversation than “your data stays in your home.”
- Offline operation. Anything that depends on the cloud fails when the internet does, often at exactly the moment users most want the product to work.
- Latency floor. A round trip to the cloud is, in the best case, 50-100 ms. For reflex actions, that is too slow.
- Cost at scale. Cloud inference at high volume becomes the dominant line item.
The decision in practice
Over and over, the same flow:
- List the AI features the product needs.
- For each, identify which constraint is binding — latency, connectivity, privacy, or cost.
- Place the inference where the binding constraint is satisfied.
- Plan the data flow and update path between edge and cloud explicitly.
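Applied to a hypothetical connected camera, the flow produces a placement table like the one below. Feature names and bindings are invented for illustration; the exercise is the point, not these particular answers:

```python
# Hypothetical feature list for a connected camera, walked through
# the four steps above: feature, binding constraint, placement.
placements = [
    ("wake-word detection",    "latency",  "edge"),
    ("scene classification",   "cost",     "cloud, on flagged events only"),
    ("in-home audio analysis", "privacy",  "edge"),
    ("daily summary report",   "none",     "cloud"),
]

for feature, constraint, where in placements:
    print(f"{feature:<24} binding: {constraint:<8} -> {where}")
```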
The teams that fail are usually those that either default everything to the cloud (and discover privacy or latency problems in user testing) or default everything to the edge (and burn engineering cycles trying to fit large models on small chips).
If you are designing the AI architecture for an IoT product and want a second opinion on the split, we have run this exercise on more than a few products.