
OpenTelemetry for IoT: Instrumenting Constrained Devices in 2026

How to use OpenTelemetry on IoT devices in 2026 — instrumenting constrained MCUs, propagating trace IDs across the device-cloud boundary, and the patterns that work.

OpenTelemetry has won the standards battle in cloud observability, and in 2026 it is increasingly the standard for IoT too. But instrumenting a microcontroller is not the same as instrumenting a Kubernetes service. Here is what works, what doesn't, and how to bridge the gap.

Why OpenTelemetry for IoT matters

Three reasons:

  1. End-to-end traces — when a user clicks a button in the mobile app, the request flows app → cloud → broker → device → back. OpenTelemetry trace IDs propagated through this chain let you debug the whole path in one view.

  2. Vendor neutrality — write instrumentation once, send to Datadog, Honeycomb, Grafana Tempo, Jaeger, or any OTLP-compatible backend. Switching backends without re-instrumenting devices is a real benefit.

  3. Standardisation — the same observability mental model and tooling for cloud and IoT teams. Reduces cognitive overhead, improves cross-team collaboration.

For broader fleet observability principles see our fleet observability post.

What OpenTelemetry covers

Three signals:

  • Traces — distributed requests across service boundaries with spans, parent-child relationships, attributes
  • Metrics — numeric measurements over time (counters, gauges, histograms)
  • Logs — structured events with levels, attributes, optional trace correlation

For IoT, all three are useful but the constraints differ. Traces are most valuable for command-control flows; metrics for fleet-wide health; logs for incident response.

The instrumentation reality on MCUs

Full OpenTelemetry SDKs are not designed for microcontrollers. The reference implementations assume megabytes of RAM and an OS-style threading model. On an ESP32 or a Nordic nRF52, that's a non-starter.

Two practical patterns:

Pattern A — Lightweight in-protocol propagation

The device itself doesn't run a full OTel SDK. Instead, it propagates trace context via MQTT user properties (in MQTT 5) or message headers and emits its metrics and logs in an OTLP-compatible format; the cloud side converts those messages into full OTel signals after receiving them.

What the device does:

  • Receive incoming trace context (e.g., traceparent, tracestate) in MQTT user properties on commands
  • Generate child trace context for any work it performs (sketched below)
  • Embed trace context in outgoing telemetry messages
  • Emit metric and log fields in OTLP-compatible JSON or Protobuf
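
The child-context step is small enough to hand-roll on an MCU, usually a few lines of C over the W3C traceparent string. Here is a minimal sketch of the same logic in Go for brevity; parseTraceparent and childTraceparent are our names, not a library API, and the example header value is the one from the W3C Trace Context spec:

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"strings"
)

// parseTraceparent splits a W3C traceparent value
// ("00-<trace-id>-<span-id>-<flags>") into its fields.
func parseTraceparent(tp string) (traceID, parentID, flags string, err error) {
	parts := strings.Split(tp, "-")
	if len(parts) != 4 || len(parts[1]) != 32 || len(parts[2]) != 16 {
		return "", "", "", fmt.Errorf("malformed traceparent: %q", tp)
	}
	return parts[1], parts[2], parts[3], nil
}

// childTraceparent keeps the incoming trace ID, mints a fresh span ID
// for the work the device performs, and re-encodes the header for
// outgoing telemetry messages.
func childTraceparent(incoming string) (string, error) {
	traceID, _, flags, err := parseTraceparent(incoming)
	if err != nil {
		return "", err
	}
	spanID := make([]byte, 8)
	if _, err := rand.Read(spanID); err != nil {
		return "", err
	}
	return fmt.Sprintf("00-%s-%s-%s", traceID, hex.EncodeToString(spanID), flags), nil
}

func main() {
	out, _ := childTraceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	fmt.Println(out) // same trace ID, new span ID
}
```

The device attaches the new value as the traceparent user property on its outgoing MQTT 5 publish; the span ID it minted is what lets the cloud side stitch the device's work into the trace as a child span.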

What the cloud does:

  • Receive the messages
  • Convert to full OTel traces, metrics, logs in a stream processor
  • Forward to the OTel collector and onwards to the chosen backend

This pattern is the right starting point for constrained MCUs. Most of the OpenTelemetry value comes from cloud-side correlation; the device just needs to participate in the trace.
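
Cloud-side, a standard OTel SDK can re-attach the device's context. A minimal sketch with the Go SDK, assuming a hypothetical onDeviceMessage handler invoked by your MQTT or Kafka consumer; the tracer name and attribute keys are illustrative:

```go
package main

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/propagation"
)

// onDeviceMessage re-attaches the trace context a device embedded in its
// message and emits a span for the device-side work. Wire a real
// TracerProvider (as in Pattern B); otherwise the span is a no-op.
func onDeviceMessage(ctx context.Context, deviceID, traceparent string, payload []byte) {
	carrier := propagation.MapCarrier{"traceparent": traceparent}
	ctx = propagation.TraceContext{}.Extract(ctx, carrier)

	_, span := otel.Tracer("iot-bridge").Start(ctx, "device.telemetry")
	defer span.End()

	span.SetAttributes(
		attribute.String("device.id", deviceID),
		attribute.Int("message.bytes", len(payload)),
	)
	// ... decode payload; map embedded metric/log fields to OTLP here ...
}

func main() {
	onDeviceMessage(context.Background(), "dev-0042",
		"00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
		[]byte(`{"temp": 21.4}`))
}
```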

For MQTT 5 user-property patterns see our MQTT 5 post.

Pattern B — Full OTel SDK on Linux gateways

For Linux-class edge devices (Raspberry Pi, BeagleBone, industrial gateways), the standard OpenTelemetry SDKs work. The C++ and Rust SDKs are mature; the Go SDK is excellent.

A typical gateway runs:

  • An OpenTelemetry Collector binary configured for batching and resource limits
  • Instrumented gateway services emitting OTLP traces, metrics, logs
  • The collector forwards to cloud over HTTPS or gRPC

This is the same pattern used for cloud workloads. For edge gateways, this is the default in 2026.
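
As a sketch, a minimal instrumented gateway service in Go exporting OTLP over gRPC to the local collector; the endpoint, service name, and span name are placeholders:

```go
package main

import (
	"context"
	"log"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	"go.opentelemetry.io/otel/sdk/resource"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func main() {
	ctx := context.Background()

	// Export OTLP over gRPC to the collector running beside this service.
	exp, err := otlptracegrpc.New(ctx,
		otlptracegrpc.WithEndpoint("localhost:4317"),
		otlptracegrpc.WithInsecure(),
	)
	if err != nil {
		log.Fatal(err)
	}

	tp := sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exp),
		sdktrace.WithResource(resource.NewSchemaless(
			attribute.String("service.name", "gateway-agent"),
		)),
	)
	defer tp.Shutdown(ctx)
	otel.SetTracerProvider(tp)

	// One span per poll of the attached sensors.
	_, span := otel.Tracer("gateway").Start(ctx, "sensors.poll")
	// ... read sensors, publish upstream ...
	span.End()
}
```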

What to instrument

For an IoT product, the high-value spans are:

  • Command flow — phone app sends command → cloud receives → broker forwards → device receives → device acts → device confirms. Each hop is a span; the whole chain is one trace.
  • OTA flow — release published → device receives notification → downloads → verifies → applies → confirms. Trace per device per OTA.
  • Provisioning / onboarding — device first contacts cloud → identity verification → registration → first telemetry. Full trace catches failure points.
  • Diagnostic actions — admin requests log dump from device → device responds. Trace ties admin action to device behaviour.

Continuous high-frequency telemetry (e.g., a sensor reading per second) is not worth tracing — too noisy, too expensive. Aggregate it as metrics instead.

Useful metrics for IoT fleets

Standard OpenTelemetry metrics for an IoT fleet:

  • device_connections_total — counter of total connections, with attributes for device class, region
  • device_messages_total — counter of messages by topic, version, type
  • device_battery_percentage — gauge per device (sampled, not per-message)
  • device_signal_strength — gauge of RSSI per device
  • device_uptime_seconds — counter of seconds since boot
  • ota_status — counter of OTA outcomes by status (downloading, applied, rolled-back, failed)

These are aggregatable across the fleet for dashboards while preserving per-device drill-down via attributes.
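
For reference, creating two of these instruments with the OpenTelemetry Go metrics API. A hedged sketch: it assumes a MeterProvider is configured elsewhere, the attribute values are made up, and the synchronous Float64Gauge needs a recent SDK version:

```go
package main

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
)

// recordFleetMetrics registers two of the fleet instruments above and
// records sample values with per-device attributes.
func recordFleetMetrics(ctx context.Context) error {
	meter := otel.Meter("iot-fleet")

	connections, err := meter.Int64Counter("device_connections_total",
		metric.WithDescription("Total device connections"))
	if err != nil {
		return err
	}
	battery, err := meter.Float64Gauge("device_battery_percentage",
		metric.WithDescription("Last sampled battery level"),
		metric.WithUnit("%"))
	if err != nil {
		return err
	}

	attrs := metric.WithAttributes(
		attribute.String("device.class", "sensor-v2"),
		attribute.String("region", "eu-west-1"),
	)
	connections.Add(ctx, 1, attrs)
	battery.Record(ctx, 87.5, attrs)
	return nil
}

func main() {
	_ = recordFleetMetrics(context.Background())
}
```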

The collector — where IoT-specific transformation happens

The OpenTelemetry Collector is where you bridge IoT-native message formats into standard OTLP. Useful processors:

  • Attribute processor — extract device ID, firmware version, region from message metadata into trace/metric attributes
  • Filter processor — drop noisy spans (heartbeats, keep-alives) before they hit the backend
  • Batch processor — efficient batching to reduce backend ingestion cost
  • Resource detector — populate resource attributes from environment

A typical IoT collector deployment runs in the cloud, ingests from MQTT or Kafka, transforms, and forwards to the backend. Linux gateways additionally run a local sidecar collector, as in Pattern B.
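
A sketch of a collector pipeline combining these processors; it assumes spans already arrive as OTLP (e.g., from the Pattern A stream processor), and the endpoints, attribute keys, and heartbeat span name are placeholders:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  attributes:
    actions:
      - key: firmware.version   # promote message metadata to an attribute
        from_attribute: fw_ver
        action: insert
  filter:
    traces:
      span:
        - 'name == "heartbeat"'  # drop keep-alive spans
  batch:
    send_batch_size: 1024
    timeout: 5s

exporters:
  otlphttp:
    endpoint: https://otlp.example.com

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [attributes, filter, batch]
      exporters: [otlphttp]
```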

Backends that work well for IoT

  • Grafana stack (Tempo for traces, Mimir for metrics, Loki for logs) — open-source, self-hostable, OpenTelemetry-native
  • Datadog — strong UI, good IoT-aware features, expensive at scale
  • Honeycomb — best-in-class for trace exploration, good fit for engineering teams
  • New Relic — broad enterprise features, full APM stack
  • OpenObserve — emerging open-source full-stack observability platform

For self-hosted deployments, the Grafana stack is the default choice. Teams that prefer a managed service usually land on Datadog or Honeycomb, depending on taste.

What kills OpenTelemetry-on-IoT projects

Three failure modes:

1. Over-instrumentation. Every function gets a span. The trace volume is unaffordable. Fix: instrument boundaries (network, IPC, storage, hardware), not internals.

2. No sampling strategy. Every command becomes a trace; the fleet generates millions per day. Fix: head-based sampling at the broker, or tail-based sampling at the collector, with explicit retention of error traces (a collector sketch follows this list).

3. Mixing tracing and noisy telemetry. The device emits a sensor reading every second as a trace. Fix: sensor readings are metrics; commands and lifecycle events are traces.
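
For failure mode 2, a sketch of the collector's tail_sampling processor that keeps every error trace and a fixed percentage of the rest; the policy names and the 5% figure are illustrative, not a recommendation:

```yaml
processors:
  tail_sampling:
    decision_wait: 10s          # buffer before the sampling decision
    policies:
      - name: keep-all-errors   # always retain failed command/OTA traces
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: sample-the-rest   # probabilistic sample of healthy traffic
        type: probabilistic
        probabilistic:
          sampling_percentage: 5
```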

What we typically build

For an IoT observability deployment with OpenTelemetry:

  • Device-side instrumentation for trace propagation in MQTT user properties (Pattern A) or full SDK on Linux gateways (Pattern B)
  • Stream processor or collector that translates IoT messages into OTLP
  • Sampling strategy with rules per trace type
  • Backend integration with the customer’s choice of backend
  • Dashboards for fleet health, OTA campaigns, command-flow latency
  • Operational runbook for using traces during incident response

If you are instrumenting an IoT fleet for observability, we have shipped OpenTelemetry-on-IoT across multiple deployments.

By Diglogic Engineering · May 9, 2026

