
OpenTelemetry for IoT: Instrumenting Constrained Devices in 2026

How to use OpenTelemetry on IoT devices in 2026 — instrumenting constrained MCUs, propagating trace IDs across the device-cloud boundary, and the patterns that work.

OpenTelemetry has won the standards battle in cloud observability, and in 2026 it is increasingly the standard for IoT too. But instrumenting a microcontroller is not the same as instrumenting a Kubernetes service. Here is what works, what doesn't, and how to bridge the gap.

Why OpenTelemetry for IoT matters

Three reasons:

  1. End-to-end traces — when a user clicks a button in the mobile app, the request flows app → cloud → broker → device → back. OpenTelemetry trace IDs propagated through this chain let you debug the whole path in one view.

  2. Vendor neutrality — write instrumentation once, send to Datadog, Honeycomb, Grafana Tempo, Jaeger, or any OTLP-compatible backend. Switching backends without re-instrumenting devices is a real benefit.

  3. Standardisation — the same observability mental model and tooling for cloud and IoT teams. Reduces cognitive overhead, improves cross-team collaboration.

For broader fleet observability principles see our fleet observability post.

What OpenTelemetry covers

Three signals:

  • Traces — distributed requests across service boundaries with spans, parent-child relationships, attributes
  • Metrics — numeric measurements over time (counters, gauges, histograms)
  • Logs — structured events with levels, attributes, optional trace correlation

For IoT, all three are useful but the constraints differ. Traces are most valuable for command-control flows; metrics for fleet-wide health; logs for incident response.

The instrumentation reality on MCUs

Full OpenTelemetry SDKs are not designed for microcontrollers. The reference implementations assume megabytes of RAM and an OS-style threading model. On an ESP32 or a Nordic nRF52, that's a non-starter.

Two practical patterns:

Pattern A — Lightweight in-protocol propagation

The device itself doesn't run a full OTel SDK. Instead, it propagates trace context via MQTT user properties (in MQTT 5) or message headers and emits its metrics and logs in an OTLP-compatible format; the cloud side converts those messages into full OTel signals after receiving them.

What the device does:

  • Receive incoming trace context (e.g., traceparent, tracestate) in MQTT user properties on commands
  • Generate child trace context for any work it performs (sketched below)
  • Embed trace context in outgoing telemetry messages
  • Emit metric and log fields in OTLP-compatible JSON or Protobuf
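
The child-context step is small enough to hand-roll on an MCU, usually a few lines of C over the W3C traceparent string. Here is a minimal sketch of the same logic in Go for brevity; parseTraceparent and childTraceparent are our names, not a library API, and the example header value is the one from the W3C Trace Context spec:

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"strings"
)

// parseTraceparent splits a W3C traceparent value
// ("00-<trace-id>-<span-id>-<flags>") into its fields.
func parseTraceparent(tp string) (traceID, parentID, flags string, err error) {
	parts := strings.Split(tp, "-")
	if len(parts) != 4 || len(parts[1]) != 32 || len(parts[2]) != 16 {
		return "", "", "", fmt.Errorf("malformed traceparent: %q", tp)
	}
	return parts[1], parts[2], parts[3], nil
}

// childTraceparent keeps the incoming trace ID, mints a fresh span ID
// for the work the device performs, and re-encodes the header for
// outgoing telemetry messages.
func childTraceparent(incoming string) (string, error) {
	traceID, _, flags, err := parseTraceparent(incoming)
	if err != nil {
		return "", err
	}
	spanID := make([]byte, 8)
	if _, err := rand.Read(spanID); err != nil {
		return "", err
	}
	return fmt.Sprintf("00-%s-%s-%s", traceID, hex.EncodeToString(spanID), flags), nil
}

func main() {
	out, _ := childTraceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	fmt.Println(out) // same trace ID, new span ID
}
```

The device attaches the new value as the traceparent user property on its outgoing MQTT 5 publish; the span ID it minted is what lets the cloud side stitch the device's work into the trace as a child span.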

What the cloud does:

  • Receive the messages
  • Convert to full OTel traces, metrics, logs in a stream processor
  • Forward to the OTel collector and onwards to the chosen backend

This pattern is the right starting point for constrained MCUs. Most of the OpenTelemetry value comes from cloud-side correlation; the device just needs to participate in the trace.
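
Cloud-side, a standard OTel SDK can re-attach the device's context. A minimal sketch with the Go SDK, assuming a hypothetical onDeviceMessage handler invoked by your MQTT or Kafka consumer; the tracer name and attribute keys are illustrative:

```go
package main

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/propagation"
)

// onDeviceMessage re-attaches the trace context a device embedded in its
// message and emits a span for the device-side work. Wire a real
// TracerProvider (as in Pattern B); otherwise the span is a no-op.
func onDeviceMessage(ctx context.Context, deviceID, traceparent string, payload []byte) {
	carrier := propagation.MapCarrier{"traceparent": traceparent}
	ctx = propagation.TraceContext{}.Extract(ctx, carrier)

	_, span := otel.Tracer("iot-bridge").Start(ctx, "device.telemetry")
	defer span.End()

	span.SetAttributes(
		attribute.String("device.id", deviceID),
		attribute.Int("message.bytes", len(payload)),
	)
	// ... decode payload; map embedded metric/log fields to OTLP here ...
}

func main() {
	onDeviceMessage(context.Background(), "dev-0042",
		"00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
		[]byte(`{"temp": 21.4}`))
}
```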

For MQTT 5 user-property patterns see our MQTT 5 post.

Pattern B — Full OTel SDK on Linux gateways

For Linux-class edge devices (Raspberry Pi, BeagleBone, industrial gateways), the standard OpenTelemetry SDKs work. The C++ and Rust SDKs are mature; the Go SDK is excellent.

A typical gateway runs:

  • An OpenTelemetry Collector binary configured for batching and resource limits
  • Instrumented gateway services emitting OTLP traces, metrics, logs
  • The collector forwards to cloud over HTTPS or gRPC

This is the same pattern used for cloud workloads. For edge gateways, this is the default in 2026.
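
As a sketch, a minimal instrumented gateway service in Go exporting OTLP over gRPC to the local collector; the endpoint, service name, and span name are placeholders:

```go
package main

import (
	"context"
	"log"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	"go.opentelemetry.io/otel/sdk/resource"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func main() {
	ctx := context.Background()

	// Export OTLP over gRPC to the collector running beside this service.
	exp, err := otlptracegrpc.New(ctx,
		otlptracegrpc.WithEndpoint("localhost:4317"),
		otlptracegrpc.WithInsecure(),
	)
	if err != nil {
		log.Fatal(err)
	}

	tp := sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exp),
		sdktrace.WithResource(resource.NewSchemaless(
			attribute.String("service.name", "gateway-agent"),
		)),
	)
	defer tp.Shutdown(ctx)
	otel.SetTracerProvider(tp)

	// One span per poll of the attached sensors.
	_, span := otel.Tracer("gateway").Start(ctx, "sensors.poll")
	// ... read sensors, publish upstream ...
	span.End()
}
```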

What to instrument

For an IoT product, the high-value spans are:

  • Command flow — phone app sends command → cloud receives → broker forwards → device receives → device acts → device confirms. Each hop is a span; the whole chain is one trace.
  • OTA flow — release published → device receives notification → downloads → verifies → applies → confirms. Trace per device per OTA.
  • Provisioning / onboarding — device first contacts cloud → identity verification → registration → first telemetry. Full trace catches failure points.
  • Diagnostic actions — admin requests log dump from device → device responds. Trace ties admin action to device behaviour.

Continuous high-frequency telemetry (e.g., a sensor reading per second) is not worth tracing — too noisy, too expensive. Aggregate it as metrics instead.

Useful metrics for IoT fleets

Standard OpenTelemetry metrics for an IoT fleet:

  • device_connections_total — counter of total connections, with attributes for device class, region
  • device_messages_total — counter of messages by topic, version, type
  • device_battery_percentage — gauge per device (sampled, not per-message)
  • device_signal_strength — gauge of RSSI per device
  • device_uptime_seconds — counter of seconds since boot
  • ota_status — counter of OTA outcomes by status (downloading, applied, rolled-back, failed)

These are aggregatable across the fleet for dashboards while preserving per-device drill-down via attributes.
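
For reference, creating two of these instruments with the OpenTelemetry Go metrics API. A hedged sketch: it assumes a MeterProvider is configured elsewhere, the attribute values are made up, and the synchronous Float64Gauge needs a recent SDK version:

```go
package main

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
)

// recordFleetMetrics registers two of the fleet instruments above and
// records sample values with per-device attributes.
func recordFleetMetrics(ctx context.Context) error {
	meter := otel.Meter("iot-fleet")

	connections, err := meter.Int64Counter("device_connections_total",
		metric.WithDescription("Total device connections"))
	if err != nil {
		return err
	}
	battery, err := meter.Float64Gauge("device_battery_percentage",
		metric.WithDescription("Last sampled battery level"),
		metric.WithUnit("%"))
	if err != nil {
		return err
	}

	attrs := metric.WithAttributes(
		attribute.String("device.class", "sensor-v2"),
		attribute.String("region", "eu-west-1"),
	)
	connections.Add(ctx, 1, attrs)
	battery.Record(ctx, 87.5, attrs)
	return nil
}

func main() {
	_ = recordFleetMetrics(context.Background())
}
```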

The collector — where IoT-specific transformation happens

The OpenTelemetry Collector is where you bridge IoT-native message formats into standard OTLP. Useful processors:

  • Attribute processor — extract device ID, firmware version, region from message metadata into trace/metric attributes
  • Filter processor — drop noisy spans (heartbeats, keep-alives) before they hit the backend
  • Batch processor — efficient batching to reduce backend ingestion cost
  • Resource detector — populate resource attributes from environment

A typical IoT collector deployment runs in the cloud, ingests from MQTT or Kafka, transforms, and forwards to the backend. Linux gateways additionally run a local sidecar collector, as in Pattern B.
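
A sketch of a collector pipeline combining these processors; it assumes spans already arrive as OTLP (e.g., from the Pattern A stream processor), and the endpoints, attribute keys, and heartbeat span name are placeholders:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  attributes:
    actions:
      - key: firmware.version   # promote message metadata to an attribute
        from_attribute: fw_ver
        action: insert
  filter:
    traces:
      span:
        - 'name == "heartbeat"'  # drop keep-alive spans
  batch:
    send_batch_size: 1024
    timeout: 5s

exporters:
  otlphttp:
    endpoint: https://otlp.example.com

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [attributes, filter, batch]
      exporters: [otlphttp]
```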

Backends that work well for IoT

  • Grafana stack (Tempo for traces, Mimir for metrics, Loki for logs) — open-source, self-hostable, OpenTelemetry-native
  • Datadog — strong UI, good IoT-aware features, expensive at scale
  • Honeycomb — best-in-class for trace exploration, good fit for engineering teams
  • New Relic — broad enterprise features, full APM stack
  • OpenObserve — emerging open-source full-stack observability platform

For self-hosted deployments, the Grafana stack is the default choice. Teams that prefer a managed service usually land on Datadog or Honeycomb, depending on taste.

What kills OpenTelemetry-on-IoT projects

Three failure modes:

1. Over-instrumentation. Every function gets a span. The trace volume is unaffordable. Fix: instrument boundaries (network, IPC, storage, hardware), not internals.

2. No sampling strategy. Every command becomes a trace; the fleet generates millions per day. Fix: head-based sampling at the broker, or tail-based sampling at the collector, with explicit retention of error traces (a collector sketch follows this list).

3. Mixing tracing and noisy telemetry. The device emits a sensor reading every second as a trace. Fix: sensor readings are metrics; commands and lifecycle events are traces.
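
For failure mode 2, a sketch of the collector's tail_sampling processor that keeps every error trace and a fixed percentage of the rest; the policy names and the 5% figure are illustrative, not a recommendation:

```yaml
processors:
  tail_sampling:
    decision_wait: 10s          # buffer before the sampling decision
    policies:
      - name: keep-all-errors   # always retain failed command/OTA traces
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: sample-the-rest   # probabilistic sample of healthy traffic
        type: probabilistic
        probabilistic:
          sampling_percentage: 5
```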

What we typically build

For an IoT observability deployment with OpenTelemetry:

  • Device-side instrumentation for trace propagation in MQTT user properties (Pattern A) or full SDK on Linux gateways (Pattern B)
  • Stream processor or collector that translates IoT messages into OTLP
  • Sampling strategy with rules per trace type
  • Backend integration with the customer’s choice of backend
  • Dashboards for fleet health, OTA campaigns, command-flow latency
  • Operational runbook for using traces during incident response

If you are instrumenting an IoT fleet for observability, we have shipped OpenTelemetry-on-IoT across multiple deployments.

By Diglogic Engineering · May 9, 2026

