Adobe’s central observability team runs OpenTelemetry across thousands of Collectors. Not as a pilot. As the production telemetry backbone for a globally distributed, polyglot engineering organisation shaped by decades of acquisitions.
That context matters. Their architectural decisions — three tiers, signal isolation, custom Collector binaries, voluntary adoption — map directly to the problems any platform team faces when standardising observability across many services and teams. The Enterprise OTel Migration Playbook uses Adobe’s architecture as a primary reference for compliance-heavy environments. Where Airbnb solved hyperscale throughput — 100 million metric samples per second — Adobe solved a different problem: how do you get hundreds of service teams, each with different legacy stacks, to adopt a consistent telemetry platform without forcing anyone’s hand?
Why Did Adobe Need a Custom OpenTelemetry Architecture?
Adobe’s engineering estate is the product of acquisitions. Dozens of products. Multiple programming languages. Varied existing monitoring solutions. Compliance obligations that differ by product line.
Mandating a single proprietary APM agent across all those services creates vendor lock-in and compliance brittleness. OpenTelemetry, governed by the CNCF as a vendor-neutral open standard, gave Adobe a platform that could support multiple backends, work across languages, and stay independent of any single vendor’s release cycle. Bogdan Stancu, Senior Software Engineer on Adobe’s observability team, put it plainly: “It matched everything that we wanted.”
The core challenge wasn’t technical — it was organisational. How do you get hundreds of service teams, many with existing monitoring that’s already working, to instrument consistently? Adobe’s answer: don’t force them. The new OTel pipeline runs alongside existing monitoring for established services. Only new applications are built against it from the start. No re-instrumentation project. No team disruption.
What Is Adobe’s Three-Tier OpenTelemetry Collector Architecture?
Adobe organises its OpenTelemetry Collectors into three tiers. Each has a clearly defined role, ownership boundary, and degree of configurability.
Tier 1 lives in the service team’s Kubernetes namespace. Deploy the observability team’s Helm chart and you get two Collectors automatically. The first is a sidecar Collector — co-located inside the application pod, immutable configuration, collects all signals regardless of backend choice. The second is a deployment Collector that handles backend selection and routing, configurable through Helm values. When configuration changes, only the deployment Collector restarts. The application pod stays untouched.
That immutability is deliberate. If the sidecar’s config never changes, config updates never cause application restarts. Simple constraint, significant operational win.
Tier 2 is the centrally managed namespace, owned entirely by the observability team. All Tier 1 deployment Collectors forward telemetry here over OTLP. This is where Adobe’s key resilience decision lives: the managed namespace runs a separate Collector deployment per signal type — one for metrics, one for logs, one for traces.
Tier 3 comprises the observability backends. Teams select their preferred destination through the Helm chart’s values file. And despite aggregating traffic from thousands of upstream Collectors, the managed namespace deployments have generally run at default replica counts without significant auto-scaling.
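To make the team-facing surface concrete, here is a minimal sketch of what a service team's values file could look like. The key names and backend identifiers are illustrative assumptions, not Adobe's actual chart schema.

```yaml
# Hypothetical values.yaml for the observability team's Helm chart.
# Key names and backend identifiers are placeholders, not Adobe's actual schema.
observability:
  sidecar:
    enabled: true            # immutable sidecar Collector injected into the pod
  backends:                  # read by the Tier 1 deployment Collector
    metrics: prometheus-central
    logs: splunk-audit       # audit logs get their own destination
    traces: jaeger-central
```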
Why Does Adobe Run Separate OpenTelemetry Collectors for Each Signal Type?
Signal-level isolation is Adobe’s answer to a specific failure scenario. If a metrics backend becomes rate-limited, the metrics Collector is the only thing affected. The log Collector keeps delivering audit trails. The trace Collector keeps delivering trace data.
Without this separation, a rate-limit event on the metrics backend backs up the entire export queue of a mixed-signal Collector — potentially blocking logs and traces going to backends that are performing normally. That’s a real problem.
For compliance-relevant workloads, this matters structurally. Audit log continuity cannot depend on the health of the metrics backend. Signal-level isolation makes that guarantee architectural rather than operational. You’re relying on topology, not discipline.
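One way to express that topology, assuming the managed namespace is itself run through the Operator's OpenTelemetryCollector resource, is a separate resource per signal, each with a single-signal pipeline. This is an illustrative sketch, not Adobe's actual manifests.

```yaml
# Illustrative sketch: one Collector deployment per signal in the managed
# namespace. Names and endpoints are placeholders.
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: managed-logs
  namespace: otel-managed
spec:
  mode: deployment
  config:
    receivers:
      otlp:
        protocols:
          grpc: {}
    exporters:
      otlphttp:
        endpoint: https://logs-backend.example.internal
    service:
      pipelines:
        logs:                 # only a logs pipeline; no metrics, no traces
          receivers: [otlp]
          exporters: [otlphttp]
# managed-metrics and managed-traces would be separate resources with the
# analogous single-signal pipeline.
```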
The routing mechanism is worth understanding. Service teams configure their backend preference through Helm values. The deployment Collector sets that preference as an HTTP header on its OTLP exports to Tier 2. The managed namespace Collectors then use the routing connector to read that header and direct each signal type to its designated exporter. Teams configure a preference — they don’t manage routing logic.
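A sketch of that handoff follows. The header name, backend identifiers, and pipeline names are assumptions rather than Adobe's actual configuration; routing on request metadata requires `include_metadata: true` on the receiving Collector's OTLP receiver.

```yaml
# Tier 1 deployment Collector: attach the team's backend preference as a header.
exporters:
  otlphttp:
    endpoint: https://otel-managed.example.internal:4318   # placeholder Tier 2 endpoint
    headers:
      x-backend: prometheus-central                        # hypothetical header name
---
# Tier 2 metrics Collector: read the header with the routing connector.
receivers:
  otlp:
    protocols:
      http:
        include_metadata: true   # keep request headers so the connector can see them
connectors:
  routing:
    default_pipelines: [metrics/default]
    table:
      - context: request
        condition: request["x-backend"] == "prometheus-central"
        pipelines: [metrics/prometheus]
exporters:
  prometheusremotewrite:
    endpoint: https://prometheus-central.example.internal/api/v1/write
  otlphttp/default:
    endpoint: https://default-backend.example.internal
service:
  pipelines:
    metrics/in:
      receivers: [otlp]
      exporters: [routing]
    metrics/prometheus:
      receivers: [routing]
      exporters: [prometheusremotewrite]
    metrics/default:
      receivers: [routing]
      exporters: [otlphttp/default]
```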
Why Would You Build a Custom OpenTelemetry Collector Distribution?
The OpenTelemetry project publishes two official Collector distributions: Core (minimal) and Contrib (everything — hundreds of receivers, processors, exporters, and extensions). Most organisations run Contrib by default. Adobe runs neither.
Adobe builds its own minimal Collector binary using the OpenTelemetry Collector Builder (OCB). You give OCB a YAML manifest listing exactly which components you want, at which versions. It downloads those components and compiles a single binary. The manifest is version-controlled — the auditable record of what’s in your Collector at any point.
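A minimal builder manifest looks roughly like this. Component versions are placeholders, and Adobe's actual component list is not public.

```yaml
# builder-config.yaml -- input to the OpenTelemetry Collector Builder (OCB).
# Versions are placeholders; pin them to the release you have validated.
dist:
  name: otelcol-custom
  description: Minimal in-house Collector distribution
  output_path: ./dist
receivers:
  - gomod: go.opentelemetry.io/collector/receiver/otlpreceiver v0.114.0
processors:
  - gomod: go.opentelemetry.io/collector/processor/batchprocessor v0.114.0
  - gomod: go.opentelemetry.io/collector/processor/memorylimiterprocessor v0.114.0
exporters:
  - gomod: go.opentelemetry.io/collector/exporter/otlpexporter v0.114.0
  - gomod: go.opentelemetry.io/collector/exporter/otlphttpexporter v0.114.0
connectors:
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/connector/routingconnector v0.114.0
```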
The rationale is shrinking the dependency surface and the attack surface together. Every component in a Contrib binary is a potential vulnerability. A custom distribution that excludes unused components also excludes their CVE exposure. CVE-2026-40182, an OpenTelemetry .NET vulnerability disclosed in April 2026, affected versions 1.13.1 through 1.15.1 — if that component isn’t in your custom distribution, the CVE doesn’t apply to your deployment.
Vendor distributions sit in the middle. NRDOT from New Relic and DDOT from Datadog are pre-built distributions curated for their respective backends. They reduce build-and-maintain overhead but introduce a dependency on the vendor’s release cadence. If security and compliance control matters to you, building with OCB gives you full control over component selection, upgrade timing, and security posture.
Adobe’s custom distribution is the default in the Helm chart provided to service teams. Teams can opt into Contrib if they need a component outside the custom build, but the default is the smaller, tighter binary. Stancu’s advice: “Treat OpenTelemetry as a platform to build on. Don’t expect it to solve all your problems out of the box.”
How Does Adobe Get Service Teams to Adopt OpenTelemetry Without Mandating It?
The adoption model inverts the typical enterprise OTel migration. Rather than mandating re-instrumentation, Adobe built a self-service platform with two entry points: a Helm chart and two Kubernetes annotations.
To enable OTel telemetry for a service, a team adds two annotations to their Kubernetes deployment manifest — specifying the language runtime and requesting sidecar injection. The OTel Kubernetes Operator reads these annotations and injects the sidecar Collector and the appropriate language SDK agent at pod start. No source code changes. No OTel expertise required. As Stancu put it: “People add two lines in their deployment. And it just works.”
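The two annotations correspond to the Operator's standard injection hooks. This excerpt assumes a Java service and default resource names; a value of "true" assumes a single Instrumentation resource and a single sidecar Collector definition exist in the namespace, and a resource name can be used instead.

```yaml
# Deployment excerpt: the two annotations that opt a service into the platform.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout-service
spec:
  template:
    metadata:
      annotations:
        instrumentation.opentelemetry.io/inject-java: "true"   # language runtime
        sidecar.opentelemetry.io/inject: "true"                # sidecar Collector injection
    spec:
      containers:
        - name: checkout
          image: registry.example.com/checkout:1.4.2
```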
Auto-instrumentation at the Operator level is how you solve the scale problem without creating a re-instrumentation project. Teams configure their backend preference through Helm values and get on with their work.
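Behind those annotations, the Operator resolves an Instrumentation resource that the platform team owns. A hedged sketch of what such a resource can look like; the endpoint and agent image are placeholders, not Adobe's values.

```yaml
# Illustrative Instrumentation resource the Operator consults when injecting
# the language SDK agent.
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: platform-defaults
spec:
  exporter:
    endpoint: http://localhost:4317     # the injected agent exports to the in-pod sidecar Collector
  propagators:
    - tracecontext
    - baggage
  java:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:2.9.0   # placeholder tag
```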
Voluntary adoption means existing services with working monitoring continue as-is. Coverage grows organically. No migration tax. The trade-off is slower coverage growth — if compliance obligations require OTel telemetry for specific service categories by a particular date, you’ll need a more targeted push for those services.
How Does Adobe Manage Its Custom OpenTelemetry Distribution at Scale?
Every quarter, the central observability team publishes a new version of the custom Collector distribution and updates the OTel Kubernetes Operator. Service teams pick up the new Collector version on their next deployment — no additional action required.
The quarterly cadence came from experience. Monthly upgrades create testing overhead that consumes too much team capacity. Annual upgrades let security exposure accumulate. Quarterly is the balance point.
What makes this tractable is that the build process is automated and declarative. The OCB manifest is version-controlled — upgrading the distribution is a manifest update followed by a build pipeline run. The manifest pins component versions, so the upgrade record is auditable. This is GitOps discipline in practice: version-controlled configuration serves as the operational record of what’s running.
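In CI terms, the quarterly upgrade can be as small as bumping the pinned versions in the manifest and re-running the builder. A sketch in GitHub-Actions-style syntax; the workflow itself is an assumption, since Adobe's actual pipeline is not public.

```yaml
# Illustrative build step: regenerate the custom distribution from the manifest.
name: build-custom-collector
on:
  push:
    paths: [builder-config.yaml]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version: "1.23"
      - name: Install the Collector builder
        run: go install go.opentelemetry.io/collector/cmd/builder@v0.114.0
      - name: Build the distribution from the version-controlled manifest
        run: builder --config builder-config.yaml
```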
One compatibility tension worth flagging: when the Operator is upgraded, it can modify the OpenTelemetryCollector custom resource to align with new configuration expectations. If a service team is running a significantly older Collector version, Collectors can fail to start without any changes on their end. Operator and Collector versions need to move together.
What Does Adobe’s Architecture Mean for Compliance-Heavy Organisations?
Adobe’s design choices — signal-level isolation, custom Collector distribution, voluntary adoption, quarterly upgrade cadence — were made for operational resilience and organisational scalability. The compliance implications are real and worth naming.
Signal-level isolation means audit logs travel through a dedicated log Collector unaffected by metrics or trace backend failures. For HealthTech teams navigating HIPAA audit log requirements, FinTech teams dealing with PCI DSS log integrity, or EdTech teams tracking FERPA data access — the guarantee that your log pipeline runs independently of your metrics backend is structurally enforced. You’re not relying on operational discipline.
A custom Collector distribution makes security audits more tractable. Unused components that carry CVEs simply aren’t present in the binary, because you never included them. The quarterly upgrade cadence creates a documented patch cycle; the OCB manifest provides the auditable record of exactly which components are in production.
These patterns work at smaller scale too. A 100-person FinTech doesn’t need thousands of Collectors to benefit from signal-level isolation — two or three Tier 2 Collector deployments cover the pattern. A custom distribution built with OCB is a one-time investment that pays dividends each upgrade cycle. OpenTelemetry is in production at Adobe, Airbnb, Shopify, and many other organisations as of 2026 — if you’re in a regulated industry evaluating production readiness, that’s a meaningful signal.
For the broader set of migration patterns, including approaches tailored to regulated environments, see the compliance-heavy migration patterns in the playbook.
Frequently Asked Questions
What is signal-level isolation in OpenTelemetry?
Running a separate Collector deployment for each telemetry signal type — one for metrics, one for logs, one for traces — rather than routing all signals through a single Collector. A backend failure for one signal type is contained to that signal’s Collector and cannot propagate to the others. Adobe implements this at Tier 2 of its three-tier architecture, using the routing connector to direct each signal type to its designated exporter.
Why would you build a custom OpenTelemetry Collector distribution instead of using the default?
The default OTel Contrib distribution includes hundreds of components — most of which you’ll never use. Each one is a potential vulnerability surface. A custom distribution built with the OCB tool includes only what you actually use: smaller binary, smaller dependency surface, smaller attack surface. Adobe builds its own for exactly this reason.
How does the OpenTelemetry Kubernetes Operator work?
The OTel Kubernetes Operator manages Collector lifecycle and auto-instrumentation injection in Kubernetes clusters. Teams add two annotations to their deployment manifest — language runtime and sidecar injection. The Operator reads those annotations and injects the OTel SDK agent and sidecar Collector at pod start. No source code changes required.
What is the chain Collector problem in OpenTelemetry?
When Collectors are chained — as in Adobe’s Tier 1 to Tier 2 architecture — an OTLP transaction completes with HTTP 200 before the downstream Collector attempts to export to the actual backend. Backend failures are invisible to the upstream Collector. Adobe solved this with a custom circuit breaker extension that proactively validates backend authentication and returns HTTP 401 upstream if it fails — propagating the error before the transaction completes.
How does Adobe’s approach differ from Airbnb’s OpenTelemetry migration?
They solved different problems. Airbnb’s challenge was hyperscale throughput — 100 million samples per second, 10x CPU reduction target. Its architecture centres on VictoriaMetrics vmagent for streaming aggregation. Read more: Airbnb’s OTel Migration. Adobe’s challenge was organisational adoption — hundreds of service teams, acquisition complexity, compliance requirements. Its architecture centres on a self-service Helm chart, the OTel Operator, and a three-tier topology built for zero-effort onboarding.
Is OpenTelemetry ready for production at a mid-size SaaS company?
Yes. It’s in production at Airbnb, Adobe, Shopify, and many others as of 2026. The challenge isn’t OTel’s readiness — it’s migration complexity. Starting from a proprietary APM stack requires a phased approach: instrumentation audit, dual-write transition, Collector topology design, backend selection, security hardening. Adobe’s three-tier pattern scales down cleanly for a 50-500 person team.