May 7, 2026

OpenTelemetry for LLM Applications and Why It Prevents Observability Vendor Lock-In

AUTHOR

James A. Wondrasek

Eighty-five percent of organisations plan to enable LLM observability in 2026, according to Elastic's observability survey. The debate about whether to instrument is over. The debate that matters now is which instrumentation approach keeps your options open once the bills start stacking up.

And they will stack up. LLM workloads are expensive to observe — spans are large, token volumes grow with usage, and proprietary platforms charge per gigabyte indexed, per host, and per seat. Lock yourself into a vendor’s proprietary SDK today and every future backend switch means rewriting instrumentation calls across every service you run. That cost feels theoretical when you have three services. In 18 months, when you have thirty, it becomes an engineering quarter.

OpenTelemetry (OTel) is a vendor-neutral, CNCF-governed instrumentation standard that has been extended for LLM-specific signals via GenAI Semantic Conventions. It will not wow anyone on a sales call. What it does is make the re-instrumentation problem disappear. This article covers the why (vendor lock-in economics), the what (the gen_ai.* schema), and the where-to-start (auto-instrumentation in under a day).

This is one layer in the broader LLM observability strategy this instrumentation enables.

Why does choosing the wrong instrumentation standard today cost you money in two years?

Vendor lock-in in observability is not abstract. Your application code calls a proprietary SDK, and switching backends means rewriting every instrumentation call across every service. Three services? Manageable. Thirty? That is a quarter of engineering work, minimum.

LLM workloads make this worse. Token volumes grow with usage, and proprietary platforms charge for indexed log volume by the gigabyte. Gartner found that up to 84% of observability users struggle with costs, and surprise bills of $130,000 per month or more are documented in the market. It is not a hypothetical.

Proprietary agents also embed vendor-specific data formats — once your traces are in a vendor’s schema, they are not portable. The fix is to standardise on a vendor-neutral layer while your estate is still small. Re-instrumentation cost is near zero today but grows linearly with every service you add.

Tom Wilkie, CTO of Grafana Labs, put it plainly: “The cost of re-instrumentation is prohibitive. OTel is making greenfield work easier, but customers still need to weave together new and old systems — and need observability systems that don’t lock customers into one way of doing things.”

The failure modes that OTel instrumentation is designed to detect — silent cost escalation, degraded outputs, context overflow — are invisible without the right telemetry schema in place.

What are GenAI Semantic Conventions and what do they actually standardise?

GenAI Semantic Conventions are the LLM-specific schema extension within OpenTelemetry — a standardised set of attribute names defined by the OTel GenAI Special Interest Group (SIG).

Because every LLM provider's telemetry uses the same attribute names, your dashboards and alerts work across providers and backends without modification. Build a cost dashboard against gen_ai.usage.input_tokens in Grafana today, and the same attribute names power equivalent queries in Datadog, Honeycomb, or any OTLP-compatible backend tomorrow. No schema remapping. No re-instrumentation.
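
For a sense of what those queries look like, here is the span filter behind such a dashboard, written in Tempo's TraceQL. The gen_ai.* attribute names are the portable part; the model name and token threshold are illustrative:

    { span.gen_ai.request.model = "gpt-4o" && span.gen_ai.usage.input_tokens > 4000 }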

The key attributes are:

- gen_ai.system: the provider behind the call (openai, anthropic, and so on)
- gen_ai.request.model: the model requested
- gen_ai.usage.input_tokens: tokens consumed by the prompt
- gen_ai.usage.output_tokens: tokens generated in the response
- gen_ai.response.finish_reasons: why generation stopped (stop, length, or a content filter)

Those five attributes answer the questions that matter most in production: which provider, which model, what it cost, and why it stopped. They are production-ready now.

With proprietary schemas, every dashboard and alert is non-portable. Switching vendors means rebuilding every query and cost report on top of rewriting the instrumentation itself. The GenAI SIG operates under CNCF governance, which means the standard evolves through open contribution — not vendor product roadmaps.

A companion article on the gen_ai.* attributes that power token-level cost attribution goes deeper into this schema's practical use for multi-tenant cost governance.

How do you get LLM traces flowing in under a day with no code changes?

OTel auto-instrumentation is the Monday-morning entry point. Install a package, add three lines of startup code, and every API call to OpenAI, Anthropic, LangChain, or LlamaIndex is automatically traced with gen_ai.* attributes. No changes to your business logic.

The packages — opentelemetry-instrumentation-openai, opentelemetry-instrumentation-anthropic, opentelemetry-instrumentation-langchain, opentelemetry-instrumentation-llamaindex — are installed via pip. Call the instrumentor once at startup and all subsequent API calls are traced automatically.
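
A minimal sketch of that startup wiring, assuming the opentelemetry-instrumentation-openai package and a Collector listening on the default local OTLP port:

    # pip install opentelemetry-sdk opentelemetry-exporter-otlp \
    #             opentelemetry-instrumentation-openai

    from opentelemetry import trace
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor
    from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
    from opentelemetry.instrumentation.openai import OpenAIInstrumentor

    # Ship spans to a local OTel Collector; swap the endpoint for a
    # backend's OTLP gateway if you are skipping the Collector on day one.
    provider = TracerProvider()
    provider.add_span_processor(
        BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317"))
    )
    trace.set_tracer_provider(provider)

    # One call at startup; every OpenAI API call after this point is
    # traced with gen_ai.* attributes. Business logic stays untouched.
    OpenAIInstrumentor().instrument()

The other instrumentors follow the same pattern: import, construct, call .instrument() once.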

What you get immediately: per-request token counts, latency, model used, and finish reason. Enough to answer “how much did that user request cost?” on day one.

A couple of things worth noting. Prompt and completion content is not captured by default, and that is intentional: content capture must be explicitly enabled, and belongs only in environments where data residency is controlled. And custom retrieval pipelines or multi-model orchestration will need manual spans eventually, as sketched below. But that is a day-two problem, not a blocker.
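
When that day-two work arrives, manual spans are plain OTel API calls. A sketch with hypothetical span and attribute names (only the tracer API itself comes from OpenTelemetry):

    from opentelemetry import trace

    tracer = trace.get_tracer("rag-pipeline")

    def retrieve_context(query: str) -> list[str]:
        # Wrap a custom retrieval step that auto-instrumentation cannot see.
        with tracer.start_as_current_span("retrieve_context") as span:
            span.set_attribute("retrieval.query_length", len(query))
            docs = ["..."]  # stand-in for your vector store lookup
            span.set_attribute("retrieval.documents_returned", len(docs))
            return docs

Any instrumented LLM call made while a manual span is current nests under it automatically, so the trace waterfall stays intact.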

What does the OTel Collector actually do and why does it matter for your backend choice?

The OTel Collector sits between your instrumented application and your storage backend. It handles routing, processing, sampling, and exporting to whatever backends you configure. Switching backends becomes a Collector configuration change, not a re-instrumentation project. Your application code never changes.
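
In practice that configuration change is small. A minimal sketch, with placeholder endpoints:

    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317

    exporters:
      otlphttp:
        endpoint: https://otlp.your-backend.example/otlp   # the line that changes

    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [otlphttp]   # list a second exporter here to fan out

Swapping backends means editing the exporter block and restarting the Collector. The SDK in your application keeps pointing at the same local OTLP port.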

Deductive AI demonstrates exactly how this plays out. When Datadog cut off their observability access overnight in December 2025, Deductive AI migrated to an open Grafana stack in 48 hours. Their conclusion: “Our forced migration demonstrated that what once looked like catastrophic vendor lock-in is now closer to a recoverable configuration change.” Their instrumentation was already OTel-based — the Collector’s exporter config was updated and the application code was not touched.

STCLab's Grace Park described the same outcome at scale: “Complete backend decoupling. Migrating from Tempo to Jaeger requires only one config line change and zero application changes.”

For a lean team, a single Collector container alongside your application covers everything needed on day one. It can also fan out to multiple backends simultaneously if you want to run a dual-backend migration validation before fully committing.

What is tail-based sampling and why does it prevent your telemetry bill from growing with your LLM traffic?

Head-based sampling at 10% cuts storage costs but makes 90% of errors invisible. That is the opposite of what you need for production reliability.

Tail-based sampling solves this. The Collector buffers a window of traces and makes the sampling decision after seeing the full outcome. The practical pattern: keep every trace that contains an error, keep the unusually slow or token-heavy traces, and probabilistically sample the routine successes.
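
Expressed as the Collector's tail_sampling processor, with illustrative thresholds (a trace matching any policy is kept):

    processors:
      tail_sampling:
        decision_wait: 10s
        policies:
          - name: keep-all-errors
            type: status_code
            status_code: {status_codes: [ERROR]}
          - name: keep-slow-requests
            type: latency
            latency: {threshold_ms: 5000}
          - name: keep-token-heavy
            type: numeric_attribute
            numeric_attribute: {key: gen_ai.usage.output_tokens, min_value: 5000}
          - name: sample-the-rest
            type: probabilistic
            probabilistic: {sampling_percentage: 10}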

You get complete error coverage at a fraction of full-volume storage cost. The telemetry bill scales with meaningful events, not raw traffic.

What did the STCLab migration to the LGTM Stack actually cost, and what did it save?

STCLab is a South Korean technology company whose platforms support up to 3.5 million simultaneous users across 200 countries. In 2023 they migrated from a proprietary observability vendor to OTel plus the LGTM Stack. The headline result: 72% cost reduction.

What forced the migration matters as much as the number. The previous vendor cost so much that STCLab had disabled APM entirely in dev and staging and sampled just 5% of production traffic. They were not optimising costs — they were flying blind because full observability was economically out of reach.

Here is the core economic difference. Proprietary vendors charge for indexed log volume, per-host licensing, and per-seat access — costs that grow with data volume. The LGTM Stack charges only for compute and object storage, which grow with retention time, not ingestion rate. Post-migration, STCLab achieved 100% APM coverage in all environments at 28% of the previous vendor cost.

The LGTM Stack is four open-source components maintained by Grafana Labs: Loki (log aggregation), Grafana (dashboards), Tempo (distributed traces), and Mimir (long-term metrics storage).

A team of five may prefer Grafana Cloud, the managed LGTM offering, over self-hosting. It accepts OTLP natively, uses open query languages, and preserves backend portability without the infrastructure overhead. In the four-stage maturity roadmap, self-hosted LGTM is a Stage 3 or 4 destination; Grafana Cloud is the Stage 2 entry point.

Which observability backends support OTel GenAI SemConv natively, and which lock you in?

Native OTel GenAI SemConv support means the backend accepts OTLP, understands gen_ai.* attribute names without custom parsing, and surfaces token-level data in built-in dashboards. Not every backend that accepts OTLP meets this bar.

Backends with native support:

- Grafana Cloud and the self-hosted LGTM Stack
- Honeycomb
- Uptrace
- Langfuse
- Arize Phoenix
- Langwatch

One thing to watch for: accepting OTLP at the ingestion layer is not the same as being lock-in-free at the query layer. Vendor-specific dashboard formats and proprietary alert DSLs recreate lock-in even when the telemetry format is open. The portability test is simple — can your dashboards be reproduced in another backend using PromQL or TraceQL? If not, you have query-layer lock-in regardless of what you chose for instrumentation.
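
A concrete version of that test, assuming the gen_ai.client.token.usage histogram lands in a Prometheus-compatible store (the translated metric name is an assumption):

    # Tokens consumed per model over the past hour
    sum by (gen_ai_request_model) (increase(gen_ai_client_token_usage_sum[1h]))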

Use OTel for instrumentation universally, and choose a backend with open query languages. A full comparison of which backends support OTel GenAI SemConv natively is covered separately.

Instrumentation is the foundation. Where it fits in the four-stage maturity roadmap — from zero observability to autonomous incident response — is the broader context worth reading once you have telemetry flowing.

Frequently Asked Questions

What is OpenTelemetry and why does it matter specifically for LLM applications?

OpenTelemetry is a vendor-neutral, CNCF-governed observability framework for collecting metrics, logs, and traces. LLM applications need specialised extensions because traditional APM assumes deterministic code paths. LLM systems break that assumption — outputs are non-deterministic, cost correlates with token counts, and agent chains have multiple independent failure points. The gen_ai.* schema adds those signals in portable form.

Is OpenTelemetry good enough for monitoring AI agents?

Yes. Each step in an agent’s reasoning chain becomes a child span, and the trace waterfall shows where time and tokens are spent across tool calls, LLM calls, and vector database lookups. Capture agent runs at 100% via tail-based sampling — they are high-value and relatively rare.

Can I use OpenTelemetry to monitor my ChatGPT or Claude API calls?

Yes. opentelemetry-instrumentation-openai intercepts OpenAI API calls automatically, and community packages exist for Anthropic. Both emit gen_ai.system, gen_ai.request.model, gen_ai.usage.input_tokens, and gen_ai.usage.output_tokens without any changes to existing code. Install the package, call .instrument() at startup. That is all there is to it.

What is the OpenTelemetry GenAI Special Interest Group (SIG)?

The OTel GenAI SIG defines and maintains the GenAI Semantic Conventions within CNCF governance. The standard evolves through open contribution, not vendor product roadmaps — which is why the stable gen_ai.* attributes are safe to build infrastructure bets on today.

Where do I find the OpenTelemetry GenAI semantic conventions documentation?

The canonical specification is at opentelemetry.io/docs/specs/semconv/gen-ai/. Auto-instrumentation packages are on PyPI under the opentelemetry-instrumentation-* namespace.

What are open-source LLM observability tools that support OpenTelemetry?

Three categories: self-hosted full stacks (LGTM Stack, Uptrace, Langfuse); managed with open ingestion (Grafana Cloud); OTel-compatible backends (Honeycomb, Arize Phoenix, Langwatch). All accept gen_ai.* attributes without custom parsing.

OpenTelemetry vs. proprietary SDKs — what do I actually give up by going open source?

You give up managed pipeline scaling, AI-assisted dashboards, and vendor support SLAs. What you gain is no per-host or per-GB pricing, backend portability, and community-maintained packages for every major LLM provider. Grafana Cloud removes the self-hosting burden while preserving portability.

How do I get from zero observability to production-ready LLM monitoring without a dedicated platform team?

Day 1: install auto-instrumentation, point the OTLP exporter at Grafana Cloud — no infrastructure to operate. Week 1: add a local OTel Collector for PII redaction and sampling. Month 1: configure tail-based sampling. Quarter 1: evaluate whether self-hosted LGTM is justified. Each step is achievable by a lean team, and none of them require undoing the previous one.

What does “observability vendor lock-in” mean in practice for LLM applications?

Your instrumentation calls a vendor’s proprietary SDK, and switching backends requires rewriting every trace and metric call across every service. For LLM applications, rapid data volume growth on per-GB pricing compounds this into surprise bills. OTel eliminates the instrumentation layer of lock-in. A backend with open query languages eliminates the dashboard and alert layer too.
