Business | SaaS | Technology
Mar 18, 2026

How to Design AI Product Pricing That Survives Variable Inference Costs

AUTHOR

James A. Wondrasek

Most AI products are priced like SaaS products. But they do not have SaaS cost structures.

SaaS pricing was built for a world where serving one more customer costs you almost nothing. AI does not work that way. Every query, every output, every agent task comes with a real inference bill — GPU compute, API calls, model licensing — and that bill scales directly with usage. The gross margin gap is structural: AI companies are averaging 50–60% gross margins versus the 80–90% that SaaS operators treat as table stakes.

This is not something you can engineer your way out of. The AI inference cost crisis facing AI-native companies is a pricing model problem, not an efficiency problem.

ICONIQ Capital’s 2026 State of AI report found that 37% of AI companies are actively planning to change their pricing model in the next 12 months. The shift is not hypothetical; it is already under way.

This article gives you a practical framework for choosing between the three primary AI pricing archetypes — consumption, workflow, and outcome-based — with a worked modelling approach and the Intercom Fin case study as a real-world benchmark.

Why does AI have lower gross margins than SaaS — and what does that mean for pricing?

AI’s 50–60% gross margins versus 80–90% for SaaS is not a startup-phase anomaly. It is structural. ICONIQ’s 2026 data shows AI gross margins at 52% — up from 41% in 2024 — but still nowhere near SaaS territory. Model inference alone averages 23% of total AI product costs at scaling-stage companies.

Bessemer Venture Partners put it plainly: “Companies see 50–60% gross margins vs. 80–90% for SaaS.” Unlike SaaS, where additional customers approach zero marginal cost, every AI inference has a real COGS. As ML lead Jacob Jackson put it: “When you receive $10 from the customer, you can’t just spend 10 cents on AWS. GPUs are expensive.”

Ben Murray at TheSaaSCFO ran the numbers: to reach EBITDA equivalent to a SaaS business, an AI company needs roughly 5–6x the revenue. Where a SaaS product sells for $50,000 per year, its AI equivalent needs to sell for roughly $250,000–$300,000 per year to deliver comparable unit economics. That is not because the AI delivers six times more value; it is because its cost structure is fundamentally different. The 5–6x ARPA requirement is not a negotiating position. It is arithmetic.

There is one more COGS item that often gets missed: Forward-Deployed Engineers. ICONIQ’s data shows 32% of AI companies now deploy FDEs to support enterprise customers. If your pricing does not account for that effort, you are building margin compression into every enterprise deal from day one.

For more on why AI gross margins are lower than SaaS, see the foundational article in this cluster.

What are the three AI pricing archetypes — and how do they each handle inference cost variability?

BVP’s AI Pricing and Monetisation Playbook identifies three pricing archetypes — and each one is a different answer to the same question: who bears the cost variability risk? BVP frames the trade-off: “As you move from consumption to workflow to outcome-based pricing, you’re accepting more cost risk in exchange for tighter alignment with customer value.”

Consumption-based pricing (per token / per API call) passes cost variability entirely to the customer. It works well for technical buyers — developers, platform engineers, API integrators. GitHub Copilot and the OpenAI API are the obvious examples.

The problem is non-technical buyers. Metronome found customers avoided using AI features even when free credits were included — they feared unpredictable bills. Leena AI experienced this directly: after charging on consumption, “customers became wary of using the product — the pricing model was counterproductive.”

Workflow-based pricing (per completed task) decouples price from token count. The customer pays per discrete, bounded task — booking a meeting, analysing a document, generating a demand letter. EvenUp captures better margins charging per completed legal demand letter rather than by inference volume.

The catch: cost variability risk shifts to you. One analysis might cost $0.05 in inference. A complex multi-source brief might cost $0.45. If you priced the task at $0.50, your gross margin swings between 90% and 10% depending on what lands in the queue.
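To make the swing concrete, here is a minimal sketch that treats inference as the only per-task cost. That is a simplifying assumption: real COGS would also include hosting, support and any FDE time. The function name `gross_margin` is illustrative, not from any library.

```python
def gross_margin(price: float, inference_cost: float) -> float:
    """Gross margin as a fraction of price, counting only inference as COGS."""
    return (price - inference_cost) / price


TASK_PRICE = 0.50  # flat per-task price from the example above

for label, cost in [("simple analysis", 0.05), ("multi-source brief", 0.45)]:
    print(f"{label}: {gross_margin(TASK_PRICE, cost):.0%}")
```

The price never moves; only the work that lands in the queue does, and that alone spans an 80-point margin range.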

Outcome-based pricing (per successful result) is where the industry is heading. The customer pays only when a defined, measurable outcome is achieved — a ticket resolved, a claim processed. ICONIQ’s data: outcome-based pricing jumped from 2% to 18% adoption in six months. Forty-three per cent of enterprise buyers now consider it a significant purchase factor.

The prerequisite is measurement infrastructure. You cannot bill on resolutions if you cannot detect when one has occurred — and that is what trips up most teams attempting the transition.

How does Intercom Fin’s $0.99 per resolution pricing work — and what does it mean for your margins?

Intercom Fin is the most cited proof that outcome-based pricing works at scale: $0.99 per resolved support ticket, 1 million customer issues per week, $100M+ ARR.

Why $0.99? Value-based logic. A human-handled support ticket costs $8–15 or more in most contact centres. At $0.99, Fin is priced at roughly 10% of the cost of the outcome it replaces.
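The value-based ratio is simple division. A quick sketch using only the $0.99 price and the $8–15 human-handled cost range quoted above (the names are illustrative):

```python
FIN_PRICE = 0.99  # per autonomous resolution
HUMAN_COST_LOW, HUMAN_COST_HIGH = 8.0, 15.0  # human-handled ticket cost range quoted above


def share_of_replaced_cost(price: float, replaced_cost: float) -> float:
    """Price as a fraction of the cost of the outcome it replaces."""
    return price / replaced_cost


low_share = share_of_replaced_cost(FIN_PRICE, HUMAN_COST_HIGH)   # against a $15 ticket
high_share = share_of_replaced_cost(FIN_PRICE, HUMAN_COST_LOW)   # against an $8 ticket
print(f"$0.99 is {low_share:.1%} to {high_share:.1%} of the human-handled cost")
```

The result, roughly 6.6% to 12.4%, brackets the "roughly 10%" figure.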

The $0.99 is the variable component of a hybrid model. Customers also pay a base Intercom platform fee. The $0.99 activates on top, for autonomous resolutions only. Add the $1M performance guarantee for customers who do not hit expected resolution rates, and the full structure is: base platform fee + $0.99 per autonomous resolution + performance guarantee.

GTMnow’s interview with Intercom’s president put it well: “Guarantees change buyer psychology more than pricing ever could. The $0.99 price gets attention, but it’s the $1M performance guarantee that builds trust.” That guarantee is a conversion mechanism, not just a risk instrument.

The lesson here is straightforward. The question that produced the $0.99 is the same question you need to answer: what is one resolved outcome worth to my customer, and what is my inference cost per attempt? If the first is materially larger than the second, outcome-based pricing is viable. BVP’s guidance: the platform fee should cover at least 2x your delivery costs before variable pricing activates.
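The two tests in this paragraph can be written as a rough screen. This is a sketch, not BVP's method: `min_value_multiple` is a hypothetical threshold standing in for "materially larger", and the platform fee and delivery cost should cover the same billing period. Inference cost here is per attempt, as in the question above; the adjustment for failed attempts comes in the cost exposure section that follows.

```python
def outcome_pricing_viable(
    value_per_outcome: float,
    inference_cost_per_attempt: float,
    platform_fee: float,
    delivery_cost: float,
    min_value_multiple: float = 5.0,  # hypothetical threshold for "materially larger"
) -> bool:
    """Rough screen for outcome-based pricing."""
    # Is one outcome worth materially more to the customer than an attempt costs you?
    value_test = value_per_outcome >= min_value_multiple * inference_cost_per_attempt
    # BVP guidance: the platform fee covers at least 2x delivery costs
    floor_test = platform_fee >= 2 * delivery_cost
    return value_test and floor_test


# Illustrative inputs: an $8 human-handled ticket vs $0.20 inference per attempt
print(outcome_pricing_viable(8.0, 0.20, platform_fee=12_000, delivery_cost=5_000))
```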

How do you model your inference cost exposure before committing to a pricing model?

Before you pick an archetype, run a cost exposure calculation. BVP’s rule: “If the math doesn’t work at 10 customers, it won’t at 1,000.”

Under consumption-based pricing, cost exposure is essentially zero — inference spikes pass to the customer. Your risk is adoption suppression, not margin compression.

Under workflow-based pricing, exposure comes from task complexity variance. Average inference cost $0.10, task price $0.50: 80% gross margin. Complex task at $0.45 inference, same price: 10% gross margin. What is the realistic complexity range for your tasks, and does your pricing survive the high end?
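An average alone hides the tail. The sketch below assumes a hypothetical task mix (75% simple at $0.05, 20% medium at $0.20, 5% complex at $0.45) and reports both the blended margin and the margin on the most expensive task at the $0.50 price. The mix is invented for illustration; use your own production distribution.

```python
def margin_profile(task_price: float, cost_mix: list[tuple[float, float]]) -> dict[str, float]:
    """Blended and worst-case gross margin for a flat task price.

    cost_mix holds (share_of_tasks, inference_cost) pairs; shares must sum to 1.
    """
    if abs(sum(share for share, _ in cost_mix) - 1.0) > 1e-9:
        raise ValueError("task shares must sum to 1")
    avg_cost = sum(share * cost for share, cost in cost_mix)
    worst_cost = max(cost for _, cost in cost_mix)
    return {
        "blended_margin": (task_price - avg_cost) / task_price,
        "worst_case_margin": (task_price - worst_cost) / task_price,
    }


# Hypothetical mix whose average inference cost is $0.10
profile = margin_profile(0.50, [(0.75, 0.05), (0.20, 0.20), (0.05, 0.45)])
print({name: f"{value:.0%}" for name, value in profile.items()})
```

With this mix the blended margin matches the 80% in the example, while the worst-case task still lands at 10%. If complex tasks grow as a share of the queue, the blended figure drifts toward that floor.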

Under outcome-based pricing, you incur inference costs on all attempts — including failed ones:

Effective cost per charged outcome = cost per attempt ÷ resolution rate

Inference cost $0.20, resolution rate 70%: effective cost per charged outcome is $0.286. At 50% resolution it is $0.40. The failed attempts generate costs with no revenue to offset. Model this accurately before you set the price.
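Sweeping the same formula across a few resolution rates shows how quickly failed attempts erode the cost base. A minimal sketch using the $0.20 per-attempt figure from the example; the 90% rate is an added illustrative point.

```python
def effective_cost_per_outcome(cost_per_attempt: float, resolution_rate: float) -> float:
    """Inference cost carried by each billed outcome, including failed attempts."""
    if not 0 < resolution_rate <= 1:
        raise ValueError("resolution_rate must be in (0, 1]")
    return cost_per_attempt / resolution_rate


for rate in (0.9, 0.7, 0.5):
    cost = effective_cost_per_outcome(0.20, rate)
    print(f"{rate:.0%} resolved: ${cost:.3f} per charged outcome")
```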

BVP’s hybrid formula handles this cleanly: platform fee at 2x minimum delivery costs, plus included outcome credits, plus variable overage. Their example: $12,000 annual platform fee; 100 included ticket resolutions; additional resolutions at $5,000 per 100. Fixed costs covered before the variable tier activates.
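Here is a sketch of how that hybrid structure bills out over a year. One assumption to flag: the example does not say whether overage is pro-rated, so this version bills a partial block of 100 as a whole block. `annual_invoice` and the constants are illustrative names.

```python
import math

PLATFORM_FEE = 12_000       # annual platform fee (BVP example)
INCLUDED_RESOLUTIONS = 100  # resolutions covered by the platform fee
BLOCK_SIZE = 100            # overage sold in blocks of 100 ...
BLOCK_PRICE = 5_000         # ... at $5,000 per block


def annual_invoice(resolutions: int) -> int:
    """Platform fee plus overage blocks for resolutions beyond the included credits."""
    overage = max(0, resolutions - INCLUDED_RESOLUTIONS)
    # Assumption: partial blocks are billed as whole blocks
    blocks = math.ceil(overage / BLOCK_SIZE)
    return PLATFORM_FEE + blocks * BLOCK_PRICE


for volume in (40, 100, 101, 350):
    print(volume, annual_invoice(volume))
```

A low-usage year of 40 resolutions still bills the $12,000 floor, while 350 resolutions bills the floor plus three overage blocks, or $27,000.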

Use your first 50–100 production customers to build cost baselines before committing at scale.

How do you choose which AI pricing model fits your product — consumption, workflow, or outcome?

BVP identifies three selection criteria: value attribution (how clearly can the AI’s contribution be measured?), execution autonomy (does the AI act independently or assist a human?), and workload predictability (how variable is inference cost per unit?).

Choose consumption-based if your buyer is technical and can model their own usage; your product is an API, SDK, or developer tool; you are in early discovery without outcome measurement capability.

Choose workflow-based if your AI completes discrete, bounded tasks with relatively stable complexity; your buyer is non-technical and needs predictable pricing; task complexity variation stays manageable.

Choose outcome-based if the outcome is clearly measurable and attributable to the AI; customers value it highly relative to your inference cost; you have production data — not PoC estimates — to set the price accurately.

Choose hybrid if you are uncertain about outcome rate or cost variability; you need buyer predictability and upside capture simultaneously.
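The selection rules above can be encoded as a simple decision function. This is a simplified sketch: the parameter names are hypothetical, the rules reduce BVP's criteria to yes/no answers, and the order in which they are checked is a judgement call rather than part of BVP's playbook.

```python
from enum import Enum


class Archetype(Enum):
    CONSUMPTION = "consumption"
    WORKFLOW = "workflow"
    OUTCOME = "outcome"
    HYBRID = "hybrid"


def choose_archetype(
    *,
    technical_buyer: bool,      # buyer can model their own usage
    developer_product: bool,    # API, SDK or developer tool
    outcome_measurable: bool,   # outcome is clearly measurable and attributable to the AI
    high_value_vs_cost: bool,   # outcome value is high relative to inference cost
    production_data: bool,      # real production data, not PoC estimates
    bounded_tasks: bool,        # discrete, bounded tasks
    stable_complexity: bool,    # task complexity variation stays manageable
) -> Archetype:
    # Check order is a judgement call, not part of BVP's framework.
    if technical_buyer and developer_product:
        return Archetype.CONSUMPTION
    if outcome_measurable and high_value_vs_cost and production_data:
        return Archetype.OUTCOME
    if bounded_tasks and stable_complexity:
        return Archetype.WORKFLOW
    # Uncertain outcome rate or cost variability: hybrid gives predictability plus upside.
    return Archetype.HYBRID


fintech = choose_archetype(
    technical_buyer=False, developer_product=False,
    outcome_measurable=False, high_value_vs_cost=False, production_data=True,
    bounded_tasks=True, stable_complexity=True,
)
print(fintech.value)
```

Run against the two examples below, the FinTech summarisation feature (bounded tasks, outcomes hard to define) lands on workflow, and the HealthTech automation (measurable, high-value outcomes backed by production data) lands on outcome.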

Two examples to make this concrete.

A 100-person FinTech with a document summarisation feature for loan officers is most likely a workflow-based or hybrid candidate. The task is bounded, the customer base is cost-sensitive, and outcome definition is complicated in a compliance context. Workflow-based with a hybrid floor is the right starting point.

A 400-person HealthTech with AI-native workflow automation (appointment booking, claim processing) is an outcome-based or hybrid candidate. The workflows are measurable and the value per outcome is high, but production data is the prerequisite.

There is also some urgency here. BVP calls out soft ROI products as the category most at risk: “Much of the ‘sexy’ AI products today live in soft ROI territory… As many enter renewal cycles for the first time in 2026, pricing will need to reflect actual value, not merely potential or promise.” If your AI feature was deployed in 2024–2025 under flat-rate SaaS framing and is approaching renewal, run the archetype selection exercise now. Do not wait.

When should you change your AI pricing model — and how do you do it without losing customers?

Four signals say it is time: consistent margin compression per customer; high churn at renewal because customers cannot justify value; rapid usage growth the current model cannot capture economically; new outcome-measurement capability that makes outcome-based pricing viable for the first time.

Three prerequisites:

  1. Real outcome rate data from production — not PoC estimates. OpenView research found 78% of companies successfully using outcome pricing had products on market for 5+ years.
  2. Infrastructure to measure and attribute outcomes — tracking, attribution logic, automated billing triggers.
  3. Communication framing the change as value alignment — “we’re moving to pay-for-performance” is different from “we’re changing our pricing.” Grandfather existing enterprise customers for 6–12 months.

Credits can bridge the transition. Metronome calls them “transitional scaffolding” — useful while you establish the real value metric, not a permanent structure. Avoid announcing the change without production data, migrating enterprise customers mid-contract, or adopting outcome-based pricing before attribution infrastructure is in place.

Once you commit to a new model, real-time cost governance becomes an ongoing discipline. The assumptions behind your pricing design (cost per outcome, resolution rate, margin at scale) need continuous checking against production data.
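One way to make that checking routine is to compare each billing period's production figures against the assumptions the price was built on. A minimal sketch: the class, the function and the 60% margin floor in the example are hypothetical, and inference spend stands in for full COGS.

```python
from dataclasses import dataclass


@dataclass
class PricingAssumptions:
    price_per_outcome: float
    cost_per_attempt: float
    resolution_rate: float
    target_margin: float  # hypothetical gross margin floor, e.g. 0.6


def check_period(a: PricingAssumptions, inference_spend: float,
                 attempts: int, billed_outcomes: int) -> list[str]:
    """Return alerts where one period of production data breaks a pricing assumption."""
    if attempts == 0:
        return ["no attempts this period; nothing to validate"]
    alerts = []
    observed_cost = inference_spend / attempts
    if observed_cost > a.cost_per_attempt:
        alerts.append(f"cost per attempt ${observed_cost:.3f} above assumed ${a.cost_per_attempt:.3f}")
    observed_rate = billed_outcomes / attempts
    if observed_rate < a.resolution_rate:
        alerts.append(f"resolution rate {observed_rate:.0%} below assumed {a.resolution_rate:.0%}")
    if billed_outcomes == 0:
        alerts.append("no billed outcomes: inference spend earned no revenue")
        return alerts
    revenue = a.price_per_outcome * billed_outcomes
    margin = (revenue - inference_spend) / revenue  # inference-only margin
    if margin < a.target_margin:
        alerts.append(f"margin {margin:.0%} below target {a.target_margin:.0%}")
    return alerts


assumptions = PricingAssumptions(price_per_outcome=0.99, cost_per_attempt=0.20,
                                 resolution_rate=0.70, target_margin=0.60)
for alert in check_period(assumptions, inference_spend=2_400.0, attempts=10_000, billed_outcomes=5_000):
    print(alert)
```

In the example period, costs per attempt rise and resolutions fall at the same time, so all three alerts fire; a period that matches the assumptions returns an empty list.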

FAQ

Is consumption-based pricing always a bad choice for AI products?

No. When your buyer is technical, your product is developer-facing, and inference costs are predictable per unit, it is the right choice. The adoption suppression problem Metronome identified is specific to enterprise non-technical contexts.

How do I migrate from consumption-based to outcome-based pricing without losing customers?

Three steps: collect real outcome rate data from production before announcing anything; build or buy attribution and billing infrastructure first; communicate the change as “we’re moving to pay-for-performance.” Grandfather existing enterprise customers for 6–12 months. Use credits as transitional scaffolding but position them as temporary.

What does the ARPA 5–6x SaaS requirement mean in practice for a $50,000 per year deal?

A $50,000 per year SaaS contract needs an AI equivalent at approximately $250,000–$300,000 per year to deliver comparable EBITDA. Gross margins are structurally lower (50–60% versus 80–90%), so the AI company needs roughly 5–6x the revenue to cover the COGS gap.

How much should I budget for AI inference at a 200-person SaaS company?

Use ICONIQ’s benchmark: inference averages 23% of total AI product costs at scaling-stage companies. Under outcome-based pricing, include inference on failed outcomes — if your resolution rate is 70%, 30% of your inference spend generates no revenue. Never use PoC cost estimates for production budgeting.
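A budgeting sketch that works backwards from the outcomes you expect to bill, so that failed attempts are in the number from the start. The function name and inputs are illustrative; replace them with production figures, not PoC estimates.

```python
def monthly_inference_budget(target_outcomes: float, cost_per_attempt: float,
                             resolution_rate: float) -> dict[str, float]:
    """Attempts and inference spend needed to bill a target number of outcomes."""
    if not 0 < resolution_rate <= 1:
        raise ValueError("resolution_rate must be in (0, 1]")
    attempts = target_outcomes / resolution_rate
    spend = attempts * cost_per_attempt
    # Spend on failed attempts: incurred, never billed
    unbilled_spend = spend * (1 - resolution_rate)
    return {"attempts": attempts, "spend": spend, "unbilled_spend": unbilled_spend}


budget = monthly_inference_budget(target_outcomes=7_000, cost_per_attempt=0.20, resolution_rate=0.70)
print({name: round(value, 2) for name, value in budget.items()})
```

At 7,000 billed outcomes, $0.20 per attempt and a 70% resolution rate, you need about 10,000 attempts and $2,000 of inference, of which $600 (30%) buys no revenue.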

What does outcome-based pricing actually require the vendor to build?

Flexprice’s five-component minimum: clear contractual outcome definition; data tracking and attribution mapping AI actions to results; aligned internal team structure; risk/reward framework covering failed attempts; automated outcome-linked billing. Skip attribution and you will face billing disputes. When multiple factors influence results simultaneously, you need control groups or baseline comparisons.

How does hybrid pricing protect gross margins in practice?

BVP’s formula — platform fee at 2x minimum delivery costs, plus included outcome credits, plus variable overage — ensures fixed costs are covered before variable pricing activates. The floor removes exposure to near-zero revenue in low-usage months. The credits give enterprise buyers the predictability they need for procurement.

What is the difference between outcome-based and output-based pricing?

Flexprice states the terms are interchangeable. Where a distinction is made: output-based means delivering a specific artefact (a draft, a report); outcome-based means a measurable result (a ticket resolved, a claim processed). For most pricing decisions, treat them as equivalent.

How do I decide whether my AI product is ready for outcome-based pricing?

Four checks: Can you define a successful outcome in a contract? Do you have real production data to set the price accurately? Can you build or buy attribution and billing infrastructure? Is your inference cost per attempt low enough that your price per successful outcome delivers acceptable gross margin? If any are “no,” start with consumption-based or hybrid and migrate once the prerequisites are met.

What is the 2026 renewal cliff and why does it affect AI pricing strategy?

AI pilot contracts signed in 2024–2025 — often under SaaS-style seat pricing — are now approaching their first annual renewal. ICONIQ data shows AI products providing “soft ROI” are at high churn risk because customers cannot quantify the value received. If your product was deployed under a model that does not capture measurable outcomes, the renewal conversation relies on the customer’s tolerance for unquantified value — which tends toward zero under budget pressure.

How does Intercom price Fin AI — is $0.99 per resolution the full story?

No. The $0.99 per resolved ticket is the variable component of a hybrid model. Customers also pay a base platform fee, and Fin activates on top for autonomous resolutions only. Intercom backs it with a $1M performance guarantee. The complete model: base platform fee + $0.99 per autonomous resolution + performance guarantee. The $0.99 gets attention. The guarantee builds trust. The platform fee protects margin.

Pricing strategy is one piece of a larger picture. For a complete resource on understanding AI inference economics — from the financial reality of lower gross margins through infrastructure decisions and governance — the full guide to AI inference economics covers the end-to-end challenge facing AI-native companies.
