Mar 18, 2026

Why AI Gross Margins Are So Much Lower Than SaaS and What That Means for Your Business

AUTHOR

James A. Wondrasek

Picture this: you’re presenting your AI product’s financial performance to the board. They’ve spent a decade applying the same mental model — 75% gross margins, maybe better. The number on the slide is 52%. The silence that follows isn’t scepticism about your competence. It’s the sound of a financial model colliding with a structural economic reality nobody warned them about.

Every time a user interacts with your AI product — every query, every generation, every agent action — the meter runs. There’s a real compute cost attached to each of those interactions, and SaaS never had that.

ICONIQ Capital’s January 2026 State of AI report finds inference averages 23% of total revenue at scaling-stage AI B2B companies. Bessemer Venture Partners documents AI gross margins at 50-60%, against 70-90% for mature SaaS businesses. These aren’t outliers from a bad quarter. They’re structural characteristics of the asset class.

This article is about why the economics are different — so you can set the right expectations, price correctly, and walk into that board meeting prepared. For cost reduction strategies, see our AI inference cost crisis overview and full guide to AI inference economics.

Why does running AI cost so much more than traditional software?

Here’s the core difference. Traditional SaaS: once you’ve written the software and provisioned the servers, serving an additional user costs almost nothing. The marginal cost approaches zero at scale. AI inference: every single user query runs the model again — consuming GPU compute, memory bandwidth, and energy. Every. Single. Time.

SaaS is like a printed book. Write it once, distribute at near-zero marginal cost. AI is more like a human expert answering questions. Each answer has a labour cost, and the more questions you field, the higher the bill.

Traditional SaaS COGS scales sub-linearly with users. AI COGS scales directly with usage. As Ben Murray at TheSaaSCFO puts it: “Once the product is built, incremental cost to produce a dollar of revenue is very low. AI companies, on the other hand, sell into software and labour budgets.”

Training is a one-time capital expense. Inference is continuous operational expenditure — every query, every agent action, every API call adds to the tab. Training might cost $100,000 once. Inference scales to $10,000 a month at a million queries.
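The one-off versus recurring split can be sketched with the illustrative figures above (the $100,000 and $10,000/month numbers are the article’s hypothetical example, not benchmarks):

```python
# Hypothetical figures from the example above: training is a one-off
# capital expense, inference is a recurring operational expense.
TRAINING_COST = 100_000        # one-time, at model creation
INFERENCE_PER_MONTH = 10_000   # recurring, at ~1M queries/month

def cumulative_cost(months: int) -> tuple[int, int]:
    """Return (training, inference) spend after `months` in production."""
    return TRAINING_COST, INFERENCE_PER_MONTH * months

def months_until_inference_dominates() -> int:
    """First month in which cumulative inference spend exceeds training."""
    month = 1
    while INFERENCE_PER_MONTH * month <= TRAINING_COST:
        month += 1
    return month

print(months_until_inference_dominates())  # → 11
```

At these rates, cumulative inference spend passes the entire training bill before the end of the first year — and keeps growing for the life of the product.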

Why does inference account for 80-90% of total AI lifetime compute costs?

Training happens once. Inference happens every time a user interacts — thousands of times, then millions. That inversion is counterintuitive because training headlines dominate media coverage. GPT-4’s training costs got extensive coverage. The ongoing inference costs generating revenue around the clock got almost none.

A model is trained once over weeks, then served to users for months or years. Training represents 10-20% of total compute costs over a model’s lifecycle. Inference represents 80-90%.

As your product gains users, training costs stay fixed while inference costs grow linearly. For agentic AI architectures — where a single user action triggers a sequence of model calls — inference costs multiply 5-20x per user action. We dig into this in depth in the article on why AI bills explode between pilot and production.
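The agentic multiplier is simple arithmetic. A sketch, using the article’s 5-20x call range and a placeholder per-call cost:

```python
# Per-action inference cost for a chatbot vs an agentic workflow.
# COST_PER_MODEL_CALL is a placeholder; the 5-20x call multiplier
# comes from the article.
COST_PER_MODEL_CALL = 0.01  # hypothetical blended cost per call, in dollars

def cost_per_user_action(model_calls: int) -> float:
    return model_calls * COST_PER_MODEL_CALL

chatbot = cost_per_user_action(1)   # single call per query
agent = cost_per_user_action(12)    # agentic: somewhere in the 5-20 call range
print(f"agentic multiplier: {agent / chatbot:.0f}x")  # → 12x
```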

When 23% of revenue goes to inference: what does the ICONIQ finding actually mean?

ICONIQ’s January 2026 State of AI report finds inference averages 23% of total revenue at scaling-stage AI B2B companies. And this figure doesn’t meaningfully decline as companies grow. In plain P&L terms: for every $1M in AI product revenue, approximately $230,000 is consumed by inference costs before you’ve paid for engineering, sales, or anything else.

To make that concrete:

$1M AI product revenue → ~$230,000 annually in inference costs

$5M AI product revenue (200-person company) → ~$1,150,000 annually

$10M AI product revenue → ~$2,300,000 annually
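Those tiers are one multiplication away from the ICONIQ benchmark. A minimal sketch:

```python
# ICONIQ benchmark: inference averages ~23% of AI product revenue
# at scaling-stage AI B2B companies.
INFERENCE_SHARE = 0.23

def annual_inference_cost(ai_revenue: float) -> float:
    """Estimated annual inference spend against AI-attributable revenue."""
    return ai_revenue * INFERENCE_SHARE

for revenue in (1_000_000, 5_000_000, 10_000_000):
    print(f"${revenue:,.0f} revenue -> ~${annual_inference_cost(revenue):,.0f} inference")
```

The key discipline is the denominator: pass in AI-attributable revenue, not total ARR.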

In traditional SaaS, COGS at scale typically runs 10-25% of revenue. Inference alone at 23% puts AI at the high end of total SaaS COGS before you’ve added anything else. And it rises with scale — inference sits at 20% pre-launch and 23% at scale. As Jason Lemkin frames it: “as you grow, you need ever more inference. You can’t cut it without degrading the product.”

There’s an important nuance here for SaaS companies adding AI features. The ICONIQ benchmark applies to pure AI B2B companies. Only your AI-adjacent revenue should be measured against inference costs — not your total ARR. A $10M ARR SaaS company launching an AI feature tier generating $1M in incremental revenue faces ~$230K in inference costs against that $1M — not against the $10M base. Getting that wrong has real consequences. See how to build AI cost governance without a dedicated FinOps team.

Why token prices are falling but your AI bill keeps growing (Jevons Paradox explained)

Token prices have fallen approximately 1,000x over three years. What cost $60 per million tokens in 2021 now costs around $0.06. And yet enterprise LLM API spending has grown 320% over the same period.

This is Jevons Paradox in action: when a resource becomes significantly cheaper, organisations use substantially more of it, and total consumption rises rather than falls.

When GPT-4 cost $60 per million tokens, companies deployed it carefully. At $3 per million tokens, they expanded to five new use cases. At $0.10 per million tokens, every workflow gets AI. Total tokens multiply faster than the per-token price drops. a16z calls this “LLMflation” — from their OpenRouter analysis, tokens consumed quintupled over the same period that per-token prices dropped to one-third.
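The Jevons arithmetic falls straight out of the a16z figures quoted above:

```python
# Jevons Paradox arithmetic using the a16z OpenRouter figures:
# token consumption grew ~5x while per-token prices fell to ~1/3.
price_multiplier = 1 / 3   # per-token price fell to one-third
tokens_multiplier = 5      # token consumption quintupled

# Total spend = price per token x tokens consumed, so the multipliers compose.
spend_multiplier = price_multiplier * tokens_multiplier
print(f"total spend changed by {spend_multiplier:.2f}x")  # → 1.67x
```

Consumption growth outran the price decline, so the total bill rose about two-thirds even as each token got cheaper.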

There’s also a hardware paradox at work. Even as software-layer token costs fall, AWS raised GPU Capacity Block prices by 15% in January 2026. Physical compute has its own supply constraints — lead times on H100 and H200 clusters exceeding 30 weeks. Agentic workflows multiply token consumption non-linearly, with a single user action triggering 5-20 sequential model calls.

Don’t model falling inference costs as guaranteed budget relief. LLMflation is a signal to build usage governance, not a signal to skip it. Without governance, efficiency gains get absorbed by consumption growth.

The gross margin gap: ~52% AI vs. 70-90% SaaS and what it means for your business

AI companies structurally operate at approximately 50-60% gross margins, compared to 70-90% for mature SaaS companies. This isn’t a temporary inefficiency — it’s an architectural consequence of inference costs. ICONIQ’s January 2026 data shows AI gross margins averaging 52%, up from 41% in 2024, with a ceiling well below SaaS norms.

SaaS at scale runs COGS of roughly 10-25%, yielding gross margins of 75-90%. AI companies run COGS of roughly 40-50% — with inference alone accounting for ~23% — yielding gross margins of 50-60%. The 15-30 percentage point gap cannot be fully recovered through operational efficiency.
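The margin gap is just the COGS gap restated. A sketch using the midpoints of the ranges above:

```python
# Gross margin from COGS share of revenue: margin = 1 - COGS.
def gross_margin(cogs_share: float) -> float:
    return 1.0 - cogs_share

saas = gross_margin(0.175)  # midpoint of the SaaS 10-25% COGS range
ai = gross_margin(0.45)     # midpoint of the AI 40-50% COGS range
gap_points = (saas - ai) * 100
print(f"gap: {gap_points:.1f} percentage points")  # → 27.5
```

No amount of sales efficiency below the gross-margin line closes this gap; only the COGS structure itself can move it.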

AI Shooting Stars — fast-growing, capital-efficient AI startups with strong product-market fit — average approximately 60% gross margins. AI Supernovas — explosively scaling, thin-wrapper products — can sit as low as 25%. AI companies also need higher growth rates to hit Rule of 40 because the profit margin component starts from a lower base. As Murray at TheSaaSCFO frames it: “If SaaS is about margin efficiency, AI is about value density — how much output, productivity, or labour you replace per dollar of cost.”

For SaaS companies adding AI features: those features will compress your margins unless priced to recover inference costs separately. The natural response is in pricing — which we explore in how to design AI product pricing.

What the 6x ARPA requirement means for how you should price your AI product

TheSaaSCFO’s financial modelling finds that to match the EBITDA output of an equivalent SaaS company, an AI company would need to be roughly 6x the revenue size, because at ~52% gross margins each revenue dollar contributes far less to the bottom line than at SaaS’s 75-90%.

If your AI product replaces $200,000 of annual labour value, pricing at $20,000 — the SaaS-era reflex — destroys margin. BVP’s research indicates AI companies must price 5-6x SaaS equivalents for comparable unit economics.

Consumption-based pricing is transparent but creates buyer anxiety about unpredictable bills. Outcome-based pricing — per successful result, per resolution — aligns AI costs with delivered value. ICONIQ’s data shows this shift is underway: 37% of AI companies plan to change their pricing model in the next 12 months, and outcome-based pricing jumped from 2% to 18% of AI companies in just six months. Intercom’s Fin AI agent is the archetype — per-ticket-resolution pricing grew to 8-figure ARR by tying revenue directly to the value delivered.
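Outcome-based pricing works when the price per outcome clears the inference cost of producing it. A sketch with entirely hypothetical figures (none of these are Intercom’s actual numbers):

```python
# Hypothetical per-resolution unit economics for outcome-based pricing.
# All three constants below are illustrative assumptions.
PRICE_PER_RESOLUTION = 0.99       # what the customer pays per resolved ticket
MODEL_CALLS_PER_RESOLUTION = 10   # agentic workflow: several calls per outcome
COST_PER_MODEL_CALL = 0.02        # assumed blended inference cost per call

inference_cost = MODEL_CALLS_PER_RESOLUTION * COST_PER_MODEL_CALL
margin = (PRICE_PER_RESOLUTION - inference_cost) / PRICE_PER_RESOLUTION
print(f"gross margin per resolution: {margin:.0%}")  # → 80%
```

Because both revenue and cost accrue per outcome, the margin holds as volume scales — which is exactly what seat-based pricing fails to guarantee for agentic products.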

Don’t add AI features at zero incremental cost to existing subscriptions. The detailed pricing framework is covered in how to design AI product pricing that survives variable inference costs.

The practical P&L reality for growing software companies

The numbers add up faster than most companies expect. Unlike SaaS COGS, inference costs grow with AI usage and require active management, not passive provisioning.

The $1M, $5M and $10M revenue tiers earlier show how this compounds across company sizes.

Your board benchmarks gross margins against SaaS comps — typically 70-90% for software companies. AI-augmented products will show 50-65%. That gap needs proactive framing. AI gross margins at 52-60% are a structural characteristic of the asset class, documented by ICONIQ (January 2026), Bessemer Venture Partners, and TheSaaSCFO. The margin gap is the cost of the competitive moat — your competitors face the same economics.

Three practical starting points:

  1. Know your inference-to-revenue ratio and compare it to the ICONIQ 23% benchmark. Running above it likely signals inefficiency; running below it may signal underinvestment in product capability.

  2. Price AI features with inference costs built in, not absorbed from existing SaaS margins. Even modest per-feature pricing lets you track and manage inference costs against attributable revenue.

  3. Set a governance trigger: when inference approaches 30% of AI product revenue, activate a cost review. The governance framework for this is detailed in how to build AI cost governance without a dedicated FinOps team.
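The governance trigger in step 3 reduces to a one-line check. A sketch, using the article’s 30% threshold:

```python
# Governance trigger: flag a cost review when inference approaches
# 30% of AI product revenue (threshold from the article).
REVIEW_THRESHOLD = 0.30

def needs_cost_review(inference_spend: float, ai_revenue: float) -> bool:
    """True when inference spend crosses the review threshold."""
    return inference_spend / ai_revenue >= REVIEW_THRESHOLD

print(needs_cost_review(230_000, 1_000_000))  # → False (at the 23% benchmark)
print(needs_cost_review(320_000, 1_000_000))  # → True
```

The useful part isn’t the arithmetic — it’s agreeing the threshold in advance, so the review fires automatically rather than after a bad quarter.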

Understanding the structural difference from SaaS economics is the prerequisite for every financial decision you’ll make as you scale an AI product. For a complete overview of the inference cost crisis and how all of these elements fit together, see the full guide to AI inference economics.

Frequently Asked Questions

Is the 52% AI gross margin figure consistent across all types of AI companies?

No — gross margins vary by position in the AI value chain. Infrastructure-layer companies reselling compute (Groq, Together AI) sit as low as 40-50%. Application-layer companies adding proprietary value above raw compute cost (Perplexity at approximately 60%) sit higher. AI Supernovas — explosively scaling, thin-wrapper products — can be as low as 25% or negative. AI Shooting Stars — Bessemer’s cohort of capital-efficient, strong-PMF startups — average approximately 60%.

The ICONIQ 52% is the industry average for scaling-stage AI B2B companies in January 2026, up from 41% in 2024. Individual margins range from 25-85%+ depending on architecture and infrastructure strategy.

Does the Rule of 40 still apply to AI companies?

Yes, but the targets differ. AI companies with structurally lower gross margins need higher growth rates to hit Rule of 40 because the profit margin component starts from a lower base. Some investors are applying gross-margin-adjusted versions of Rule of 40 for AI companies — the benchmark is evolving, but the underlying principle remains valid.
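The standard (unadjusted) Rule of 40 arithmetic makes the trade-off visible — growth rate plus profit margin should total at least 40 percentage points:

```python
# Rule of 40: growth rate + profit margin >= 40 (in percentage points).
# Shows why a lower-margin AI company needs faster growth to qualify.
def rule_of_40_growth_needed(profit_margin_pct: float) -> float:
    """Minimum growth rate (pct) to hit Rule of 40 at a given margin."""
    return 40.0 - profit_margin_pct

print(rule_of_40_growth_needed(20))   # SaaS-like margin → needs 20.0% growth
print(rule_of_40_growth_needed(-5))   # thin AI margin → needs 45.0% growth
```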

Can AI gross margins improve over time?

Yes, with caveats. ICONIQ data shows improvement: 41% in 2024, 45% in 2025, 52% in 2026. Optimisation paths — model routing, prompt caching, quantisation, infrastructure migration — are covered in our articles on AI inference cost reduction and building AI cost governance. The structural floor remains: AI gross margins will likely improve toward 60-65% but are unlikely to reach the 80%+ that SaaS companies achieve.

What is “LLMflation” and how does it affect my AI costs?

LLMflation (a16z terminology) describes the approximately 10x annual decline in inference costs for equivalent model performance — what cost $60 per million tokens three years ago now costs around $0.06. The paradox: LLMflation makes individual model calls cheaper, but Jevons Paradox means total AI spend rises as companies deploy AI to more use cases. Tokens consumed quintupled while per-token prices dropped to one-third. The per-unit cost drops; the total bill climbs. LLMflation is a signal to build usage governance, not to assume cost savings will materialise automatically.

What is the “inference tax” and how does it differ from traditional COGS?

The inference tax is the recurring compute cost incurred every time an AI product serves a user — analogous to a raw material cost in manufacturing. Traditional SaaS COGS scales sub-linearly — the 10,000th customer costs little more than the 1,000th. Inference COGS scales directly with usage — the 10,000th AI query costs approximately the same as the 1,000th. The governance implication: treat it as a managed variable cost, not fixed overhead.

How does agentic AI change the inference cost calculation?

Agentic AI workflows multiply inference costs 5-20x per user action. A simple chatbot query: 1 model call. An agentic AI researching, planning, and executing the same resolution: 8-15 model calls. Expect inference costs to increase 5-20x when moving from chatbot to agent architectures before any optimisation — which is why outcome-based pricing makes more sense than seat-based pricing for agentic AI products.

Why do AI startups have lower profit margins than regular software companies?

Because AI products have a perpetual raw material cost that traditional software does not. Traditional SaaS: build once, serve millions at near-zero marginal cost. AI SaaS: every user interaction requires running the model — generating real compute cost. Gross margins don’t improve as easily with scale because inference costs scale with usage. AI companies are partly technology companies and partly compute resellers — purchasing GPU inference and reselling it as intelligent capability.

Is the ICONIQ 23% inference benchmark relevant if I’m adding AI to an existing SaaS product rather than building a pure AI company?

Partially — the benchmark applies to the AI-revenue portion of the business, not total ARR. A $10M ARR SaaS company launching an AI feature tier generating $1M in incremental revenue faces ~$230K in inference costs against that $1M — not $2.3M against the full base. If AI features are priced at zero incremental cost, that inference cost is absorbed silently from existing margins. Always price AI features separately so the costs can be tracked and managed.

What’s the difference between training costs and inference costs for AI?

Training is the one-time process of teaching a model its capabilities — it happens once at model creation and occasionally during fine-tuning cycles. Inference is the ongoing process of generating responses to user queries — running continuously for the lifetime of the product, 24/7 in production. The cost ratio: training represents 10-20% of total compute costs over a model’s lifecycle; inference represents 80-90%, because it runs millions of times while training runs once.

How does the AI gross margin gap affect how I should communicate to my board?

Your board benchmarks gross margins against SaaS comps — typically 70-90%. AI-augmented products will show 50-65%. Three elements of the reframe:

  1. Validate externally: AI gross margins at 52-60% are a structural characteristic documented by ICONIQ (January 2026), Bessemer Venture Partners, and TheSaaSCFO. Bring these sources to the board discussion.

  2. Frame the opportunity: the inference cost burden is what enables the AI capability that differentiates your product. Your competitors face the same economics.

  3. Provide a roadmap: model routing, caching, and quantisation have documented paths toward 60-65% gross margins over 12-24 months — this is not a permanent ceiling.
