Business | SaaS | Technology
Apr 26, 2026

Open-Source AI Models in 2025 and Beyond — What DeepSeek, Qwen, and the New Wave Mean for Enterprise Strategy

AUTHOR

James A. Wondrasek
Comprehensive guide to the open-source AI model race — DeepSeek, Qwen, and enterprise strategy

On 27 January 2025, DeepSeek wiped $589 billion from NVIDIA's market cap with a model trained for $5.6 million. Open-weight models from DeepSeek, Qwen, and Kimi K2 now match proprietary frontier systems at a fraction of the API cost — and Qwen has surpassed Meta's Llama as the most downloaded LLM family on Hugging Face. This guide links to detailed coverage of every decision your organisation needs to make.

In this guide: DeepSeek and NVIDIA · How Chinese labs caught up · MoE architecture · Deployment paths · Coding agents · AWS Bedrock · Governance · Autonomous agents


What happened with DeepSeek in January 2025 and why did it cause NVIDIA’s stock to drop?

DeepSeek R1 matched GPT-4-class performance while costing an estimated $5.6 million to train — roughly 5% of what comparable Western models cost. That single data point collapsed the prevailing assumption that frontier AI required billions in GPU spend. If you could get there for five million, the rationale for buying hundreds of millions of dollars' worth of NVIDIA hardware fell apart overnight. The market reached that conclusion fast. NVIDIA shed $589 billion in a single session.

Explore in depth: How Chinese open-weight labs overtook US proprietary models


How did Chinese labs close the gap with US proprietary models so quickly?

Four labs — DeepSeek, Alibaba (Qwen), Moonshot AI (Kimi K2), and Z.AI (GLM) — closed the gap through architectural innovation, not raw compute. That distinction matters. They weren’t just throwing more hardware at the problem. Mixture-of-Experts (MoE) activates only a fraction of parameters per inference, cutting costs dramatically at scale. US export controls, which were meant to slow them down, actually pushed these labs toward efficiency-first engineering. Being forced to do more with less turned out to be a competitive advantage.

Explore in depth: How Chinese open-weight labs overtook US proprietary models


What is Mixture of Experts (MoE) architecture and why does it matter for AI costs?

Here’s the thing about model size: it’s a misleading number. MoE routes each token to only the relevant “expert” sub-networks, activating a fraction of total parameters per inference. DeepSeek-V3 has 671 billion parameters but activates only 37 billion per inference. So when someone quotes you a headline parameter count, it tells you almost nothing about what it will cost you to run. MoE models operate far more affordably than their size suggests, and that changes the economics of self-hosted deployment quite significantly.
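
To make the cost implication concrete, here is a minimal sketch of the arithmetic, assuming per-token inference compute scales roughly with active rather than total parameters (a simplification that ignores routing overhead):

```python
# Sketch: inference compute scales with *active* parameters, not the
# headline total. The 671B/37B figures are DeepSeek-V3's published
# numbers; the linear-scaling assumption ignores routing overhead.

def relative_inference_cost(total_params_b: float, active_params_b: float) -> float:
    """Per-token compute as a fraction of an equally sized dense model."""
    return active_params_b / total_params_b

fraction = relative_inference_cost(total_params_b=671, active_params_b=37)
print(f"DeepSeek-V3 activates {fraction:.1%} of its parameters per token")
# roughly 5.5% of the compute a 671B dense model would spend per token
```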

Explore in depth: What Mixture-of-Experts means for deployment cost


Should you use proprietary APIs, self-hosted open-weight models, or AWS Bedrock?

There are three paths and only three paths worth seriously considering. Proprietary API gets you zero infrastructure overhead and immediate access to the frontier, but you're paying full rack rate and you're locked into that vendor's roadmap. Self-hosted open-weight gives you maximum control and the lowest marginal cost at volume, but someone on your team owns the ML-ops burden — and that is not a trivial commitment. Managed open-weight via AWS Bedrock sits in the middle: enterprise SLAs, no GPU provisioning, no ML-ops headaches, and you keep your data inside AWS. Which path is right for you comes down to your workload type, your team's actual capability, your compliance requirements, and the volume of inference you're running.
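
If it helps to see those four factors as a starting heuristic, here is a toy sketch; every threshold in it is an illustrative assumption, not a recommendation from the framework:

```python
# Toy routing heuristic for the three paths. All thresholds are
# illustrative assumptions; run the full TCO analysis before deciding.

def deployment_path(monthly_tokens_m: float, has_mlops_capacity: bool,
                    strict_data_residency: bool) -> str:
    if strict_data_residency and not has_mlops_capacity:
        return "managed open-weight (AWS Bedrock)"
    if has_mlops_capacity and monthly_tokens_m > 1_000:   # high volume
        return "self-hosted open-weight"
    if monthly_tokens_m < 50:                             # low volume
        return "proprietary API"
    return "managed open-weight (AWS Bedrock)"

print(deployment_path(monthly_tokens_m=300, has_mlops_capacity=False,
                      strict_data_residency=True))
# -> managed open-weight (AWS Bedrock)
```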

Explore in depth: Build vs buy decision framework for open-weight AI


Which open-weight model is best for coding agents?

Three models lead for coding workloads and the differences between them are practical ones, not just benchmark positions. GLM-4.7 holds the highest SWE-Bench Verified score at 74.2, but benchmark leadership doesn’t always translate to the easiest deployment story. Qwen3-Coder-Next (70.6) is the most deployable option right now — it runs locally at 46 GB in 4-bit quantisation and integrates directly with Claude Code via an OpenAI-compatible API, which is worth a lot in a real engineering workflow. DeepSeek-V3.2 (70.2) is your pick if you’re already on Bedrock, Azure, or Google Cloud and want managed inference without touching local hardware. Pick based on where your infrastructure already lives.
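
As an example of that integration path, here is a minimal sketch of calling a locally served Qwen3-Coder through an OpenAI-compatible endpoint; the port, API key, and registered model name all depend on your serving setup (vLLM, llama.cpp, and similar) and are assumptions here:

```python
# Sketch: a locally hosted Qwen3-Coder behind an OpenAI-compatible
# server. The base_url, api_key, and model name are assumptions about
# your local setup, not fixed values.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="qwen3-coder-next",  # whatever name your server registered
    messages=[{"role": "user",
               "content": "Refactor this loop into a list comprehension: ..."}],
)
print(resp.choices[0].message.content)
```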

Explore in depth: Which open-weight model wins for coding agents


How do you run open-weight models on AWS Bedrock without infrastructure overhead?

AWS Bedrock’s December 2025 expansion is worth paying attention to if you’ve been on the fence about managed open-weight deployment. Eighteen fully managed open-weight models — Qwen3 variants, Kimi K2, MiniMax M2, and others — are now available on a serverless platform with enterprise SLAs and data residency controls. No GPU provisioning. No cluster management. You select a model, configure your guardrails, and call the API. For teams that want access to Chinese-lab model capability without the infrastructure overhead or the data sovereignty questions that come with using those labs’ own APIs, this is the straightforward answer.
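
In practice that call is a few lines of boto3 against the Converse API. A minimal sketch follows, with the modelId as a placeholder, since exact identifiers vary by model and region:

```python
# Sketch: invoking a managed open-weight model on Bedrock through the
# Converse API. The modelId is a placeholder; look up the real
# identifier for your model and region in the Bedrock console.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="qwen.qwen3-placeholder-id",  # not a real model ID
    messages=[{
        "role": "user",
        "content": [{"text": "Summarise the attached incident report."}],
    }],
)
print(response["output"]["message"]["content"][0]["text"])
```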

Explore in depth: Running open-weight models on AWS Bedrock


What governance and compliance requirements apply to open-weight AI in 2026?

Three frameworks are active and your legal team needs to know about all of them. The EU AI Act is the big one with the broadest reach. Colorado SB24-205 adds bias testing and impact assessment obligations for certain automated decision systems. US federal LLM procurement requirements mandate model cards and evaluation artefacts. Chinese-lab models — which is most of the open-weight field right now — create distinct provenance risks on top of the standard compliance obligations. And here’s the thing people miss: Apache 2.0 licensing permits commercial use, but it does absolutely nothing to satisfy your regulatory documentation obligations. The licence and the compliance burden are completely separate questions.

Explore in depth: Governing open-weight AI — risk, compliance, operating model


When does it make sense to run AI agents autonomously?

The honest answer is: it depends on where the cost of supervision sits relative to the value of what the agent is doing. Autonomous agents become viable when the complexity of a task genuinely exceeds what it costs to have a human check each step. That tipping point — the “delegation threshold” — used to be quite high. Predictable managed inference has pushed it into practical territory for coding and data engineering workloads. The infrastructure got reliable enough that the supervision cost became the bigger drag. That’s the shift that matters for how you plan your AI investment.
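
One way to make the threshold concrete is as back-of-envelope arithmetic; the sketch below and all of its numbers are illustrative assumptions, not figures from this article:

```python
# Back-of-envelope delegation threshold: let the agent run unsupervised
# when the expected cost of failures is below the cost of reviewing
# every step. All inputs are illustrative assumptions.

def should_delegate(steps: int, review_cost_per_step: float,
                    failure_rate: float, rework_cost: float) -> bool:
    supervision_cost = steps * review_cost_per_step
    expected_failure_cost = failure_rate * rework_cost
    return expected_failure_cost < supervision_cost

# A 40-step refactor: $2 of reviewer time per step versus a 15% chance
# of a $50 rework if the agent runs unchecked.
print(should_delegate(steps=40, review_cost_per_step=2.0,
                      failure_rate=0.15, rework_cost=50.0))  # True
```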

Explore in depth: The delegation threshold — when AI agents run unsupervised


Open-Weight AI Decision Library

Landscape and Architecture: How Chinese open-weight labs overtook US proprietary models · What Mixture-of-Experts means for deployment cost

Deployment Decisions: Build vs buy decision framework for open-weight AI · Which open-weight model wins for coding agents · Running open-weight models on AWS Bedrock

Governance and Strategy: Governing open-weight AI — risk, compliance, operating model · The delegation threshold — when AI agents run unsupervised


Frequently Asked Questions

What are open-weight AI models and how are they different from open-source AI models?

Open-weight models make their trained parameters freely downloadable — you can run them yourself or fine-tune them for your own use case. Open-source models go further and also release the training code and data, a far rarer standard. DeepSeek, Qwen, and Kimi K2 are open-weight. That's an important distinction: open-weight gives you deployment flexibility and avoids vendor API dependency, but it does not mean open governance, and it does not mean the model's training decisions are transparent or auditable. Don't conflate the two.

See How Chinese open-weight labs overtook US proprietary models.

How much cheaper are open-weight models to run compared to GPT-4 or Claude?

Qwen 2.5-Max costs approximately $0.38 per million tokens — that’s significantly below the major Western API pricing. If you go self-hosted, you push the marginal cost lower still, but you’re amortising real hardware and engineering overhead against that saving. The maths works out differently depending on your volume. At low throughput, proprietary APIs win on simplicity. At high volume, self-hosted starts making serious economic sense.
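
A quick break-even sketch shows how the volume maths plays out; the $0.38 figure is the one quoted above, while the fixed and marginal self-hosting costs are assumptions you should replace with your own:

```python
# Break-even sketch: self-hosted vs API. The $0.38/M tokens API price
# is quoted above; the self-hosting numbers are assumptions.

api_cost_per_m = 0.38             # Qwen 2.5-Max API, per million tokens
selfhost_fixed_monthly = 4_000.0  # assumed GPU amortisation + ops share
selfhost_marginal_per_m = 0.05    # assumed power/overhead per M tokens

breakeven = selfhost_fixed_monthly / (api_cost_per_m - selfhost_marginal_per_m)
print(f"Self-hosting breaks even above ~{breakeven:,.0f}M tokens/month")
# -> ~12,121M tokens/month (about 12B) under these assumptions
```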

See the Build vs buy decision framework for TCO analysis.

Is DeepSeek safe to use for my company’s data?

It depends entirely on how you deploy it. If you run DeepSeek weights on your own infrastructure or through AWS Bedrock, your data never reaches DeepSeek’s servers. That’s the key distinction. The Chinese lab origin does create legitimate provenance questions — particularly for regulated industries — and those obligations sit on top of, not instead of, your standard data governance requirements.

See Governing open-weight AI in production.

Can I run Qwen or DeepSeek without sending data to China?

Yes, and it’s not complicated. Self-hosted deployment keeps inference data on your own infrastructure. AWS Bedrock keeps it within your chosen AWS region. The scenario you need to avoid is using Alibaba Cloud or the DeepSeek API directly — those route your data through their servers. For regulated industries or anything touching sensitive data, self-hosted or Bedrock-managed deployment is the right call.

See Running open-weight models on AWS Bedrock.

What open-weight AI models can actually compete with ChatGPT in 2025?

DeepSeek-R1 and Qwen3 235B A22B match top Western frontier models on MMLU and reasoning benchmarks. Kimi K2 Thinking ranks first on composite indices among all models from outside OpenAI, Google, and Anthropic. That said, benchmark parity is not the same as task equivalence — leaderboard performance does not predict how a model will behave on your specific workload. Always evaluate on your own data before committing.

See Which open-weight model wins for coding agents.

How do I evaluate whether an open-weight model is good enough for my RAG pipeline?

Don’t rely on leaderboard benchmarks — they don’t predict RAG performance reliably. Run your evaluation on a representative sample of your own retrieval data. The factors that actually matter: context window size, instruction-following accuracy at your expected concurrency, latency under load, and structured output support. Get those right on your data and you’ve got a meaningful answer. Get them right on someone else’s benchmark and you’ve got a number.
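
A minimal harness for that kind of evaluation can be small. In the sketch below, `generate` stands in for your model call and the sample records stand in for a representative slice of real retrievals; both are assumptions about your pipeline:

```python
# Minimal RAG evaluation sketch. `generate(question, chunks)` is a
# placeholder for your model call; samples should be real retrievals
# with known-good answers from your own pipeline.
import time

def evaluate(samples: list[dict], generate) -> dict:
    latencies, correct = [], 0
    for s in samples:
        start = time.perf_counter()
        answer = generate(s["question"], s["retrieved_chunks"])
        latencies.append(time.perf_counter() - start)
        correct += int(s["expected_answer"].lower() in answer.lower())
    return {
        "accuracy": correct / len(samples),
        "p95_latency_s": sorted(latencies)[int(0.95 * (len(latencies) - 1))],
    }
```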

See the Build vs buy decision framework.
