Business | SaaS | Technology
Apr 26, 2026

Open-Source AI Models in 2025 and Beyond — What DeepSeek, Qwen, and the New Wave Mean for Enterprise Strategy

AUTHOR

James A. Wondrasek
Comprehensive guide to the open-source AI model race — DeepSeek, Qwen, and enterprise strategy

On 27 January 2025, DeepSeek wiped $589 billion from NVIDIA's market cap with a model trained for $5.6 million. Open-weight models from DeepSeek, Qwen, and Kimi K2 now match proprietary frontier systems at a fraction of the API cost — and Qwen has surpassed Meta's Llama as the most downloaded LLM family on Hugging Face. This guide links to detailed coverage of every decision your organisation needs to make.

In this guide: DeepSeek and NVIDIA · How Chinese labs caught up · MoE architecture · Deployment paths · Coding agents · AWS Bedrock · Governance · Autonomous agents


What happened with DeepSeek in January 2025 and why did it cause NVIDIA’s stock to drop?

DeepSeek R1 matched GPT-4-class performance while costing an estimated $5.6 million to train — roughly 5% of what comparable Western models cost. That single data point collapsed the prevailing assumption that frontier AI required billions in GPU spend. If you could get there for five million, the rationale for buying hundreds of millions of dollars' worth of NVIDIA hardware fell apart overnight. The market reached that conclusion fast. NVIDIA shed $589 billion in a single session.

Explore in depth: How Chinese open-weight labs overtook US proprietary models


How did Chinese labs close the gap with US proprietary models so quickly?

Four labs — DeepSeek, Alibaba (Qwen), Moonshot AI (Kimi K2), and Z.AI (GLM) — closed the gap through architectural innovation, not raw compute. That distinction matters. They weren’t just throwing more hardware at the problem. Mixture-of-Experts (MoE) activates only a fraction of parameters per inference, cutting costs dramatically at scale. US export controls, which were meant to slow them down, actually pushed these labs toward efficiency-first engineering. Being forced to do more with less turned out to be a competitive advantage.

Explore in depth: How Chinese open-weight labs overtook US proprietary models


What is Mixture of Experts (MoE) architecture and why does it matter for AI costs?

Here’s the thing about model size: it’s a misleading number. MoE routes each token to only the relevant “expert” sub-networks, activating a fraction of total parameters per inference. DeepSeek-V3 has 671 billion parameters but activates only 37 billion per inference. So when someone quotes you a headline parameter count, it tells you almost nothing about what it will cost you to run. MoE models operate far more affordably than their size suggests, and that changes the economics of self-hosted deployment quite significantly.
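
To make the cost implication concrete, here is a minimal sketch of the arithmetic, assuming per-token inference compute scales roughly with active rather than total parameters (a simplification that ignores routing overhead):

```python
# Sketch: inference compute scales with *active* parameters, not the
# headline total. The 671B/37B figures are DeepSeek-V3's published
# numbers; the linear-scaling assumption ignores routing overhead.

def relative_inference_cost(total_params_b: float, active_params_b: float) -> float:
    """Per-token compute as a fraction of an equally sized dense model."""
    return active_params_b / total_params_b

fraction = relative_inference_cost(total_params_b=671, active_params_b=37)
print(f"DeepSeek-V3 activates {fraction:.1%} of its parameters per token")
# roughly 5.5% of the compute a 671B dense model would spend per token
```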

Explore in depth: What Mixture-of-Experts means for deployment cost


Should you use proprietary APIs, self-hosted open-weight models, or AWS Bedrock?

There are three paths and only three paths worth seriously considering. Proprietary API gets you zero infrastructure overhead and immediate access to the frontier, but you're paying full rack rate and you're locked into that vendor's roadmap. Self-hosted open-weight gives you maximum control and the lowest marginal cost at volume, but someone on your team owns the ML-ops burden — and that is not a trivial commitment. Managed open-weight via AWS Bedrock sits in the middle: enterprise SLAs, no GPU provisioning, no ML-ops headaches, and you keep your data inside AWS. Which path is right for you comes down to your workload type, your team's actual capability, your compliance requirements, and the volume of inference you're running.
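
If it helps to see those four factors as a starting heuristic, here is a toy sketch; every threshold in it is an illustrative assumption, not a recommendation from the framework:

```python
# Toy routing heuristic for the three paths. All thresholds are
# illustrative assumptions; run the full TCO analysis before deciding.

def deployment_path(monthly_tokens_m: float, has_mlops_capacity: bool,
                    strict_data_residency: bool) -> str:
    if strict_data_residency and not has_mlops_capacity:
        return "managed open-weight (AWS Bedrock)"
    if has_mlops_capacity and monthly_tokens_m > 1_000:   # high volume
        return "self-hosted open-weight"
    if monthly_tokens_m < 50:                             # low volume
        return "proprietary API"
    return "managed open-weight (AWS Bedrock)"

print(deployment_path(monthly_tokens_m=300, has_mlops_capacity=False,
                      strict_data_residency=True))
# -> managed open-weight (AWS Bedrock)
```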

Explore in depth: Build vs buy decision framework for open-weight AI


Which open-weight model is best for coding agents?

Three models lead for coding workloads and the differences between them are practical ones, not just benchmark positions. GLM-4.7 holds the highest SWE-Bench Verified score at 74.2, but benchmark leadership doesn’t always translate to the easiest deployment story. Qwen3-Coder-Next (70.6) is the most deployable option right now — it runs locally at 46 GB in 4-bit quantisation and integrates directly with Claude Code via an OpenAI-compatible API, which is worth a lot in a real engineering workflow. DeepSeek-V3.2 (70.2) is your pick if you’re already on Bedrock, Azure, or Google Cloud and want managed inference without touching local hardware. Pick based on where your infrastructure already lives.
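
As an example of that integration path, here is a minimal sketch of calling a locally served Qwen3-Coder through an OpenAI-compatible endpoint; the port, API key, and registered model name all depend on your serving setup (vLLM, llama.cpp, and similar) and are assumptions here:

```python
# Sketch: a locally hosted Qwen3-Coder behind an OpenAI-compatible
# server. The base_url, api_key, and model name are assumptions about
# your local setup, not fixed values.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="qwen3-coder-next",  # whatever name your server registered
    messages=[{"role": "user",
               "content": "Refactor this loop into a list comprehension: ..."}],
)
print(resp.choices[0].message.content)
```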

Explore in depth: Which open-weight model wins for coding agents


How do you run open-weight models on AWS Bedrock without infrastructure overhead?

AWS Bedrock’s December 2025 expansion is worth paying attention to if you’ve been on the fence about managed open-weight deployment. Eighteen fully managed open-weight models — Qwen3 variants, Kimi K2, MiniMax M2, and others — are now available on a serverless platform with enterprise SLAs and data residency controls. No GPU provisioning. No cluster management. You select a model, configure your guardrails, and call the API. For teams that want access to Chinese-lab model capability without the infrastructure overhead or the data sovereignty questions that come with using those labs’ own APIs, this is the straightforward answer.
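
In practice that call is a few lines of boto3 against the Converse API. A minimal sketch follows, with the modelId as a placeholder, since exact identifiers vary by model and region:

```python
# Sketch: invoking a managed open-weight model on Bedrock through the
# Converse API. The modelId is a placeholder; look up the real
# identifier for your model and region in the Bedrock console.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="qwen.qwen3-placeholder-id",  # not a real model ID
    messages=[{
        "role": "user",
        "content": [{"text": "Summarise the attached incident report."}],
    }],
)
print(response["output"]["message"]["content"][0]["text"])
```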

Explore in depth: Running open-weight models on AWS Bedrock


What governance and compliance requirements apply to open-weight AI in 2026?

Three frameworks are active and your legal team needs to know about all of them. The EU AI Act is the big one with the broadest reach. Colorado SB24-205 adds bias testing and impact assessment obligations for certain automated decision systems. US federal LLM procurement requirements mandate model cards and evaluation artefacts. Chinese-lab models — which is most of the open-weight field right now — create distinct provenance risks on top of the standard compliance obligations. And here’s the thing people miss: Apache 2.0 licensing permits commercial use, but it does absolutely nothing to satisfy your regulatory documentation obligations. The licence and the compliance burden are completely separate questions.

Explore in depth: Governing open-weight AI — risk, compliance, operating model


When does it make sense to run AI agents autonomously?

The honest answer is: it depends on where the cost of supervision sits relative to the value of what the agent is doing. Autonomous agents become viable when the complexity of a task genuinely exceeds what it costs to have a human check each step. That tipping point — the “delegation threshold” — used to be quite high. Predictable managed inference has pushed it into practical territory for coding and data engineering workloads. The infrastructure got reliable enough that the supervision cost became the bigger drag. That’s the shift that matters for how you plan your AI investment.
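
One way to make the threshold concrete is as back-of-envelope arithmetic; the sketch below and all of its numbers are illustrative assumptions, not figures from this article:

```python
# Back-of-envelope delegation threshold: let the agent run unsupervised
# when the expected cost of failures is below the cost of reviewing
# every step. All inputs are illustrative assumptions.

def should_delegate(steps: int, review_cost_per_step: float,
                    failure_rate: float, rework_cost: float) -> bool:
    supervision_cost = steps * review_cost_per_step
    expected_failure_cost = failure_rate * rework_cost
    return expected_failure_cost < supervision_cost

# A 40-step refactor: $2 of reviewer time per step versus a 15% chance
# of a $50 rework if the agent runs unchecked.
print(should_delegate(steps=40, review_cost_per_step=2.0,
                      failure_rate=0.15, rework_cost=50.0))  # True
```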

Explore in depth: The delegation threshold — when AI agents run unsupervised


Open-Weight AI Decision Library

Landscape and Architecture: How Chinese open-weight labs overtook US proprietary models · What Mixture-of-Experts means for deployment cost

Deployment Decisions: Build vs buy decision framework for open-weight AI · Which open-weight model wins for coding agents · Running open-weight models on AWS Bedrock

Governance and Strategy: Governing open-weight AI — risk, compliance, operating model · The delegation threshold — when AI agents run unsupervised


Frequently Asked Questions

What are open-weight AI models and how are they different from open-source AI models?

Open-weight models make their trained parameters freely downloadable — you can run them yourself or fine-tune them for your own use case. Open-source models go further and also release the training code and data, a far rarer standard. DeepSeek, Qwen, and Kimi K2 are open-weight. That's an important distinction: open-weight gives you deployment flexibility and avoids vendor API dependency, but it does not mean open governance, and it does not mean the model's training decisions are transparent or auditable. Don't conflate the two.

See How Chinese open-weight labs overtook US proprietary models.

How much cheaper are open-weight models to run compared to GPT-4 or Claude?

Qwen 2.5-Max costs approximately $0.38 per million tokens — that’s significantly below the major Western API pricing. If you go self-hosted, you push the marginal cost lower still, but you’re amortising real hardware and engineering overhead against that saving. The maths works out differently depending on your volume. At low throughput, proprietary APIs win on simplicity. At high volume, self-hosted starts making serious economic sense.
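
A quick break-even sketch shows how the volume maths plays out; the $0.38 figure is the one quoted above, while the fixed and marginal self-hosting costs are assumptions you should replace with your own:

```python
# Break-even sketch: self-hosted vs API. The $0.38/M tokens API price
# is quoted above; the self-hosting numbers are assumptions.

api_cost_per_m = 0.38             # Qwen 2.5-Max API, per million tokens
selfhost_fixed_monthly = 4_000.0  # assumed GPU amortisation + ops share
selfhost_marginal_per_m = 0.05    # assumed power/overhead per M tokens

breakeven = selfhost_fixed_monthly / (api_cost_per_m - selfhost_marginal_per_m)
print(f"Self-hosting breaks even above ~{breakeven:,.0f}M tokens/month")
# -> ~12,121M tokens/month (about 12B) under these assumptions
```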

See the Build vs buy decision framework for TCO analysis.

Is DeepSeek safe to use for my company’s data?

It depends entirely on how you deploy it. If you run DeepSeek weights on your own infrastructure or through AWS Bedrock, your data never reaches DeepSeek’s servers. That’s the key distinction. The Chinese lab origin does create legitimate provenance questions — particularly for regulated industries — and those obligations sit on top of, not instead of, your standard data governance requirements.

See Governing open-weight AI in production.

Can I run Qwen or DeepSeek without sending data to China?

Yes, and it’s not complicated. Self-hosted deployment keeps inference data on your own infrastructure. AWS Bedrock keeps it within your chosen AWS region. The scenario you need to avoid is using Alibaba Cloud or the DeepSeek API directly — those route your data through their servers. For regulated industries or anything touching sensitive data, self-hosted or Bedrock-managed deployment is the right call.

See Running open-weight models on AWS Bedrock.

What open-weight AI models can actually compete with ChatGPT in 2025?

DeepSeek-R1 and Qwen3 235B A22B match top Western frontier models on MMLU and reasoning benchmarks. Kimi K2 Thinking ranks first on composite indices among all models from outside OpenAI, Google, and Anthropic. That said, benchmark parity is not the same as task equivalence — leaderboard performance does not predict how a model will behave on your specific workload. Always evaluate on your own data before committing.

See Which open-weight model wins for coding agents.

How do I evaluate whether an open-weight model is good enough for my RAG pipeline?

Don’t rely on leaderboard benchmarks — they don’t predict RAG performance reliably. Run your evaluation on a representative sample of your own retrieval data. The factors that actually matter: context window size, instruction-following accuracy at your expected concurrency, latency under load, and structured output support. Get those right on your data and you’ve got a meaningful answer. Get them right on someone else’s benchmark and you’ve got a number.
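
A minimal harness for that kind of evaluation can be small. In the sketch below, `generate` stands in for your model call and the sample records stand in for a representative slice of real retrievals; both are assumptions about your pipeline:

```python
# Minimal RAG evaluation sketch. `generate(question, chunks)` is a
# placeholder for your model call; samples should be real retrievals
# with known-good answers from your own pipeline.
import time

def evaluate(samples: list[dict], generate) -> dict:
    latencies, correct = [], 0
    for s in samples:
        start = time.perf_counter()
        answer = generate(s["question"], s["retrieved_chunks"])
        latencies.append(time.perf_counter() - start)
        correct += int(s["expected_answer"].lower() in answer.lower())
    return {
        "accuracy": correct / len(samples),
        "p95_latency_s": sorted(latencies)[int(0.95 * (len(latencies) - 1))],
    }
```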

See the Build vs buy decision framework.
