On 20 January 2025, a Hangzhou-based startup with fewer than 200 employees posted a model to Hugging Face. Within a week, NVIDIA had shed approximately $589 billion in market capitalisation in a single trading session, its largest one-day loss on record. The model was DeepSeek R1. And it kicked off a twelve-month chain of events that has reshaped how you should think about enterprise AI strategy.
In the months that followed, Alibaba’s Qwen overtook Meta’s Llama as the most-forked model family on Hugging Face. Moonshot AI released Kimi K2 to a chorus of “another DeepSeek moment.” And when OpenAI launched GPT-5 in August 2025, Wired’s headline read “So Long, GPT-5. Hello, Qwen.”
This piece maps what happened, why it happened, and what the concrete numbers show. It’s part of a broader look at the open-weight AI model landscape — and it’s worth reading before you renew any AI vendor contracts.
What actually happened in January 2025?
DeepSeek R1 dropped on 20 January 2025 under an MIT licence. It matched GPT-4-class performance on major benchmarks at a disclosed training cost of around $5.5 million, using roughly 2,000 NVIDIA H800 GPUs, against Western models that reportedly required $80–100 million and 16,000 H100 GPUs.
That gap is why NVIDIA’s stock moved the way it did. The market had priced in a world where only organisations spending hundreds of millions on GPU clusters could build frontier models — what people call the “compute moat” thesis. A $5.5 million training bill blew a hole in that assumption, publicly and in real time.
The company behind it wasn’t a tech giant. DeepSeek was backed by High-Flyer Capital Management, a quantitative hedge fund, and staffed by recent Chinese university graduates. Within weeks it became the most-liked model of all time on Hugging Face. The term “DeepSeek Moment” entered developer vocabulary as shorthand for an open-weight release that surprises global markets on capability-per-dollar performance.
What is the difference between open-weight and open-source AI models?
This distinction matters more than most people realise. Getting it wrong creates IP and compliance exposure.
Open-weight means the trained model weights are publicly released — anyone can download, run, fine-tune, and redistribute. Open-source, strictly defined, also requires release of training data, training code, and data processing pipelines. Most so-called “open” models — including DeepSeek R1 and Qwen — are open-weight, not open-source. Think of it this way: open-weight gives you the engine; open-source also gives you the factory blueprints and the raw materials list.
The enterprise compliance question comes down to licensing, not the open/closed binary. Three licences cover the main Chinese models:
MIT Licence (DeepSeek R1): Broad commercial use with minimal restrictions.
Apache 2.0 (Qwen): Permissive commercial use with a patent licence grant. The standard enterprise green flag.
Modified MIT Licence (Kimi K2): Standard MIT with an attribution clause that activates only above 100 million monthly active users or $20 million monthly revenue. Effectively permissive for most SMB use cases.
Stanford HAI’s DigiChina brief notes that users found these models appealing because they were free to use and adapt, engineered for low-cost deployment, and good enough for a wide range of use cases. That licensing clarity is a big part of what drove adoption — and the next section explains the engineering side of why it happened so fast.
How did Chinese labs close the gap so quickly?
Three interlocking factors: Mixture-of-Experts (MoE) architecture, Multi-head Latent Attention (MLA), and reinforcement learning post-training. You really only need to understand the first one to get the economics.
MoE models activate only a fraction of their total parameters per inference step. DeepSeek-V3 has 671 billion total parameters but activates only 37 billion during inference. A router network decides which specialist sub-networks fire for each token. The rest sit idle. The result is a model with the capability of something much larger, at the inference cost of something much smaller.
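To make the routing idea concrete, here is a minimal, illustrative top-k routing sketch in PyTorch. It is not DeepSeek’s implementation; the expert count, hidden sizes, and top-k value are arbitrary assumptions chosen only to show why a fraction of the parameters does the work for each token.

```python
import torch
import torch.nn as nn


class ToyMoELayer(nn.Module):
    """Toy top-k Mixture-of-Experts layer; illustrative, not DeepSeek's implementation."""

    def __init__(self, d_model: int = 512, n_experts: int = 16, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Router scores every expert for every token.
        self.router = nn.Linear(d_model, n_experts)
        # Each "expert" is an ordinary feed-forward block; only a few run per token.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n_tokens, d_model)
        gate_probs = self.router(x).softmax(dim=-1)                  # (n_tokens, n_experts)
        weights, chosen = gate_probs.topk(self.top_k, dim=-1)        # both (n_tokens, top_k)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e                          # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out


# With 16 experts and top_k=2, each token touches 2/16 of the expert parameters --
# the same principle that lets DeepSeek-V3 activate 37B of 671B parameters per step.
tokens = torch.randn(8, 512)
print(ToyMoELayer()(tokens).shape)   # torch.Size([8, 512])
```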
MLA compresses the attention key-value cache, cutting GPU memory demands, particularly at inference time. Reinforcement learning post-training then improved capability without proportional compute increases.
The structural driver here is US export controls. Restricting access to NVIDIA H100 and A100 GPUs created a hard constraint for Chinese labs. DeepSeek has repeatedly stated that its single biggest constraint is access to AI compute. That constraint pushed algorithmic innovation rather than raw scaling — and produced DeepSeek R1 at $5.5M and Kimi K2 at $4.6M.
INT4 quantisation means very large MoE models can run on commodity hardware. Kimi K2 can reportedly be deployed on two Mac Studio M3 Ultra units. For a deeper dive on MoE’s role in reducing inference costs, the architecture explanation covers the mechanics in full.
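As an adjacent illustration of what quantised local deployment looks like in practice, here is a minimal sketch using Hugging Face transformers with bitsandbytes 4-bit quantisation. The model ID is a placeholder, bitsandbytes uses NF4 rather than the INT4 formats some vendors ship, and a Kimi-K2-scale model still needs far more memory than a single consumer GPU; treat this as a pattern, not a recipe.

```python
# Minimal sketch: loading an open-weight checkpoint with 4-bit quantisation.
# Assumes transformers, accelerate, and bitsandbytes are installed and a GPU is available;
# the model ID is a placeholder sized for illustration, not for Kimi-K2-scale deployment.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Qwen/Qwen2.5-7B-Instruct"   # swap for a checkpoint your hardware can hold

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                   # store weights in 4 bits (NF4 by default)
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",                   # shard layers across available devices
)

prompt = "Explain the difference between open-weight and open-source in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```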
What do the ecosystem metrics actually show?
The metric worth watching is derivative model count on Hugging Face. It reflects practitioner decisions at scale — people who have tested models in real workflows, not benchmark labs.
By mid-2025, Qwen had generated over 113,000 derivative models and more than 200,000 repositories tagging Qwen on Hugging Face. Meta’s Llama family had approximately 27,000 derivatives. DeepSeek had around 6,000. Alibaba’s total derivative count is nearly as large as Google’s and Meta’s combined. That’s not a marginal difference.
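You can sanity-check these counts yourself. A hedged sketch using the huggingface_hub client, assuming the Hub’s base_model tag convention; the example base checkpoint and exact tag format are assumptions:

```python
# Sketch: listing publicly declared derivatives of a base model on the Hugging Face Hub.
# Assumes the huggingface_hub package; the base-model ID and tag convention are illustrative.
from huggingface_hub import HfApi

api = HfApi()
base_model = "Qwen/Qwen2.5-7B-Instruct"   # example base checkpoint

# Repositories that declare a base model carry a tag of the form "base_model:<repo_id>".
derivatives = list(api.list_models(filter=f"base_model:{base_model}", limit=50))

for m in derivatives:
    print(m.id)
print(f"Listed {len(derivatives)} derivative repositories (capped at 50).")
```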
Qwen is the second-most-popular open model on OpenRouter. Wired documented Qwen running in Rokid smart glasses for real-time translation. HSBC, Standard Chartered, and Saudi Aramco now test or deploy DeepSeek models. ByteDance’s Doubao surpassed 100 million daily active users in December 2025.
Baidu went from near-zero Hugging Face open-weight releases in 2024 to an active presence in 2025. ByteDance and Tencent each made comparable increases. One lab’s breakout became a sector-wide strategy shift.
The numbers above tell only part of the story; the broader competitive shift in open-weight AI is covered in the wider landscape piece this article belongs to.
Why did GPT-5 and Llama 4 underperform expectations?
These two releases matter because they’re the foil to the Chinese lab story. If either had delivered, the enterprise calculus would look different today.
Meta released Llama 4 on 6 April 2025. The benchmark numbers were not what the community expected: Llama 4 Scout, a 109B-parameter model, scored roughly the same as the far smaller GPT-4o mini on Artificial Analysis. Meta also published its benchmark chart without including Gemini 2.5, which scored approximately 20 points higher. The release “landed with a thud” against a backdrop of $65 billion in Meta AI spending.
OpenAI launched GPT-5 in August 2025. Capability gains relative to cost didn’t justify migration for organisations already embedded in existing workflows. OpenAI also released gpt-oss, their first open-weight model since GPT-2, which received less community uptake than the Chinese alternatives.
The practical outcome: developers who’d been waiting for the next big US model started seriously evaluating Qwen, DeepSeek V3, and Kimi K2 instead. Mistral (France) remains the primary non-Chinese option for teams with model provenance requirements. If the origin of your model weights matters for compliance, that question is addressed in the article on risk and compliance for open-weight AI in production.
Can you trust the benchmark scores?
Treat them as a shortlist filter — useful for ruling out clearly underperforming models, not for making final deployment decisions.
The specific problem: SWE-Rebench research found that Chinese models showed larger score drops than Western frontier models on decontaminated tasks, which points to benchmark contamination. Once a benchmark becomes a target, labs optimise for it. Goodhart’s Law, applied to AI. And this isn’t only a Chinese-lab problem; almost every model shows some performance drop on decontaminated tasks.
What can you actually use? METR time-horizon benchmarks measure how long a model can sustain a coherent agentic task, a proxy for real-world utility rather than narrow problem-solving. Simon Willison’s year-in-LLMs analysis documents the open-model leaderboard on METR time-horizons at the end of 2025: GLM-4.7 (68), Kimi K2 Thinking (67), MiMo-V2-Flash (66), DeepSeek V3.2 (66), MiniMax-M2.1 (64), gpt-oss-120B (61). GLM-4.7, from Zhipu AI, leads that table.
Use published benchmarks to eliminate the bottom of the field. Use LM Arena (human preference rankings) as a second filter. Then test shortlisted models on representative samples of your actual workload before committing. That’s a sensible three-step process and it’s not complicated.
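Step three is the one teams skip most often. A minimal sketch of what “test on representative samples” can look like, using the OpenAI Python client pointed at OpenRouter; the model IDs, prompts, and endpoint are assumptions, so swap in your own shortlist and workload:

```python
# Sketch: running shortlisted models over a sample of your real workload via OpenRouter.
# Assumes the openai package and an OPENROUTER_API_KEY; model IDs and prompts are illustrative.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

candidates = ["qwen/qwen3-235b-a22b", "deepseek/deepseek-chat", "moonshotai/kimi-k2"]
samples = [
    "Classify this support ticket: 'The invoice export has been failing since Monday.'",
    "Extract the renewal date from: 'This agreement renews on 1 March 2026 unless terminated.'",
]

for model_id in candidates:
    for prompt in samples:
        response = client.chat.completions.create(
            model=model_id,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=200,
        )
        print(f"{model_id}: {response.choices[0].message.content[:120]}")
```

Score the outputs against whatever your workflow actually needs (accuracy, format compliance, latency, cost per call) rather than against a public leaderboard.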
What does this mean for enterprise AI strategy?
There are now three ways to deploy AI in your product or workflows: proprietary APIs, self-hosted open-weight models, and managed open-weight services. Each has a different profile on cost, control, and infrastructure overhead.
Proprietary APIs (OpenAI, Anthropic, Google): lowest setup cost, highest per-token cost, full vendor dependency.
Self-hosted open-weight: highest setup cost, lowest ongoing cost at scale. Generating equivalent token volumes on DeepSeek costs orders of magnitude less than GPT-4.1. But you need MLOps capacity to run it.
Managed open-weight services sit in the middle. AWS Bedrock now offers 18 fully managed open-weight models, including Qwen and MiniMax M2 — no infrastructure management, lower per-token cost than proprietary APIs, weights on non-Chinese cloud infrastructure. If your team doesn’t have dedicated MLOps capacity, this is the lowest-friction way to run a proper evaluation. For operational specifics, AWS Bedrock’s fully managed open-weight offering covers the model selection and deployment detail.
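For teams going the managed route, here is a minimal sketch of calling a Bedrock-hosted open-weight model with boto3’s Converse API. The model ID and region are assumptions; check the Bedrock model catalogue for the identifiers enabled in your account.

```python
# Sketch: invoking a managed open-weight model on AWS Bedrock via the Converse API.
# Assumes boto3 with valid AWS credentials; the model ID and region are placeholders.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="qwen.qwen3-32b-v1:0",   # placeholder open-weight model ID
    messages=[
        {
            "role": "user",
            "content": [{"text": "Summarise the key obligations in this clause: ..."}],
        }
    ],
    inferenceConfig={"maxTokens": 300, "temperature": 0.2},
)

print(response["output"]["message"]["content"][0]["text"])
```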
On licensing: Apache 2.0 (Qwen) and MIT (DeepSeek R1, DeepSeek V3) remove most legal blockers for enterprise adoption. The Modified MIT on Kimi K2 only needs review if you’re projecting significant scale. If you’re in a regulated industry, data residency questions around Chinese-origin models remain open — the compliance and trust article covers those.
The strategic recommendation is simple: run a structured evaluation before you renew contracts or expand usage. The build-vs-buy decision framework maps all three deployment paths against your workload type; it’s the next read in this series, part of the complete open-weight AI model landscape coverage.
FAQ
What is the DeepSeek Moment?
DeepSeek R1’s release on 20 January 2025 demonstrated frontier-class AI capability at a training cost of approximately $5.5 million, triggering a single-day ~$589 billion NVIDIA market cap loss and forcing a reassessment of the “compute moat” thesis. The term has since become shorthand for any open-weight release that surprises global markets on capability-per-dollar performance. DeepSeek was backed by High-Flyer Capital Management and staffed by recent Chinese university graduates.
Is Qwen better than GPT-5?
On specific benchmarks — SWE-Bench, code generation, multilingual tasks — Qwen models are competitive with or ahead of GPT-5. On METR time-horizons, gpt-oss-120B scores 61 while GLM-4.7 and Kimi K2 score 68 and 67 respectively. GPT-5 retains some advantages in instruction-following for certain use cases. It depends on your workload — test both on representative samples before deciding.
Are Chinese AI models safe to use in enterprise applications?
From a licensing perspective, Apache 2.0 (Qwen) and MIT (DeepSeek) permit commercial enterprise use. From a data-security perspective, self-hosted or AWS Bedrock deployments avoid sending data to Chinese servers entirely. Organisations in regulated industries face additional compliance questions around data residency and supply-chain risk — those are covered in the governance and compliance article.
Why did NVIDIA stock drop when DeepSeek was released?
Investors interpreted DeepSeek R1’s $5.5 million training cost as evidence that frontier AI might require far fewer GPUs than assumed, threatening NVIDIA’s data centre revenue thesis. It was a reassessment of assumptions, not a confirmed structural shift: NVIDIA has since recovered to become the world’s first $5 trillion market cap company, with Jensen Huang citing more than $500 billion in orders through 2026.
What is Mixture-of-Experts architecture and why does it matter for cost?
MoE splits a model’s parameters into specialist sub-networks and activates only a fraction per token. DeepSeek-V3 activates 37 billion of 671 billion total parameters per inference step, delivering GPT-4 level output at a fraction of dense model compute cost. Kimi K2 operates similarly — large total parameter count, small active parameter count per inference step. The full architecture explanation covers why this matters for self-hosting feasibility.
Why does Qwen dominate Hugging Face?
Apache 2.0 licensing, strong benchmark performance across multilingual and coding tasks, and regular model family updates have made Qwen the default base model for fine-tuning work. Over 113,000 derivative models use Qwen as a base, with 200,000+ repositories tagging Qwen — compared to approximately 27,000 Llama derivatives and 6,000 DeepSeek derivatives.
What happened to Meta’s Llama 4 in April 2025?
Llama 4 was released on 6 April 2025 and underperformed expectations on LM Arena and Artificial Analysis benchmarks. Meta published its benchmark comparison without including Gemini 2.5, which scored approximately 20 points higher than Llama 4. The release redirected developer attention toward Qwen and DeepSeek V3.
What is Kimi K2 and why is it called “another DeepSeek moment”?
The open-weight release of Kimi K2 was widely described as “another DeepSeek moment” because it demonstrated long-horizon agentic capability — up to 200–300 sequential tool calls — at a training cost of $4.6M that again undercut US frontier model economics. Kimi K2 is released by Moonshot AI under a Modified MIT Licence that is permissive for most commercial use cases.
What is the difference between self-hosted and managed open-weight deployment?
Self-hosted means running model weights on your own or rented infrastructure — full control, highest setup cost, full data control. Managed open-weight, for example AWS Bedrock’s fully managed offering, means a cloud provider hosts and serves the weights for you. No infrastructure management, lower per-token cost than proprietary APIs, no data sent to Chinese servers. The enterprise deployment options article covers the operational specifics.
What Apache 2.0 licensed Chinese models can I use commercially?
Qwen (Apache 2.0) is the most widely adopted. DeepSeek R1 and DeepSeek V3 use MIT, which also permits unrestricted commercial use. Kimi K2 uses a Modified MIT that permits commercial use with an attribution clause activating only above 100 million monthly active users or $20 million monthly revenue. Always verify the current licence version before deploying — licences can change between model versions.
How much does it actually cost to run a Chinese open-weight model in production?
DeepSeek V3 API pricing (after a 50% price cut in September 2025): $0.028 per million input tokens (cache hit) and $0.42 per million output tokens. Generating equivalent volume on GPT-4.1 can cost orders of magnitude more. The full cost comparison — including self-hosted and managed service options — is in the build-vs-buy decision framework.
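As an illustrative calculation at those rates: a month of 100 million cache-hit input tokens and 20 million output tokens would cost roughly (100 × $0.028) + (20 × $0.42) ≈ $11.20 in token fees. Cache-miss input is priced higher than cache-hit input, so real bills depend on how much of your prompt traffic is repeated.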
Who is actually using Chinese open-weight models in production today?
Documented adopters include Rokid (Qwen in smart glasses for real-time translation), HSBC, Standard Chartered, and Saudi Aramco testing or deploying DeepSeek models, and ByteDance’s Doubao at 100 million daily active users. HSBC and Standard Chartered are worth noting if you’re in financial services — regulated-industry peers are already moving.