The Gallery
Acq. 2025·Industry

DeepSeek R1

A Chinese AI lab released a frontier-grade reasoning model built on a base model with a reported training cost of roughly $6 million, challenging the assumption that world-class AI required hundreds of millions of dollars in compute.

Overview

DeepSeek R1 was released in January 2025 by DeepSeek, a research lab affiliated with the Chinese quantitative hedge fund High-Flyer. The model was designed specifically for complex reasoning tasks (mathematics, coding, and logical inference) and was trained using a reinforcement learning pipeline that rewarded verifiably correct answers rather than relying exclusively on supervised fine-tuning from human-labeled data. Its release was accompanied by a detailed technical report and, crucially, open weights, making it immediately accessible to researchers and practitioners worldwide.
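
As a rough illustration of what rewarding verifiably correct answers can look like, the Python sketch below scores a completion by exact match against a known reference answer. The function names and the \boxed{...} answer convention are assumptions made for this example, not DeepSeek's actual reward code.

```python
import re
from typing import Optional

# Hypothetical convention: the model writes its final answer as \boxed{...}.
BOXED_PATTERN = re.compile(r"\\boxed\{([^{}]*)\}")


def extract_final_answer(completion: str) -> Optional[str]:
    # Return the contents of the last boxed expression, or None if there is none.
    matches = BOXED_PATTERN.findall(completion)
    return matches[-1].strip() if matches else None


def accuracy_reward(completion: str, reference: str) -> float:
    # 1.0 when the extracted answer equals the reference exactly, otherwise 0.0.
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0
    return 1.0 if answer == reference.strip() else 0.0
```

Because the reward is computed by a rule rather than by a learned preference model, it needs no human labeling per sample. A production checker would likely also normalize equivalent forms (for example "0.5" and "1/2"), which this sketch does not attempt.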

The training methodology behind R1 was notably lean. DeepSeek reported a compute cost of roughly $5.6 million USD for the final training run of the underlying base model, DeepSeek-V3. That figure excludes prior research and ablation experiments, and it does not cover R1's additional reinforcement learning stage. Even so, it stood in stark contrast to estimates of hundreds of millions of dollars spent on comparable frontier models from OpenAI, Google, and Anthropic. The efficiency came in part from DeepSeek-V3's Mixture-of-Experts (MoE) architecture, which activates only a fraction of total parameters for any given input. R1 itself was then produced via a reinforcement learning stage on top of that base, with Group Relative Policy Optimization (GRPO) replacing the more resource-intensive PPO algorithm typically used in RLHF pipelines.
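
The core idea that makes GRPO lighter than PPO can be sketched in a few lines of Python: several responses are sampled for the same prompt, each is scored, and each reward is normalized against the group's own mean and standard deviation, so no separate learned value (critic) model is needed to estimate a baseline. The function name, the use of the population standard deviation, and the eps term are assumptions for this sketch; the full algorithm also includes a clipped policy-ratio objective and a KL penalty, which are omitted here.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    # Normalize each reward against the statistics of its own sampling group.
    # eps (an assumption of this sketch) avoids division by zero when every
    # response in the group receives the same reward.
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to one prompt: two judged correct, two incorrect.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Responses that beat the group average get a positive advantage and the rest a negative one. When every response in a group earns the same reward, all advantages are zero and that group contributes no learning signal.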

On public benchmarks, R1 performed on par with or better than OpenAI's o1 model on several standard reasoning evaluations, including AIME 2024 (competition mathematics problems) and Codeforces competitive programming tasks. The release triggered immediate and significant market reactions: Nvidia's stock dropped sharply as investors reassessed how much compute future AI development would actually require. Within days, R1 had become one of the most downloaded models on Hugging Face, and its distilled variants (smaller models trained to mimic R1's reasoning chains) were being integrated into applications globally.

Key Facts

  • DeepSeek reported training DeepSeek-V3 (the base model underlying R1) for approximately $5.576 million USD in compute costs.
  • R1 scored 79.8% (pass@1) on AIME 2024, on par with the 79.2% reported for OpenAI's o1 on the same benchmark.
  • The underlying DeepSeek-V3 base model has 671 billion total parameters but activates only 37 billion of them for each token via its Mixture-of-Experts architecture.
  • R1 was released on January 20, 2025, with full model weights publicly available on Hugging Face under the permissive MIT License.
  • Nvidia lost approximately $600 billion in market capitalization in a single trading session on January 27, 2025, partly attributed to investor reassessment of GPU demand following R1's release.
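
The cost and sparsity figures above can be checked with simple arithmetic. The short Python sketch below does so, using the 2.788 million H800 GPU-hours and the assumed rental price of $2 per GPU-hour stated in the DeepSeek-V3 technical report; the variable names are illustrative only.

```python
# Figures from the Key Facts list and the DeepSeek-V3 technical report.
TOTAL_PARAMS = 671_000_000_000   # total parameters in DeepSeek-V3
ACTIVE_PARAMS = 37_000_000_000   # parameters activated per token
GPU_HOURS = 2_788_000            # reported H800 GPU-hours for the full training run
PRICE_PER_GPU_HOUR = 2           # USD, the rental price assumed in the report

# Share of the model that does work for any single token.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Reported compute cost: GPU-hours multiplied by the assumed hourly price.
training_cost_usd = GPU_HOURS * PRICE_PER_GPU_HOUR

print(f"Active share per token: {active_fraction:.1%}")
print(f"Estimated training cost: ${training_cost_usd:,}")
```

The roughly 5.5% active share is why per-token compute is far lower than the 671 billion parameter count suggests. The cost figure is a rental-price estimate for one training run, not a total budget for the lab.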

Why It Matters

DeepSeek R1 fundamentally disrupted the prevailing economic narrative of the AI industry. The dominant assumption entering 2025 was that the gap between frontier AI labs and all other actors would widen indefinitely because only a handful of organizations could afford the compute required to train competitive models. R1 demonstrated that algorithmic innovation — specifically in reinforcement learning efficiency and sparse activation architectures — could substitute for raw spending in ways that most analysts had not anticipated. This shifted the conversation from 'who can buy the most GPUs' toward 'who can engineer the most efficient training pipelines.'

The open-weight release of R1 compounded its impact beyond the single model itself. By publishing both the weights and the technical report, DeepSeek accelerated the diffusion of its training techniques across the global research community. The GRPO algorithm and the chain-of-thought reinforcement learning approach were rapidly studied, reproduced, and extended by independent researchers. R1 thus functions as a methodological inflection point: it is the moment when efficient reasoning-oriented RL training became a credible, documented, replicable alternative to brute-force scaling.

The People

  • DeepSeek-AI Team
  • Daya Guo
  • Dejian Yang
  • Qihao Zhu
  • Junxiao Song
  • Runxin Xu

Sources

[1] DeepSeek-AI, Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, Xiaokang Zhang, Xingkai Yu, Yu Wu, Z.F. Wu, Zhibin Gou, Zhihong Shao, Zhuoshu Li, Ziyi Gao, et al. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. 2025. https://arxiv.org/abs/2501.12948

[2] DeepSeek-AI. DeepSeek-V3 Technical Report. 2024. https://arxiv.org/abs/2412.19437