The DeepSeek R1 Story: Separating Fact from Fiction in 2025's Biggest AI Story
A comprehensive and myth-busting analysis of the origins, technical breakthroughs, media storm, and controversies surrounding the headline-grabbing Chinese AI model

When DeepSeek R1 burst onto the scene last month, it triggered a wave of curiosity, surprise, and even some panic in the AI landscape. The release of this reasoning-focused model from a relatively unknown Chinese AI lab sent NVIDIA's stock into freefall and left Silicon Valley scrambling to make sense of its implications.
As I've delved into this story over the past few weeks, I've found myself swimming in a sea of technical papers, conflicting media reports, and heated debates about what R1 means for the future of AI. After poring over nearly 50 different sources — from academic papers to insider interviews — I'm struck by how much misunderstanding and misreporting surrounds this release.
In this article, we'll cut through the noise to understand what R1 really is, how it works, and address some of the controversies surrounding its development and deployment.
Whether you're a tech enthusiast, an industry professional, or simply curious about how a Chinese AI lab managed to shake up the global tech landscape, I hope you'll find this deep dive as fascinating as I have. Let's begin by understanding exactly what makes DeepSeek R1 so special.
1) What the heck is DeepSeek R1?
On 20 January 2025, a relatively unknown Chinese AI lab shook the tech world with a new release, DeepSeek R1, that appeared to send NVIDIA's stock plummeting and Silicon Valley scrambling.
DeepSeek R1 is a reasoning model, a state-of-the-art Large Language Model (LLM) that doesn't just process and generate text, but instead systematically “reasons” through problems using a technique known as Chain-of-Thought (CoT) reasoning.
Let me show you what this means in practice. Imagine you're tackling a complex math problem. Instead of jumping straight to the answer, you'd write out your thinking:
First, I'll need to isolate the variables
Then, I'll combine like terms
Finally, I'll solve for x and y
That's exactly how R1 approaches problems — by breaking them down into logical steps and working through them systematically. This approach has helped overcome one of the major hurdles that plagued early AI models: their tendency to struggle with even basic logic and mathematics problems.
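To make this concrete, here's a tiny worked example in Python that solves a simple pair of equations while writing out each reasoning step, much as a reasoning model's chain of thought would. The equations themselves are my own illustrative choice:

```python
# A worked example of step-by-step reasoning: solve the system
#   2x + y = 7
#   x - y = 2
# writing out each step, as a chain-of-thought trace would.

def solve_system():
    steps = []
    # Step 1: isolate a variable. From the second equation, x = y + 2.
    steps.append("From x - y = 2, isolate x: x = y + 2")
    # Step 2: substitute and combine like terms: 2(y + 2) + y = 7, so 3y + 4 = 7.
    steps.append("Substitute into 2x + y = 7: 3y + 4 = 7")
    # Step 3: solve for y, then back-substitute to find x.
    y = (7 - 4) / 3
    x = y + 2
    steps.append(f"Solve: y = {y}, x = {x}")
    return x, y, steps

x, y, steps = solve_system()
for step in steps:
    print(step)
```

The point is not the arithmetic but the trace: each intermediate step is made explicit before the final answer, which is what lets a reasoning model catch and correct its own mistakes along the way.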
R1 isn't alone in this space, mind you. OpenAI opened up the category with the preview version of its o1 model in September 2024, then followed up with the premium o1-Pro (exclusive to the US$200/month ChatGPT Pro plan) and the recently released o3-mini. Other contenders include Google's Gemini 2.0 Flash Thinking model and Kimi 1.5, which arrived around the same time as DeepSeek R1.
Beyond the main DeepSeek R1 model that is available free to use on DeepSeek’s website, the model’s developer has also released smaller versions that were created using a process known as distillation. Think of distillation as creating a concentrated essence of the original model — these smaller variants maintain much of R1's reasoning prowess while requiring far less computational power. In fact, some versions of R1 are compact enough to run on a standard laptop. (Fancy running your own AI assistant? Check out my step-by-step guide in this article!)

2) Why is everyone talking about DeepSeek R1?
The release of DeepSeek R1 coincided with a significant rout of the US stock market, heavily impacting AI-led equities such as NVIDIA, the seemingly unstoppable juggernaut of AI chips, which saw its stock price fall by ~17%, wiping nearly US$600 billion off its market value in a single day.
While it would be oversimplifying to pin these movements entirely on DeepSeek's shoulders, it's clear that something significant had shifted in the AI landscape.
The buzz around R1 isn't primarily about its performance — after all, OpenAI and Google have had comparable reasoning models in play for some months. Instead, the cause has been a perfect storm of four factors: the model's cost efficiency, its open-source nature, the geopolitical implications of both, and the data security and privacy issues surrounding its chatbot.
Let’s explore each of these in turn.
The Cost Revolution
DeepSeek’s main achievement is that they have delivered performance comparable to OpenAI's o1 at a fraction of the cost. We're talking about training costs estimated at just 3-10% of o1's, and — perhaps more importantly — inference costs (the cost of actually using the model) that are 90-95% lower (more on these estimates below).
While the media has fixated on training costs, it's worth noting that inference costs are the real story here. Think of it this way: training is like building a factory — it's a one-time expense. Inference is like running that factory — it's an ongoing cost that really determines long-term viability.
What makes this cost advantage particularly impressive is that DeepSeek achieved it while operating at a significant disadvantage. Due to US semiconductor export restrictions, the company faces what they call a "4x compute disadvantage" compared to their American counterparts. All else being equal, they would have needed four times the compute (say, twice the number of chips running for twice as long) to achieve comparable results.
This efficiency breakthrough appears to have rattled investors, who are questioning whether the future of AI development might be less dependent on cutting-edge — and extremely expensive — hardware than previously thought. (Though I'll explain in my next article why I believe this worry is probably misplaced.)
The Open-Source Paradigm
Unlike proprietary models such as OpenAI's o1, DeepSeek R1 was released under an MIT license, making it freely accessible for both research and commercial use. While it's not completely open-source — it's technically an "open weights" model, as some training details remain private — the company has been remarkably transparent.
What does this mean in practice? Anyone can download and run DeepSeek R1 or its variants on their own systems. You can modify the model if you have the expertise. The likes of Hugging Face and Amazon already offer their own implementations. I'm even running a smaller variant on my laptop as I write this article at 35,000 feet without internet connectivity — how cool is that?!
This open approach clearly raises serious questions about the closed-source strategy favoured by OpenAI and Anthropic, and even challenges Meta's interpretation of open-source AI, which has yet to produce truly competitive models.
The Geopolitical Implications
That such breakthroughs came from a relatively unknown Chinese AI lab has sparked intense debate about the future of AI development. Some industry observers suggest this might signal the end of US dominance in AI — though I'd argue the conclusion is premature, given the significant advantages the US still maintains in this space.
Data Security & Privacy Issues
The security implications of DeepSeek R1 have sparked serious controversy, with several countries banning its use on government devices.
The concerns stem from DeepSeek's unusually invasive tracking practices and data sharing with Chinese state-owned entities, culminating in a significant data breach that exposed millions of chat histories and API secrets. (We’ll dive into this topic in more depth later in this article.)
3) Who developed DeepSeek R1?

Part of the reason that DeepSeek R1 caused such turbulence is that it did not come from one of the usual suspects — OpenAI, Anthropic, Google, or perhaps Meta.
Instead, it emerged from DeepSeek, a Chinese AI lab that first caught my attention in late 2024 with their DeepSeek V3 model. That release was impressive, matching OpenAI's GPT-4o in performance, but it was just a preview of what was to come.
The story behind DeepSeek is fascinating. It began as a research division (originally called Fire-Flyer) within High-Flyer Quant, a successful hedge fund. At the helm is Liang Wenfeng, who wears multiple hats as both the founder of High-Flyer and principal researcher at DeepSeek.
While some news outlets have dismissively called DeepSeek a "side hobby" of High-Flyer, that couldn't be further from the truth — the lab employs over 150 AI researchers and has now evolved into a full-fledged company in its own right.
Here's what I find particularly intriguing: despite being privately funded (by Liang and High-Flyer) with no special government backing beyond standard AI sector subsidies, they've taken a remarkably different approach to AI development compared to their Western counterparts. Their mission statement — "to unravel the mystery of AGI with curiosity" — might sound similar to other AI labs pursuing Artificial General Intelligence, but their execution sets them apart.
Unlike the race-to-market mentality we've seen from other major AI labs, DeepSeek has prioritised research over immediate commercialisation. As Liang puts it, "When this society starts celebrating the success of deep-tech innovators, collective perceptions will change. We just need more real-world examples and time to allow that process to unfold." This philosophy has led them to open-source their research and offer their models at remarkably affordable API prices.
This commitment to openness isn't just talk — since their inception in July 2023, DeepSeek has published four comprehensive technical papers, including detailed specifications for both V3 and R1. It's a refreshing contrast to the increasingly opaque approach of certain companies (I'm looking at you, ironically-named OpenAI!).
4) How did DeepSeek achieve such remarkable efficiency?
While DeepSeek R1 is largely on par with OpenAI’s o1 reasoning model from a performance standpoint, what has really garnered attention is how the model was trained, and can be run, at a fraction of the cost (more on this below).
It's a classic case of necessity being the mother of invention — faced with limited access to cutting-edge chips due to US export controls, DeepSeek's team had to approach model development differently.
Their approach combines existing best practices with genuinely novel techniques across three key areas:
Architectural Improvements: Clever tweaks to how the model is structured
Hardware Optimisation: Smarter ways to use limited computing resources
Training Process Innovations: More efficient methods for teaching the AI
Let's break down some of their most fascinating innovations:
Architectural Improvements
a) Mixture of Experts (MoE)
Imagine walking into a specialist medical centre. Instead of one doctor attempting to handle every condition, you have a cardiologist for heart issues, a neurologist for brain matters, and so on.
That's essentially how MoE works. DeepSeek R1 has 671 billion parameters (think of these as being analogous to the connections between neurons in a brain) but it only activates 37 billion at a time — just the "specialists" needed for each specific task. This makes it more efficient and therefore cheaper to run the model.
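Here's a minimal sketch in Python of how MoE routing works in principle: a gating network scores all the experts, but only the top few actually run. This is an illustration of the general technique, not DeepSeek's actual implementation, and all the dimensions and weights are toy values:

```python
import numpy as np

# Toy Mixture-of-Experts routing: score all experts, activate only the top-k.

rng = np.random.default_rng(0)
NUM_EXPERTS, TOP_K, DIM = 8, 2, 4

experts = [rng.standard_normal((DIM, DIM)) for _ in range(NUM_EXPERTS)]
gate_w = rng.standard_normal((DIM, NUM_EXPERTS))

def moe_forward(x):
    scores = x @ gate_w                       # gating scores for every expert
    top = np.argsort(scores)[-TOP_K:]         # indices of the k best-scoring experts
    weights = np.exp(scores[top])
    weights /= weights.sum()                  # softmax over the selected experts only
    # Only the chosen "specialists" do any computation; the rest stay idle.
    out = sum(w * (x @ experts[i]) for w, i in zip(weights, top))
    return out, top

x = rng.standard_normal(DIM)
out, active = moe_forward(x)
print(f"Activated experts {sorted(active.tolist())} of {NUM_EXPERTS}")
```

Scaled up, this is how a 671-billion-parameter model can get away with computing over only 37 billion parameters per token: the gating step is cheap, and most of the experts never run.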
b) Multi-Head Latent Attention (MLA)
Most LLMs process information like a single person reading a complex document — they constantly refer back to different sections, which takes time and mental energy.
MLA is more like having a team of readers, each focusing on different aspects simultaneously. One might track character development, another follows plot points, and a third notes thematic elements. When they combine their insights, they reach understanding more efficiently than a single reader ever could.
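For the technically curious, here's a simplified sketch of standard multi-head attention, the "team of readers" idea, in Python. MLA goes further by compressing keys and values into a small latent vector to shrink memory use; that compression step is omitted here for brevity, and all dimensions are toy values:

```python
import numpy as np

# Simplified multi-head attention: several "heads" read the same sequence
# in parallel, each attending to a different slice of the representation.

rng = np.random.default_rng(1)
SEQ, DIM, HEADS = 5, 8, 2
HEAD_DIM = DIM // HEADS

x = rng.standard_normal((SEQ, DIM))
Wq, Wk, Wv = (rng.standard_normal((DIM, DIM)) for _ in range(3))

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(x):
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    outs = []
    for h in range(HEADS):                    # each head works independently
        sl = slice(h * HEAD_DIM, (h + 1) * HEAD_DIM)
        scores = softmax(q[:, sl] @ k[:, sl].T / np.sqrt(HEAD_DIM))
        outs.append(scores @ v[:, sl])        # weighted mix of values per head
    return np.concatenate(outs, axis=-1)      # combine the heads' insights

out = multi_head_attention(x)
print(out.shape)
```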
Hardware Optimisations
c) Low-Precision Computation
DeepSeek found clever ways to balance accuracy with efficiency. With limited access to powerful hardware, they were judicious about using high-precision calculations.
Think of it like this: lower precision data is like rounding Pi to 3.142, while higher precision data would keep it at 3.14159265358979. They used the more detailed calculations (BF16 or FP32) only where absolute accuracy is crucial, and more efficient, lower precision methods (FP8) everywhere else — a bit like how you might use precise measurements when baking a cake, but eyeball the seasoning for a soup.
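You can see the trade-off in a couple of lines of Python. NumPy has no FP8 type, so float16 stands in for the low-precision format here:

```python
import numpy as np

# The precision/memory trade-off: half the bytes per number,
# at the cost of a tiny rounding error.

pi32 = np.float32(np.pi)
pi16 = np.float16(np.pi)   # stand-in for a low-precision format like FP8

print(f"float32: {pi32:.7f} ({pi32.nbytes} bytes)")
print(f"float16: {pi16:.4f} ({pi16.nbytes} bytes)")

error = abs(float(pi32) - float(pi16))
print(f"rounding error: {error:.6f}")
```

Halving the bytes per number halves memory use and memory traffic, which is exactly the resource DeepSeek was shortest of. The trick is knowing which calculations can tolerate the rounding error and which cannot.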
d) DualPipe Algorithm
This is DeepSeek's solution to a common problem in AI processing, known as "pipeline bubbles", where some chips sit idle while others work. Since high-performance chips are scarce, this idle time is highly wasteful.
The DualPipe Algorithm enables simultaneous processing of computations and data communications. By carefully scheduling tasks, it ensures that while one part of the GPU is handling computations, another is managing data transfer. This is akin to having a perfectly managed assembly line where components are always either processing or preparing for their next task. No more coffee breaks for the hardware!
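A toy back-of-the-envelope model illustrates why this overlap matters. The timings below are made up, and this captures only the general idea of overlapping compute with communication, not the real DualPipe schedule:

```python
# Toy model of overlapping computation with communication.

COMPUTE_MS, COMM_MS, STEPS = 10, 8, 100

# Naive schedule: each step computes, then sits idle during the data transfer.
naive_total = STEPS * (COMPUTE_MS + COMM_MS)

# Overlapped schedule: while one micro-batch computes, the previous one's
# data is transferred, so each step costs only the slower of the two
# (plus one leftover transfer at the end).
overlapped_total = STEPS * max(COMPUTE_MS, COMM_MS) + min(COMPUTE_MS, COMM_MS)

print(f"naive:      {naive_total} ms")
print(f"overlapped: {overlapped_total} ms")
print(f"idle time eliminated: {naive_total - overlapped_total} ms")
```

Even in this crude model, overlapping recovers almost all of the communication time, which is exactly the "no coffee breaks for the hardware" effect described above.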
Training Process Innovations
e) Group Relative Policy Optimisation (GRPO)
Reinforcement Learning (RL) is an increasingly common technique that gets the AI to learn from trial and error, much as humans or animals learn through experience. Like a player in a game, the AI is allowed to take any action it wants, and gets feedback in the form of a reward or penalty. Over time, it learns which actions, or combinations of actions, best maximise its rewards.
DeepSeek developed GRPO as a smarter way for AI to learn from rewards. Instead of training the AI with one answer at a time, it generates multiple answers at once, ranks them, and learns which answers are relatively better within the group. Therefore, rather than asking, "How good is this answer overall?", it asks, "Which answer in this group is better?" and learns from the comparison.
This makes the training process simpler, faster, and more efficient.
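The core of the idea can be sketched in a few lines: score a group of candidate answers, then centre and scale each reward against the rest of its group. This is a simplified illustration with made-up rewards, not DeepSeek's training code:

```python
import numpy as np

# Group-relative advantages: judge each answer against its own group,
# not against an absolute standard (the heart of GRPO, simplified).

rewards = np.array([0.2, 0.9, 0.5, 0.4])   # e.g. 4 sampled answers to one question

def group_relative_advantages(rewards):
    # "Which answer in this group is better?" Centre and scale within the group.
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

adv = group_relative_advantages(rewards)
print(adv)   # positive = better than the group average, negative = worse

best = int(np.argmax(adv))
print(f"best answer in group: #{best}")
```

Because each answer is only ever compared to its group-mates, there is no need for a separate "value" model to estimate how good an answer should be in absolute terms, which is one reason the approach is cheaper to run.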
f) Multi-Stage Hybrid Pipeline
DeepSeek used what is called a multi-stage training pipeline, which divides the training process into phases. The approach itself is not uncommon, but DeepSeek stood out in how they sequenced the phases to mirror how humans master complex skills.
This approach begins with a "cold start" phase, where the model is fed a small yet carefully curated dataset. This teaches the model to format answers logically before mastering content, and is akin to teaching someone the rules of grammar before asking them to write a novel.
The second stage involves Reinforcement Learning using their unique GRPO technique (as described above) to enhance problem-solving abilities.
Next, R1 went through a quality control process whereby it generated large numbers of potential solutions to problems, and DeepSeek V3 was used to select the top 10% of responses — rather like having an experienced teacher mentor a promising student.
This automated validation technique was another innovation from the DeepSeek team, because most training pipelines currently involve a high degree of expensive human validation to check the model’s responses.
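The filtering step itself is simple to sketch: score the candidates with a judge model and keep the top 10%. Here the scores are random stand-ins for a real judge's quality ratings, and the candidate names are placeholders:

```python
import numpy as np

# Automated quality control: keep only the top fraction of generated solutions.

rng = np.random.default_rng(42)
candidates = [f"solution_{i}" for i in range(1000)]
scores = rng.random(1000)        # placeholder for a judge model's quality scores

def keep_top_fraction(candidates, scores, fraction=0.10):
    cutoff = np.quantile(scores, 1 - fraction)     # score at the 90th percentile
    return [c for c, s in zip(candidates, scores) if s >= cutoff]

kept = keep_top_fraction(candidates, scores)
print(f"kept {len(kept)} of {len(candidates)} candidates")
```

In DeepSeek's pipeline the judge is another model (V3) rather than a human, which is what makes the validation cheap enough to run at scale.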
All Together
DeepSeek’s magic is in bringing together a variety of innovations, some homegrown, most adapted from best practices elsewhere, to create a model that delivers state-of-the-art performance at a fraction of the usual cost.
What I find particularly exciting is how these advancements will help to democratise AI development — making it possible for smaller teams with limited resources to compete effectively with Big Tech and the major AI labs.
5) Did DeepSeek R1 really take only $5.6 million to train?
The short answer is "No" — and the long answer reveals a fascinating story about how AI developments are often misunderstood or misrepresented in the media.
That widely-quoted $5.6 million figure comes from a technical paper about DeepSeek V3, the company's previous model. In it, they detailed that their final training run — including pre-training, context length extension, and post-training — required 2.788 million GPU hours. Using a rental price of $2 per hour for an NVIDIA H800 GPU, they calculated a total cost of $5.576 million.
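The arithmetic behind that headline number is straightforward to reproduce:

```python
# Reproducing the widely-quoted figure from the V3 technical report:
# final-run GPU hours times an assumed $2/hour rental rate for an H800.

gpu_hours = 2_788_000      # pre-training + context extension + post-training
rate_per_hour = 2.00       # assumed H800 rental price in USD

total_cost = gpu_hours * rate_per_hour
print(f"${total_cost:,.0f}")   # → $5,576,000
```

Note that both inputs are the paper's own figures; change the assumed rental rate and the headline number moves with it.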
While this figure appears to cover the final training runs for both DeepSeek V3 and R1, it's akin to saying a house costs only the price of its final construction materials, while ignoring the architects, planning, land purchase, and so on.
Let’s consider the actual cost components of training an AI model like DeepSeek R1:
Preliminary experiments and failed training runs
Staff time and expertise
Data acquisition and processing
Infrastructure setup and maintenance
Hardware investment
Speaking of hardware, the scale required for modern AI development is staggering. DeepSeek used a cluster of 2,048 H800 GPUs, each costing around $30,000 — that's a $60 million investment just in chips. In a recent essay, Anthropic's Dario Amodei suggests DeepSeek may possess something like 50,000 H100, H800, and H20 GPUs, potentially representing a billion-dollar hardware investment.
Of course, this hardware isn't exclusively used for R1 — it's spread across multiple projects, R&D efforts, and experimental runs. But it illustrates an important point: despite increasingly efficient training methods, building a top-tier AI lab still requires massive capital investment.
Even factoring in all these costs, however, the full expense of developing R1 likely remained under $10 million — still low compared to its comparable counterparts:
Meta's Llama 3, a less capable open-source model, required ~30.8 million GPU hours to train, which is about 11 times more computing power than DeepSeek V3.
Anthropic's Claude 3.5 Sonnet, released about a year before DeepSeek V3, cost "tens of millions" to train, according to Dario Amodei, their CEO.
OpenAI’s GPT-4 model incurred compute costs of $78 million, and CEO Sam Altman confirmed that total expenses exceeded $100 million.
And as noted earlier, while training costs tend to grab the headlines, inference costs — the ongoing expense of actually running these models — tell an even more compelling story. Consider the API pricing:
OpenAI o1: $15 per million input tokens and $60 per million output tokens
DeepSeek R1: $0.55 per million input tokens and $2.19 per million output tokens
That's a price difference of roughly 27x! Even accounting for profit margins, this suggests DeepSeek has achieved dramatically lower operational costs. How's that for efficiency?
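The exact ratios are easy to verify from the published prices; they work out to roughly 27x on both input and output tokens, commonly rounded up to 30x in press coverage:

```python
# Comparing the published API prices (USD per million tokens).

o1 = {"input": 15.00, "output": 60.00}
r1 = {"input": 0.55, "output": 2.19}

for kind in ("input", "output"):
    ratio = o1[kind] / r1[kind]
    print(f"{kind}: o1 costs {ratio:.1f}x more than R1")
```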
Before we get too excited about these cost differentials though, here's something to consider: AI development costs have historically decreased by about 4x each year. This means we should expect models to become significantly cheaper over time in any case — it's part of the natural evolution of the technology.
Therefore, the question we should be asking ourselves (and which I shall explore in the next issue) is whether DeepSeek is actually ahead of the cost curve or simply on trend.
6) Is DeepSeek R1 censored by the Chinese government?
The relationship between DeepSeek R1 and content restrictions presents an important technical distinction that's often missed in media coverage: there's a difference between the underlying model and how it's deployed.
The core R1 model itself isn't directly censored. However, DeepSeek's official implementation — the version you'll find on their website — includes content restrictions to comply with Chinese regulations. This manifests in several ways:
When using DeepSeek's official platform, you'll encounter restrictions around politically sensitive topics. The model deflects discussions about events like the Tiananmen Square protests, refers to Taiwan in alignment with mainland Chinese policy, and follows specific narratives on topics like human rights issues. In discussions about international events, responses tend to mirror official Chinese foreign policy positions, emphasising themes of dialogue and regional cooperation.
What I find interesting is how these restrictions are implemented. Rather than being built into R1's core model — its "DNA", if you will — they are applied as post-processing filters, more like having a supervisor who reviews and potentially modifies answers before they reach the user.
You can observe these post-processing filters in action by using misspelled sensitive terms: they initially slip through, but once the system identifies them as "offending" terms, the restrictions come back into play.
Thanks to R1's open-source nature, these restrictions aren't set in stone. Third-party implementations hosted outside China, like those offered by Hugging Face and Perplexity, operate without these filters. Developers can create their own implementations with modified or disabled content filtering.
However, researchers have noted that even unrestricted versions of R1 show certain biases in their responses. This likely stems from the model's training data, which includes a significant proportion of state media sources. These biases become particularly apparent in responses about historical events or geopolitical situations, where the model tends to disproportionately cite Chinese government sources.
There's also an intriguing linguistic dimension. The model's responses can vary depending on whether you're asking questions in Chinese or English, with Chinese-language queries generally receiving stricter oversight. It's a reminder that AI models, like human communications, can adapt their tone and content based on audience.
7) Is your data and privacy secure when using DeepSeek?
The answer depends on your definition of "secure" and your individual privacy requirements.
If you use DeepSeek's web-based chatbot, there are significant privacy trade-offs to consider. If using the API, which is primarily aimed at developers, the situation requires careful assessment. And if you have downloaded a version of the model to run locally on your laptop or server, you retain the most control over your data security.
We’ll start by examining the current landscape.
DeepSeek's Data Practices and Regulatory Response
Several countries have taken regulatory action — Italy has blocked DeepSeek pending investigation of its data collection practices, while Australia, the Netherlands, Taiwan, South Korea, and several US agencies have restricted its use on government devices. Belgium, France, Ireland, and Croatia have launched investigations into data transfers and compliance gaps.
The key areas under scrutiny include:
GDPR Compliance: DeepSeek's services appear to violate GDPR requirements on multiple fronts. Their privacy policy fails to specify what personal data is collected, how it's used, or how long it's retained, and makes no mention of GDPR rights, data processing purposes, or the legal basis for using personal data to train AI models. Despite storing data in China, they haven't provided the required details of any safeguards for international data transfer beyond stating that they comply with "applicable data protection laws". The company also lacks a clear process for users to access, correct, or delete their data.
User Tracking: DeepSeek's tracking practices are significantly more invasive than industry norms. Their web tools create a digital "fingerprint" for each user, tracking activity even after leaving the platform. This includes device details (screen resolution, operating system, keystroke patterns), behavioral data (chatbot interactions, uploads, deleted messages, third-party platform activity), and network identifiers (IP addresses, mobile advertising IDs, cookies).
Data Sharing: Cybersecurity researchers have found that DeepSeek's login systems transmit user credentials, device fingerprints and behavioral data to CMPassport.com (China Mobile's registry). China Mobile, banned in the US since 2019 over espionage risks, is legally required to comply with Chinese government surveillance requests. DeepSeek also shares behavioral data with China's state-overseen analytics tool, and device identifiers with ByteDance's Volcengine platform. DeepSeek's iOS app also transmits device information without encryption, making it vulnerable to interception and modification.
Data Exposure Incidents: In January 2025, a cybersecurity firm discovered an exposed database containing over a million lines of sensitive data, including chat histories, backend data, and API secrets. In the same month, DeepSeek suffered a large-scale cyberattack that forced them to restrict new user registrations.
Implications for Users
Web-Based Chatbot: The platform collects extensive user data and shares information with various third parties. Users should carefully consider their privacy requirements and the nature of their intended use. Having evaluated the situation, I have decided to stop using DeepSeek’s web-based chatbot for now.
API Usage: Developers and companies using the API face more serious implications. They risk GDPR fines when processing EU customer data without proper safeguards, could face contractual penalties from the exposed API secrets, and must navigate compliance challenges with data being stored in China.
Local Model Usage: Running the model locally offers the highest level of data control, though users should note that R1 lacks some standard safety guardrails and, like other open-source models, remains vulnerable to prompt injections. Researchers have also called out the (less likely) risk of hidden telemetry in the code, which if present, might secretly send data back to DeepSeek's servers.
8) Did DeepSeek steal OpenAI’s IP to create its R1 model?
The controversy swirling around DeepSeek R1's development perfectly illustrates the complexities of intellectual property (IP) in modern AI development.
OpenAI's allegations center around a technique called model distillation. To understand this, imagine a master chef training an apprentice. The apprentice doesn't just learn final recipes but absorbs the master's decision-making process — when to adjust heat, how to balance flavors, why certain ingredients work together. Because the apprentice benefits from the master's organised teaching rather than years of trial and error, they can achieve similar expertise more efficiently.
In technical terms, distillation involves training a new, typically smaller model to mimic a larger model's outputs, including both its answers and its confidence levels in different solutions. When done well, the smaller model can maintain 75-90% of the original's capabilities while requiring far fewer computational resources — as we've seen with DeepSeek's own smaller versions of R1.
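In code, the heart of distillation is a loss function that compares the student's probability distribution to the teacher's "soft labels", its confidence across all answers rather than just its top pick. Here's a minimal sketch with made-up distributions:

```python
import numpy as np

# Distillation loss sketch: the student learns to match the teacher's
# full probability distribution, not merely its single best answer.

teacher_probs = np.array([0.70, 0.20, 0.10])   # teacher's confidence over 3 answers
student_probs = np.array([0.50, 0.30, 0.20])   # student's current distribution

def distillation_loss(teacher, student):
    # Cross-entropy of the student against the teacher's soft labels:
    # lower means the student mimics the teacher more closely.
    return -np.sum(teacher * np.log(student))

loss = distillation_loss(teacher_probs, student_probs)
floor = distillation_loss(teacher_probs, teacher_probs)
print(f"loss: {loss:.3f}, floor (student == teacher): {floor:.3f}")
```

Training on these soft labels is what lets the "apprentice" absorb the teacher's judgment, including how confident to be, far faster than learning from raw data alone.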
OpenAI claims their API logs showed suspicious patterns of data extraction from developer accounts linked to DeepSeek, suggesting the company used GPT-4 as the 'master chef' to train their model.
There's a certain irony here, as OpenAI themselves are defending against copyright lawsuits from The New York Times and other publishers, who claim OpenAI trained ChatGPT on their content without permission. OpenAI's defense that their use falls under "fair use" sits rather awkwardly alongside their complaints about DeepSeek.
The evidence for OpenAI's claims remains circumstantial. While their API logs might raise eyebrows, there's no definitive proof that R1's capabilities came from distilling GPT-4's knowledge. DeepSeek's innovative engineering approaches — like their Multi-Stage Hybrid Pipeline and clever use of Low-Precision Computation — suggest a team quite capable of developing these capabilities independently, albeit perhaps over a longer timeframe had R1 been developed without model distillation.
The issue here transcends simple IP theft allegations. It speaks to the broader challenges of innovation in AI, where nearly all state-of-the-art models build upon publicly available data, research, and techniques.
9) So what does this all mean?
The story of DeepSeek R1 reads like a classic retelling of David versus Goliath — a relatively small team, working with limited resources, managing to shake the foundations of Big Tech through ingenious approaches that let them slip the shackles of US semiconductor export restrictions.
In my opinion, DeepSeek R1 represents both more and less of a breakthrough than many have suggested. On the one hand, it represents a watershed moment that looks set to usher in a new age in open-source development that will further democratise and accelerate AI development. On the other, most of its innovations, while clever, are largely incremental improvements rather than paradigm shifts in model training.
Justin Tan is passionate about supporting organisations to navigate disruptive change and towards sustainable and robust growth. He founded Evolutio Consulting in 2021 to help senior leaders to upskill and accelerate adoption of AI within their organisation through AI literacy and proficiency training, and also works with his clients to design and build bespoke AI solutions that drive growth and productivity for their businesses. Alongside his consultancy work, he is an Adjunct Assistant Professor at University College London, where he lectures on digital health and AI. If you're pondering how to harness these technologies in your business, or simply fancy a chat about the latest developments in AI, why not reach out?