DeepSeek, Deep Irony: How an Unknown Chinese Startup Stole the Limelight from the Stargate Project

B. Valle

Summary Bullets:

• Chinese startup DeepSeek released DeepSeek-R1, an open-weight model with capabilities comparable to many of the leading generative AI (GenAI) models on the market, at a fraction of the cost.

• Like OpenAI o1, R1 is a “reasoning” model. These models produce responses incrementally, simulating the step-by-step way humans reason through problems.

Another week, another AI revelation. The news that Chinese startup DeepSeek had released an open-weight model, DeepSeek-R1, with similar capabilities to OpenAI o1, sent shockwaves through Silicon Valley (and Wall Street) just as the new US administration was inaugurated, with a display of big tech billionaires in attendance. A delicious irony, then, that while inauguration week was capped off with headlines surrounding the muscular Stargate project, a $500 billion initiative to fund AI infrastructure, the quiet (initially) release of DeepSeek-R1 would eventually steal that thunder.

Although the release of DeepSeek-R1 was widely covered by the trade press during inauguration week, it took a few days for the news to sink in, presumably while everyone was busy downloading and testing DeepSeek-R1, pushing it to the top of the charts of the most downloaded models on the open-source platform Hugging Face. The weekend must have provided plenty of opportunities to use the model: By Monday morning, DeepSeek was all over major media outlets worldwide. Within hours, Nvidia had lost $589 billion in market capitalization, the sharpest single-day loss of stock value of any company in history, according to Forbes.

Nvidia’s rise to become one of the most valuable companies in the world has been driven by demand for its semiconductors, the best chips for training and inference of GenAI models. Although DeepSeek-R1 was indeed trained with the help of Nvidia GPUs, it uses less computing power and fewer microprocessors, which means it cost far less to build, around 5% of the development budget for ChatGPT, according to DeepSeek. It has been argued that DeepSeek-R1 has limitations compared to OpenAI’s models and other leading ones such as Anthropic’s Claude 3.5 Sonnet. The most obvious is the censorship of training data imposed by the Chinese government.

Interestingly, the rise of DeepSeek also coincides with the introduction of an executive order by the new US administration on January 23, 2025, revoking “existing AI policies and directives that act as barriers to American AI innovation.” This annulment refers to former US President Biden’s executive order to regulate AI, an effort to create a structured environment focusing on risk-based and sector-specific approaches to promote safety and accountability. That executive order was in part a reaction to the EU AI Act, which came into force in 2024 and is the most advanced regulatory framework to date. It categorizes AI applications based on risk levels and imposes strict requirements on those deemed high-risk, including mandatory human rights tests to assess bias and discrimination. There is a misguided notion that stricter regulation puts the brakes on innovation; it can and should be considered a differentiator. If the US, in its drive to maintain its supposed AI leadership, does away with what was already a rather loose approach to responsible AI, how will its technologies differ from those of other regions of the world, including China?

But let’s go back to the semiconductors. The restrictions on the sale of US chips by the likes of Nvidia and AMD to China can also be seen as a double-edged sword in the quest for AI supremacy. As new plans to impose tariffs on chips made by Taiwan Semiconductor Manufacturing Company (TSMC) hit the news, the question becomes: Could the US restrictions be having the unintended effect of spurring China toward even greater innovation? The curbs have driven investment in companies such as Semiconductor Manufacturing International Corporation (SMIC), backed by the Chinese government. They have inadvertently benefited other nations, particularly South Korea. And they have punished companies such as Intel, which says that $3.2 billion of its 2023 revenue was dependent on authorizations by the US government.

Chinese companies continue to adopt innovative approaches, leveraging techniques such as mixture of experts, which enables models to be pretrained with far less compute, allowing users to scale up the model or dataset size within the same compute budget as a dense model. This technique is already widely used by many companies, including Mistral AI and Meta, and has been leveraged by DeepSeek, too. According to DeepSeek, V3, the large language model powering DeepSeek-R1, cost less than $6 million to build. The company was constrained by the current US export restrictions limiting access to GPUs and was forced to build its models with the limited resources available. DeepSeek’s release has highlighted that throwing seemingly unlimited amounts of money at a problem is not necessarily the best way out, particularly with a technology that has a massive carbon footprint. Ingenuity seems to be winning, and this is good news for the market.
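To make the mixture-of-experts idea concrete, the toy sketch below shows the core mechanism: a learned router scores a set of expert networks and activates only the top-k of them per input, so most parameters sit idle on any given forward pass. All sizes, names, and weights here are illustrative assumptions for exposition; they do not reflect DeepSeek's actual architecture or configuration.

```python
import math
import random

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    es = [math.exp(v - m) for v in xs]
    s = sum(es)
    return [e / s for e in es]

def matvec(mat, vec):
    # Plain matrix-vector product (each expert is just a weight matrix here).
    return [sum(w * v for w, v in zip(row, vec)) for row in mat]

class ToyMoELayer:
    """Toy mixture-of-experts layer: a router picks the top-k experts per
    input, and only those experts run. Sizes are hypothetical."""

    def __init__(self, d_model=4, n_experts=4, top_k=2, seed=0):
        rng = random.Random(seed)
        # Router: one score vector (d_model weights) per expert.
        self.router = [[rng.uniform(-1, 1) for _ in range(d_model)]
                       for _ in range(n_experts)]
        # Experts: one d_model x d_model matrix each.
        self.experts = [[[rng.uniform(-1, 1) for _ in range(d_model)]
                         for _ in range(d_model)]
                        for _ in range(n_experts)]
        self.top_k = top_k

    def forward(self, x):
        # Score every expert, but run only the top-k (sparse compute).
        scores = [sum(w * v for w, v in zip(row, x)) for row in self.router]
        top = sorted(range(len(scores)), key=lambda i: scores[i])[-self.top_k:]
        gates = softmax([scores[i] for i in top])
        out = [0.0] * len(x)
        for g, i in zip(gates, top):
            expert_out = matvec(self.experts[i], x)
            out = [o + g * e for o, e in zip(out, expert_out)]
        return out, top

layer = ToyMoELayer()
output, active = layer.forward([1.0, 0.5, -0.5, 0.25])
print(len(active), "of 4 experts active")  # 2 of 4 experts active
```

Because only 2 of the 4 experts run per input, the per-token compute is roughly half that of a dense layer with the same total parameter count, which is the budget saving the paragraph above describes, scaled up to production model sizes.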
