Chinese researchers backed by a Hangzhou-based hedge fund recently released a new large language model (LLM) called DeepSeek R1 that rivals the capabilities of the most advanced U.S.-built products but reportedly does so with fewer computing resources and at much lower cost.
High-Flyer, the hedge fund that backs DeepSeek, said that the model nearly matches the performance of LLMs built by U.S. firms such as OpenAI, Google and Meta, but does so using only about 2,000 older-generation chips manufactured by U.S.-based industry leader Nvidia, and that it cost only about $6 million worth of computing power to train.
By comparison, Meta’s AI model, Llama, was trained on about 16,000 chips and reportedly cost Meta vastly more money to train.
Open-source model
The apparent advance in Chinese AI capabilities comes after years of efforts by the U.S. government to restrict China’s access to advanced semiconductors and the equipment used to manufacture them. Over the past two years, under President Joe Biden, the U.S. put multiple export control measures in place with the specific aim of throttling China’s progress on AI development.
DeepSeek appears to have innovated its way to some of its success, developing new and more efficient algorithms that allow the chips in the system to communicate with each other more effectively, thereby improving performance.
At least some of what DeepSeek R1’s developers did to improve its performance is visible to observers outside the company, because the model is open source, meaning that the algorithms it uses to answer queries are public.
Market reaction
The news about DeepSeek’s capabilities sparked a broad sell-off of technology stocks on U.S. markets on Monday, as investors began to question whether U.S. companies’ well-publicized plans to invest hundreds of billions of dollars in AI data centers and other infrastructure would preserve their dominance in the field. When the markets closed on Monday, the tech-heavy Nasdaq index was down by 3.1%, and Nvidia’s share price had plummeted by nearly 17%.
However, not all AI experts believe the markets’ reaction to the release of DeepSeek R1 is justified, or that the claims about the model’s development should be taken at face value.
Mel Morris, CEO of U.K.-based Corpora.ai, an AI research engine, told VOA that while DeepSeek is an impressive piece of technology, he believes the market reaction has been excessive and that more information is needed to accurately judge the impact DeepSeek will have on the AI market.
“There’s always an overreaction to things, and there is today, so let’s just step back and analyze what we’re seeing here,” Morris said. “Firstly, we have no real understanding of exactly what the cost was or the time scale involved in building this product. We just don’t know. … They claim that it’s significantly cheaper and more efficient, but we have no proof of that.”
Morris said that while DeepSeek’s performance may be comparable to that of OpenAI products, “I’ve not seen anything yet that convinces me that they’ve actually cracked the quantum step in the cost of operating these sorts of models.”
Doubts about origins
Lennart Heim, a data scientist with the RAND Corporation, told VOA that while it is plain that DeepSeek R1 benefits from innovative algorithms that boost its performance, he agreed that the general public actually knows relatively little about how the underlying technology was developed.
Heim said that it is unclear whether the $6 million training cost cited by High-Flyer actually covers the whole of the company’s expenditures — including personnel, training data costs and other factors — or is just an estimate of what a final training “run” would have cost in terms of raw computing power. If the latter, Heim said, the figure is comparable to the costs incurred by leading U.S. models.
He also questioned the assertion that DeepSeek R1 was developed with only 2,000 chips. In a blog post written over the weekend, he noted that the company is believed to operate tens of thousands of Nvidia chips, which could have been used for the experimentation necessary to develop a model whose final training run required just 2,000.
“This extensive compute access was likely crucial for developing their efficiency techniques through trial and error and for serving their models to customers,” he wrote.
He also pointed out that the company’s decision to release version R1 of its LLM last week — on the heels of the inauguration of a new U.S. president — appeared political in nature. He said that it was “clearly intended to rattle the public’s confidence in the United States’ AI leadership during a pivotal moment in U.S. policy.”
Dean W. Ball, a research fellow at George Mason University’s Mercatus Center, was also cautious about declaring that DeepSeek R1 has somehow upended the AI landscape.
“I think Silicon Valley and Wall Street are overreacting to some extent,” he told VOA. “But at the end of the day, R1 means that the competition between the U.S. and China is likely to remain fierce, and that we need to take it seriously.”
Export control debate
Some experts have pointed to DeepSeek’s apparent success as evidence that the export controls put in place under the Biden administration may not have had their intended effect.
“At a minimum, this suggests that U.S. approaches to AI and export controls may not be as effective as proponents claim,” Paul Triolo, a partner with DGA-Albright Stonebridge Group, told VOA.
“The availability of very good but not cutting-edge GPUs — for example, that a company like DeepSeek can optimize for specific training and inference workloads — suggests that the focus of export controls on the most advanced hardware and models may be misplaced,” Triolo said. “That said, it remains unclear how DeepSeek will be able to keep pace with global leaders such as OpenAI, Google, Anthropic, Mistral, Meta and others that will continue to have access to the best hardware systems.”
Other experts, however, argued that export controls have simply not been in place long enough to show results.
Sam Bresnick, a research fellow at Georgetown University’s Center for Security and Emerging Technology, told VOA that it would be “very premature” to call the measures a failure.
“The CEO of DeepSeek has gone on record saying the biggest constraint they face is access to high-level compute resources,” Bresnick said. “If [DeepSeek] had as much compute at their fingertips as Google, Microsoft, OpenAI, etc., there would be a significant boost in their performance. So … I don’t think that DeepSeek is the smoking gun that some people are claiming it is [to show that export controls] do not work.”
Bresnick noted that the toughest export controls were imposed only in 2023, meaning that their effects may just be starting to be felt. The real test of their effectiveness, he said, will be whether U.S. firms are able to continue to outpace China in the coming years.
…