Some thoughts on DeepSeek – the Black Swan for the MAG7 or something else?

For various reasons, I was able to spend much more time on this topic since Sunday than I usually would have. On Sunday morning, the topic somehow piqued my interest, and I have been trying to understand, as a non-expert, what is going on here.

For full disclosure: I have no positions in any of the MAG7 stocks, but that might make me just as biased as someone who has mortgaged his family home to invest in NVIDIA.

On Sunday morning, I initially used mostly Twitter, but during the day it was flooded with MAGA crap. Twitter is still a good place at an early stage of “virally developing situations”, but it gets washed over with (AI-written) turd pretty quickly.

The DeepSeek topic is interesting on many dimensions. Here are some facts (taken from Wikipedia, but confirmed by other sources):

  • DeepSeek is a subsidiary of an AI/quant investment firm called HighFlyer, based in China. It was spun out as a subsidiary in 2023, funded with the parent’s money, and released its first really good model (V2) in May 2024, outperforming local Big Tech rivals and simultaneously undercutting them massively on price.
  • The model that caused the “Panic of January 27th” was actually DeepSeek R1, the reasoning model that had already been released in November 2024 as a lite version, followed by V3, a very powerful (normal) LLM, in December.
  • On January 20th, DeepSeek then released the “full” R1 version, which outperformed the competing ChatGPT o1 model in most dimensions (or was at least equal).

So it took quite some time until people realized that there was a really powerful Chinese model out there. That timeline, in my opinion, also contradicts the “hedge fund releases top LLM to make money by shorting MAG7 stocks” theory to a very large degree.

What seems to have shocked most people in the beginning was the fact that DeepSeek mentioned that the pure “compute cost” of training was only 5 mn USD. This compares to a total of 1 bn USD “training cost” for ChatGPT’s o1 model, for which OpenAI just started to charge 200 USD per month for unlimited access. One of the reasons for the low cost was that they trained on a limited number of older NVIDIA chips. At least for me, it was not possible to compare those numbers even at a high level. What, for instance, was included in the 1 bn for ChatGPT? Nobody really knows.

Very soon, Twitter began to fill up with posts claiming that this is all a Chinese hoax, it cannot be, they have cheated, it’s a Chinese psyop, they want to steal your data, they stole from the great American models, they want to destabilize America, etc. MAGA in full force. So if you checked Twitter on Sunday afternoon, you would most likely believe that this is nothing.

However, the Chinese had not only granted access to the model through a web app, but also offered it for free download as an “open source” model, including a very detailed paper about what they did.

Some experts quickly pointed out that the new model indeed included a couple of very smart “tweaks” or even architectural differences, which made the model not only easier to train but also more performant on old hardware.

It was also really interesting to see how the “Big Tech” guys reacted to DeepSeek, depending on what their vested interests are.

So where does that leave us? To be clear, I haven’t become an AI expert over the past 3 days. All I can do is look at what people who know much more than I do are saying and weigh it against their vested interests.

So for me the most probable interpretation is as follows:

  • DeepSeek is really a very good model and surprised most of the American players
  • Maybe the true training cost was higher than 5 mn USD, but the tweaks they made suggest that they were quite constrained in computational resources
  • The model seems to contain a couple of innovative features that make it both easier to train and able to run on less demanding hardware, and therefore cheaper

So is this the “Black Swan” for the MAG7 ? Personally, I don’t think so. Overall AI adoption will clearly speed up if models are cheaper to train and cheaper to run.

Maybe some of the big players will scale back their data center plans somewhat, maybe not. However, it makes the story more complex. The story so far was that only with the newest NVIDIA chips could you develop a really good model. Access to the newest generation of NVIDIA chips was the single most important factor determining the future of any AI start-up or other AI model company.

I guess this will definitely change. New players will come out and offer models with great capabilities, requiring a lot less CapEx than xAI, OpenAI, Anthropic etc. This will be great news for users; for the existing players it means that the cost of capital has increased for the time being. How many “professional” users will pay OpenAI 200 USD/month for something that they can download for free and run themselves for a fraction of the cost? I assume that many of the current LLM developers will scramble to make their current cash buffers last longer than planned before the next funding round. And in the VC space, the 2024 AI vintage might look very bad in 12–18 months’ time already.

It is therefore also not so surprising that Apple, which so far has not officially developed an LLM, actually saw its share price increase. They will have many more partners to choose from in the future and might easily be able to run “distilled” models on their phones, which could be a great value proposition for privacy-minded customers.

But what about NVIDIA? Honestly, I do not know. My best guess is that maybe in a few quarters growth starts to slow down a little, maybe not. Three days of researching DeepSeek do not enable me to understand NVIDIA’s full business model and all the implications of this.

Summary & takeaways

  • First, I promise that I will not become another “AI expert”, but I think more than ever, each and every one of us really needs to become familiar with these models, how to work with them, and how the availability of cheap AI power could transform whole industries. Don’t let yourself be fooled if a model is not able to answer every question right away. They will become better within a relatively short period of time.
  • Second, I do think that the “cost of capital” for most MAG7 companies has increased, but due to my limited knowledge here, this is not actionable advice. But DeepSeek is clearly no Black Swan either.
  • Third: It will become more and more important in the future to curate one’s news feed for truly trustworthy and competent sources. Maybe a (personal) AI can help with this, maybe it makes things worse. But especially on Twitter/X, the shit-tsunami that has been happening over these 3 days is almost unbearable, with “real” and “trustworthy” information very hard to find.

    This clearly opens the opportunity for independent, trustworthy people to create a brand of trust if they manage to keep their independence.

Full disclosure: This post was written without the help of any LLM; during my research, however, I did use various AI tools.

8 comments

  • What’s important to note is that DeepSeek is not a newcomer to AI (nor is Alibaba with their Qwen model family). They got wide and massive attention due to their reasoning model and the iOS app that stormed the App Store and became #1 over the course of 3 days. This was amplified by cloud providers being delayed in providing access to the model, so users were forced to flock to a single platform. In other cases, launches of open source models are accompanied by launches across cloud providers. Further, the techniques applied in the latest model are novel and state of the art – they successfully used a machine learning technique called Reinforcement Learning to develop new behaviours in the models that strengthened their reasoning capability.
    Developers had been using deepseek-coder:6.7b-base as a model for local programming (an alternative to buying API access to models) throughout 2024. In Nov 2023, the company published its first LLM. They have a streak of innovation comparable to Mistral (the French AI startup), with a focus on cost consciousness, and they also offer API-based access to their models (text, vision, etc.). In such offerings, the inference process (where users make use of the models), not the training process, uses most of the GPU capacity.
    The credibility of DeepSeek can be traced through:

    research papers: https://arxiv.org/search/cs?searchtype=author&query=DeepSeek-AI
    open-source repositories: https://github.com/deepseek-ai
    published model weights and experimentation spaces: https://huggingface.co/deepseek-ai

    We can also see that their ability to offer web UIs for the masses is experiencing the same issues as those known from the launch of ChatGPT (security issues, lack of GDPR compliance, etc.). A security firm called Wiz.io explained the data leak they found on the platform: https://www.wiz.io/blog/wiz-research-uncovers-exposed-deepseek-database-leak
    What is definitely remarkable is the reaction of the US-based folks, who were so convinced that with their deep pockets they have a monopoly and will always be better than others. Their bubble could not be bigger (the reaction reminds me of US folks discovering through Little Red Book what the life of rich folks in China looks like and not understanding the concept of living-cost differences). Aside from Big Tech, which is world-class, the solutions the US has for things like personal ID documents are years behind the EU, which has an e-ID standard (eIDAS 2.0), with states like Estonia way ahead of the EU baseline: their e-ID is widely used, for example, in e-commerce stores without the need to register customer accounts on the platforms, as the e-ID is used for identity checks.

  • Very useful, thank you!

    Any view in terms of impact on your portfolio companies, in particular EVS and Jensen? I would think, off the bat, no major impact but slightly positive, if anything?

  • any thoughts on the overall picture for datacenter buildout and energy requirements following this? This is where I’m the least clear

    • There is clearly more uncertainty, so much is certain. Which should mean higher cost of capital for everyone involved for some time

      • to me this suggests a higher premium now for the efficiency enablers. Cooling, architecture optimisation, cost of energy powering these GPUs… existing committed capex that’s too late to reverse must be scrambling to improve the ROI following this

  • great article, thanks.

  • Hey! Interesting article. Please watch the CNBC video from Davos with Alexandr Wang. He claims (2:40 in) that DeepSeek has bought 50,000 of Nvidia’s newest chips (H100). This was before the Jan 27 meltdown.

    so perhaps the low dollar figures discussed are only partially true.

    • This video was constantly pushed into my Twitter Feed as well. But many of the model design features indicate that they have been constrained with regard to compute.

      No one knows for sure, but I think there is a high probability that they achieved this without large amounts of H100 chips. Maybe more than 5 mn USD, but most likely a lot less than OpenAI. And as mentioned in the post, it doesn’t matter that much.

