Seeing lots of talk about DeepSeek, so I want to add my 2 sats.
It’s an impressive model and likely to be part of a longer series of even more impressive releases from that group. I’m happy to see it open sourced.
That being said, training an LLM via reinforcement learning can lead to dangerous results if the reward mechanism isn’t crafted carefully. Part of the reason the big players are relatively slow to make big leaps is that they’re focused on alignment [1].
As an analogy: yes, you can get to your destination faster if you drive 4x the speed limit, but we put speed limits in place to reduce the number and severity of accidents. (I’m a hypocrite here because I’m a speed demon.)