The enthusiasm surrounding DeepSeek stemmed from its cost-effectiveness. Even at inference time, it substantially reduces expenses while still delivering output of comparable quality.
Discussion
Perhaps the full model is cheaper than o1 and the like, but the distills are pointless. They cost exactly the same to run, per token, as the models they are based on, yet they perform worse than those models.