I really don't get the DeepSeek love. I haven't tried the full model, but the 70B-parameter distill is trash. It isn't actually a reasoning model; it merely apes being one. It is really good at sounding like it is reasoning, but it hallucinates far more than the Llama 3.3 model on which it is based.

I suspect the full model behaves similarly. Watching it attempt a rationalization is reassuring to users, but the actual output isn't that great.


Discussion

I have the same results with the 70B! It's hot garbage taking up too much space. I get better results with phi4 or Llama 3.1.

Yup. Give 3.1 or 3.3 instructions to show its work step by step and correct any mistakes, and it does light years better. 3.3 is better even without a fancy prompt.
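The "show your work step by step" prompting described above can be sketched against a local Ollama server. To be clear, this is my own minimal sketch, not anything from the thread: the endpoint is Ollama's documented default, but the `llama3.3` model tag and the exact prompt wording are assumptions about your setup.

```python
# Hedged sketch: chain-of-thought style prompting of a local Llama 3.3
# served by Ollama. http://localhost:11434 is Ollama's default endpoint;
# the "llama3.3" tag and prompt wording are assumptions, not a standard.
import json
import urllib.request


def build_cot_prompt(question: str) -> str:
    """Wrap a question with 'show your work, then self-correct' instructions."""
    return (
        "Show your work step by step, then review your steps and "
        "correct any mistakes before giving a final answer.\n\n"
        f"Question: {question}"
    )


def ask_ollama(question: str, model: str = "llama3.3") -> str:
    """Send the wrapped prompt to a local Ollama server (assumed running)."""
    payload = json.dumps({
        "model": model,
        "prompt": build_cot_prompt(question),
        "stream": False,  # return one JSON object instead of a token stream
    }).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    print(ask_ollama("A train travels 180 miles in 3 hours. What is its average speed?"))
```

Swapping `model` to `llama3.1` (or any other tag you have pulled) is the only change needed to compare models on the same prompt.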

The enthusiasm surrounding DeepSeek stemmed from its cost-effectiveness. Even when it comes to inference, it offers a significant reduction in expenses while still delivering comparable quality output.

Perhaps the full model is cheaper than o1 etc., but the distills are pointless. They cost exactly as much to run, per token, as the models they are based on, yet perform worse.