Nostr Web Client

I really don't get the DeepSeek love. I haven't tried the full model, but the 70B-parameter distill is trash. It isn't actually a reasoning model; it merely apes one. It is really good at sounding like it is reasoning, but it hallucinates far more than the Llama 3.3 model it is based on.

I suspect the full model has similar flaws. It is reassuring for users to see it attempting a rationalization, but the actual output isn't that great.

Adnan 10mo ago

The enthusiasm surrounding DeepSeek stemmed from its cost-effectiveness. Even for inference, it offers a significant reduction in expenses while still delivering output of comparable quality.

Discussion

Daniel Wigton 10mo ago

Perhaps the full model is cheaper than o1 and the like, but the distills are pointless. They cost exactly the same to run, per token, as the models they are based on, yet they perform worse than those models.