That's not really DeepSeek R1; it's a distilled version of Alibaba's Qwen-32B architecture, enhanced using synthetic outputs from the larger DeepSeek R1 model.

Quite useful, but not the same thing.

Discussion

Yes, it's all described on the model choice.

It *is* R1, which is the name for the distilled version you describe. The bigger model is called V3.