moonshot kimi k2 is a fantastic model to assist with research - use it via nostr:npub16g4umvwj2pduqc8kt2rv6heq2vhvtulyrsr2a20d4suldwnkl4hquekv4h - very cheap

Reply to this note

Please Login to reply.

Discussion

qwen3 even better?

seems about the same on my usage. haven't been coding with it, but if token cost is lower might be worth it

qwen3 is our number 2 model over the last 7 days.

stats.ppq.ai

I've noticed that if you ask the Claude models what version they are through ppq, what they say doesn't match the model. Like 3.7 sonnet will say it's opus 3 and opus 4.1 will say it's 3.5 sonnet.

Yea that's common. The models are built on previous versions of themselves. This is something we can fix by injecting into the system prompt "you are X model". We will probably do this as users seem to be suspicious that we are giving them sub-par models. But a nefarious provider could inject that into the system prompt anyways so 🤷‍♂️ .

Unfortunately it is hard to prove the models we are giving to users. Hopefully something will come along soon that proves such things.

Maybe we can build some sort of open source proxy which shows how the routing of the models is actually working?

Ah, I see.

I thought it might be a bug, but what you said makes sense. Personally, the models do work as advertised. That's enough for me.