it is probably a misunderstanding
that means nostr is more validated
the second probably
makes sense! building great LLMs instead of great libraries 🐱
Is it actually settling on blockchain?
Benchmarked Kimi K2 LLM. It did well: DeepSeek V3 beats it, but Kimi K2 might be more skilled. Performance is very close to Qwen 3 in terms of skills and human alignment, but the parameter count is huge (1T!).

https://sheet.zoho.com/sheet/open/mz41j09cc640a29ba47729fed784a263c1d08?sheetid=0&range=A3
It is costly to train from scratch; fine tuning makes more sense for me. Not all the LLMs are super terrible. Llama models rank higher than the rest, but they are certainly not optimal. Generally, Western models are doing better.
According to this: https://apxml.com/posts/gpu-system-requirements-kimi-llm
You need 32 x H100 80GB GPUs to run Kimi K2.
These cost $30-45K each according to a quick search, so 32 of them comes to roughly $1-1.4 million.
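A quick back-of-the-envelope check in Python (BF16 weights and the ~25% memory overhead are my assumptions; the per-GPU prices are just the quick-search figures above):

```python
# Rough sizing for serving a 1T-parameter model on H100 80GB cards.
# Assumptions: BF16 weights (2 bytes/param) and ~25% headroom for
# KV cache / activations; prices are the quick-search estimates.
params = 1e12            # Kimi K2 total parameter count
bytes_per_param = 2      # BF16
overhead = 1.25          # headroom for KV cache / activations
gpu_mem = 80e9           # H100 80GB

gpus = params * bytes_per_param * overhead / gpu_mem
print(f"GPUs needed: ~{gpus:.0f}")            # ~31, consistent with the cited 32

for price in (30_000, 45_000):
    print(f"32 x ${price:,} = ${32 * price:,}")   # $960,000 .. $1,440,000
```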

unsloth has GGUFs and a llama.cpp fork that could run it on smaller GPUs
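For reference, loading a quantized GGUF with partial GPU offload generally looks like this via the llama-cpp-python bindings (a minimal sketch: the model path is a placeholder, and Kimi K2 support may require unsloth's llama.cpp fork rather than mainline):

```python
# Sketch: run a quantized GGUF with partial GPU offload.
# The model path is a placeholder; Kimi K2 GGUFs are sharded and very large.
from llama_cpp import Llama

llm = Llama(
    model_path="Kimi-K2-Instruct-Q2_K/model-00001-of-000XX.gguf",  # first shard (placeholder name)
    n_gpu_layers=40,   # offload as many layers as fit in VRAM; the rest stays in RAM
    n_ctx=4096,        # context window
)

out = llm("Explain what a GGUF file is in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```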
i think even the private and public keys are confusing and should come into focus after the user opens the app the 3rd time or so. if the user is really engaged, then the app should offer to back up the keys.
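A sketch of that gating logic (the open-count threshold and the "really engaged" proxy are just the assumptions from the note above):

```python
# Sketch of the onboarding idea: hide key details at first, surface them
# around the 3rd app open, and offer a backup once the user looks engaged.
def on_app_open(opens: int, events_published: int) -> list[str]:
    prompts = []
    if opens >= 3:
        prompts.append("show_keys_screen")       # introduce npub/nsec now, not on first launch
    if opens >= 3 and events_published > 0:      # crude "really engaged" proxy
        prompts.append("offer_key_backup")
    return prompts

print(on_app_open(opens=1, events_published=0))  # [] -> keep onboarding simple
print(on_app_open(opens=3, events_published=5))  # ['show_keys_screen', 'offer_key_backup']
```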
Qwen 3 32B fine tuning with Unsloth is going well. It does not resist faith training the way Gemma 3 did. I may open the weights at some point.
Qwen 3 is more capable than Gemma 3, and after fine tuning it will probably be more aligned. It does not fall into "chanting" (repeating words or sentences) even at temp = 0.
The base training by Qwen used 36T tokens for a 32B parameter model, a tokens-per-parameter ratio about 2 times Gemma 3's and 4 times Llama 3's. This is a neat model. My fine tuning is more like billions of tokens. We will see if billions are enough to "convince" trillions.
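For context, the fine tune follows the usual Unsloth QLoRA recipe, roughly like this (a minimal sketch: the model id, dataset, and hyperparameters are placeholders, not the actual run; the argument layout matches the older Unsloth notebooks, newer trl versions move some of it into SFTConfig):

```python
# Minimal Unsloth QLoRA sketch for a Qwen 3 32B fine tune.
# Model id, dataset, and hyperparameters are placeholders, not the real run.
import torch
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3-32B",   # assumed repo id
    max_seq_length=2048,
    load_in_4bit=True,                # QLoRA: 4-bit base weights
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16, lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
)

dataset = load_dataset("json", data_files="faith_corpus.jsonl", split="train")  # placeholder corpus

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        learning_rate=2e-4,
        num_train_epochs=1,
        bf16=torch.cuda.is_bf16_supported(),
        logging_steps=10,
        output_dir="qwen3-32b-ft",
    ),
)
trainer.train()
```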
AB⚡ DC meets tonight in Austin, gonna talk about Whitenoise! https://www.meetup.com/bitcoin-commons-austin/
talking about whitenoise.. noice!
are you following David the Good? He had an experiment where he left pumpkins alone and didn't look at them, and his theory was that when left wild, pumpkins do better! Kind of like a quantum experiment: observing kills the cat :)
ChatGPT is BS
Benchmarked 4 new models. DeepSeek R1's score improved. All of these are below average, so p(doom) probably increased!
Coming soon: Kimi K2. They say it is very good at coding, but my leaderboard is about being beneficial to humans. So we will see!
Full leaderboard https://sheet.zoho.com/sheet/open/mz41j09cc640a29ba47729fed784a263c1d08
More info https://huggingface.co/blog/etemiz/aha-leaderboard
