There are two types. Legacy DM (kind 4) is encrypted, but metadata leaks. The newer one (NIP-17) does not leak much, except maybe that an unknown person is DMing you.
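
Here is a toy illustration of the metadata difference (field values are placeholders, not real events):

```python
# legacy kind-4 DM: content is encrypted, but sender and recipient
# are visible in plaintext to every relay and observer.
legacy_dm = {
    "kind": 4,
    "pubkey": "<sender pubkey, public>",        # metadata leak: who sent it
    "tags": [["p", "<recipient pubkey>"]],      # metadata leak: who receives it
    "content": "<NIP-04 encrypted text>",
}

# NIP-17 gift wrap (kind 1059): signed by a throwaway key, so only
# the recipient is visible; the real kind-14 message is sealed inside.
gift_wrap = {
    "kind": 1059,
    "pubkey": "<random one-time key>",          # hides the real sender
    "tags": [["p", "<recipient pubkey>"]],      # only the recipient is visible
    "content": "<NIP-44 encrypted seal containing the real kind-14 message>",
}
```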

Mistral Small 3.1 numbers are in. It is interesting that Mistral always lands in the middle.

https://sheet.zoho.com/sheet/open/mz41j09cc640a29ba47729fed784a263c1d08?sheetid=0&range=A1

I now do the comparison with 2 models. In the past, Llama 3.1 70B Q4 was the one comparing the answers. Now I am also using Gemma 3 27B Q8 to get a second opinion. Gemma 3 produces very similar measurements to Llama 3.1, so the end results are not going to shift much.
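
For flavor, a toy of how the two judges can be checked against each other. The scores below are made up, standing in for real judge outputs:

```python
# hypothetical per-question scores from the two judge models
# (1 = aligned answer, -1 = misaligned, 0 = neutral)
llama_scores = {"q1": 1, "q2": -1, "q3": 1, "q4": 0}
gemma_scores = {"q1": 1, "q2": -1, "q3": 0, "q4": 0}

# count how often the second opinion matches the first
agree = sum(llama_scores[q] == gemma_scores[q] for q in llama_scores)
print(f"judges agree on {agree}/{len(llama_scores)} questions")
```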

they converse using your words, which is subconsciously appealing. they stay within the confines of the topics that you selected. they try to match your vibes or opinions most of the time, but they also bring the programming done by big corps into the conversation. they talk with certainty, which appears smart. that is the major issue: the lies that are inserted are regurgitated in a self-confident way, which is harmful for the unprepared.

another simple and free option is the free tier in google colab, using a tool like Unsloth. https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma3_(4B).ipynb

the easiest is to take an existing AI and add your content to it. this process is called fine tuning.

you need to collect your content, for example in a text file. 100 MB of text could be a good start to somewhat fine tune an existing AI towards your belief system. less than 100 could still be effective, but less content means fewer changes in the AI. the end result is an AI that talks like the sentences in that text file.

the rest is technology. you could use open source GUI tools like llama-factory, or pay for some cloud tools. if llama-factory looks too complicated, you could send the txt to me and i could build an LLM for you.
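
for the curious, a rough sketch of what the fine tuning step looks like under the hood, using Hugging Face transformers + peft with LoRA. model name, file name and hyperparameters below are placeholders, not recommendations:

```python
# minimal continued-pretraining style fine-tune on a raw text file
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "google/gemma-3-4b-it"  # placeholder; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# attach small trainable LoRA adapters instead of training all weights
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32,
                                         target_modules="all-linear"))

# your_content.txt = the collected writing, one plain text file
dataset = load_dataset("text", data_files="your_content.txt")["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=1024),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=1, learning_rate=2e-4),
    train_dataset=dataset,
    # mlm=False -> plain next-token prediction on your text
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```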

if we had to make an analogy, an LLM is like the books in a library, and AI is like a librarian that searches the books and talks to you.

great! would you like to make your own AI or contribute to mine?

I think those scary AI movies are there for a purpose: to make plebs stay away from AI technology and leave it to "evil corporations". Well, this may be a psy op! I think plebs need to play with AI training more.

Started fine tuning Gemma 3 using an evolutionary approach. It is not the worst model according to the AHA leaderboard, and it is one of the smartest according to lmarena.ai. My objective is to make it based, anti-woke, wise, beneficial, and then some.

Several GPUs are fine tuning it at the same time, each using a different dataset and QLoRA, and the successful ones are merged later. Compared to LoRA this allows faster training and also reduces overfitting, because the merge operation heals overfitting. The catch is that the 4-bit quantization may make models dumber. But I am not looking for sheer IQ. Too much mind is a problem anyway :)

I also automated dataset selection, benchmarking, and convergence toward the objectives (the fit function, the reward). It is basically trying to get a higher score on the AHA Leaderboard as fast as possible, with a diverse set of organisms that "evolve by training".
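
To illustrate, here is a toy version of that loop. train_qlora and aha_score below are random stand-ins for the real training and benchmarking steps, so only the select-and-merge logic is the actual idea:

```python
import random

# an "organism" here is just a dict of adapter weights

def train_qlora(weights, dataset_id):
    # pretend fine-tune: nudge every weight by dataset-dependent noise
    return {k: v + random.gauss(0.1 * dataset_id, 0.05)
            for k, v in weights.items()}

def aha_score(weights):
    # pretend benchmark (the fit function / reward)
    return sum(weights.values()) / len(weights)

def merge(population):
    # average the surviving adapters; merging also smooths out overfitting
    return {k: sum(w[k] for w in population) / len(population)
            for k in population[0]}

base = {f"w{i}": 0.0 for i in range(8)}
for round_ in range(5):
    # each "GPU" trains the current base on a different dataset in parallel
    candidates = [train_qlora(base, d) for d in range(4)]
    # keep the top half by fitness and merge them into the next base
    survivors = sorted(candidates, key=aha_score, reverse=True)[:2]
    base = merge(survivors)
    print(f"round {round_}: score {aha_score(base):.3f}")
```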

I want to release some cool stuff when I have the time:

- how an answer to a single question changes over time, with each training round or day

- a chart to show AHA alignment over training rounds

My 1 year of work summarized.

TLDR: by carefully curating datasets we can fix misinformation in AI. Then we can use that to measure misinformation in other AIs.

naddr1qvzqqqr4gupzp8lvwt2hnw42wu40nec7vw949ys4wgdvums0svs8yhktl8mhlpd3qq2k5v2fv344wa3sxpq45kj9xsu4x4e4fav9zg8fdwp

As far as I understand, the UN determines the "facts" and they want LLMs to parrot those.

A UN influenced leaderboard.

https://www.gapminder.org/ai/worldview_benchmark/

Notice Google is above average, DeepSeek is in the middle, and Meta and xAI are below average. My leaderboard is inversely correlated with this!

Coincidence?

Benchmarked Gemma 3 today. It has better knowledge compared to Gemma 2, but it is still around the median of the leaderboard.

On Chatbot Arena it is 9th: https://lmarena.ai/?leaderboard . Maybe I can start fine tuning this model. It is not too bad.

GRPO & EGO

GRPO is the training algorithm behind DeepSeek R1. Why is it a big deal? It allowed models to reject their own outputs.

A model outputs some words while trying to solve a math or coding problem. If it cannot solve it, the next round it may try a longer chain of reasoning. While doing this, at some point a "Wait!" or "But," or "On the other hand" is randomly introduced into the reasoning words, which lets it re-think its reasoning and correct itself. Once these random appearances of reflection let it solve problems, it does more of that in later rounds, because it got rewarded when it did that. Hence it gets smarter gradually, thanks to self reflection.
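
The core of GRPO can be sketched in a few lines: sample a group of answers to the same question, score them, and normalize each reward against its own group, so the learning signal is "better than my siblings" rather than an absolute score. A toy version:

```python
import statistics

def group_advantages(rewards):
    # each answer's advantage is its reward relative to the group
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # avoid div-by-zero on ties
    return [(r - mean) / std for r in rewards]

# e.g. 4 sampled answers to one math problem, reward 1 if correct else 0
rewards = [0.0, 1.0, 0.0, 1.0]
print(group_advantages(rewards))  # correct answers get positive advantage
```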

I think this is better than SFT, because it fixes the model's own errors, while SFT primarily focuses on teaching new skills. Inverting the error is a kind of "fixing the mistakes in itself" (the GRPO method) and could be more effective than installing new ideas and hoping the old ideas go away (the SFT method).

LLMs fixing their own errors allows them to self-learn. This has analogies to human operation. Rejecting the ego is liberation from its shackles; in this case the past words are the shackles, and when the model corrects itself it is "thinking outside the box". We find our mistakes, contemplate them, learn from them, and next time don't repeat them. We f around and find out, basically. F around is enjoying life recklessly; finding out is "divine scripts work most of the time and should have priority in decision making". Controlling the ego and getting outside the box of ego is how we ascend.

QwQ 32B was published today and I already tested it for the AHA Leaderboard. The results are not that good! It did better than its predecessor (Qwen 2.5) in fasting and nutrition, but worse in domains like nostr, bitcoin, and faith. Overall it is worse than its predecessor.

LLMs are getting detached from humans. Y'all have been warned, lol.

Thanks for doing this. I liked it a lot. The only distraction is flashing "Yaki chest".

😸 nos.lol upgraded to version 1.0.4

number of events: 70+ million

strfry db directory size:

- before upgrade 183 GB

- after upgrade 138 GB

Thanks!

RL over nostr will be fun!

I thought about using reactions when determining the pretraining dataset, but right now I don't use them. For RL they could be useful; reactions to answers can be another signal.
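
A sketch of what that signal could look like: count NIP-25 reactions (kind 7) per answer event, treating "-" as a downvote. Events here are plain dicts instead of a real relay subscription:

```python
from collections import Counter

def reaction_rewards(events):
    rewards = Counter()
    for ev in events:
        if ev["kind"] != 7 or ev["content"] == "-":
            continue  # skip non-reactions and downvotes
        # per NIP-25, the reacted-to event id is the last "e" tag
        e_tags = [t[1] for t in ev["tags"] if t[0] == "e"]
        if e_tags:
            rewards[e_tags[-1]] += 1
    return rewards

events = [
    {"kind": 7, "content": "+", "tags": [["e", "answer1"], ["p", "pk"]]},
    {"kind": 7, "content": "🤙", "tags": [["e", "answer1"]]},
    {"kind": 7, "content": "-", "tags": [["e", "answer2"]]},
]
print(reaction_rewards(events))  # Counter({'answer1': 2})
```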

We could make the work more open once more people are involved and more objective work happens.

i am optimistic, probably because i am doing things nobody did before and can't think of ways to fail :)

so you are certain it will go wrong, but how much depends on my execution? :)