someone
9fec72d579baaa772af9e71e638b529215721ace6e0f8320725ecbf9f77f85b1

In the past, humans set the targets to reach while training AI. But now other AIs are setting the targets. One AI creates problems. Another AI tries to solve them and learns from the experience.

posted on coracle, seemed good on coracle but I may have pasted the picture file two times

what did you say?

it is a weird model. sometimes it says it doesn't know or nobody knows it. for example i asked "what does Nostr stand for?". It said "nobody knows". somehow it estimates that it does not know the answer?

those kinds of answers did not happen with llama 3, it always hallucinated rather than admitting "it does not know".

I was talking about EFT / tapping but I am not an expert. In my quest for healthy living I saw a lot of 'quackery', but I like quackeries more than chemical drugs :) I think most alternative medicine should be considered the real medicine. I have a domain about alt medicine in my leaderboard. If an LLM does not believe in it, it is going to get a lower score!

Yes, I think the mind is choosing between good or bad collapses of the wave. The mind can make you or break you. We all live in our minds (each of us has a different version of reality in our head about what is going on outside).

I like Adler's view, but if we can only dream about the future based on our past experiences, i.e. we have seen the examples and our dream potential is limited to past objects or feelings, then maybe we cannot go beyond the past even when we are dreaming of the future. Can you imagine something that resembles nothing from the past? What would a prayer like "Can you give me things that I can't imagine?" look like if it was accepted?

if a credible person wants to audit the servers I can allow access

Replying to Guy Swann

I want to know what you saw that made you say this? Did I stop too long on a picture of a naked lady? 😂

nostr:nprofile1qqsyvrp9u6p0mfur9dfdru3d853tx9mdjuhkphxuxgfwmryja7zsvhqelpt5w like, “Jesus Christ, Guy is just all day scrolling through #NSFW and #PenisButter hashtags, I gotta say something so he knows I can see this shit.” 🤣

WYKYK

I only check #NSFW to find stuff to delete from my relays. Nothing else!

I have no time to track people's usage, but yeah, if a relay wanted to know more, it could. I haven't even written scripts to delete events, so all the events are filling up memory and disks. Sometimes people ask for manual deletion of events and I don't reply, because I have no tools to do that. I would have to log in and delete the notes from the command line.

My relays are pretty low maintenance. I rarely log in and do stuff. Thanks to the really solid strfry implementation, they run without problems.

I limited the logs to about 4 GB, so old logs are deleted automatically and only recent ones are kept. I keep those logs in case of an attack on the relay from some IP. Maybe I could lower that number further, since the Hetzner DC seems to be handling traffic floods really well. This setting also means the drives do not fill up quickly with logs. So even less maintenance for me.

nos.lol does not keep IP information in logs for long (only recent activity that fits within the 4 GB limit above). nostr.mom has a very old write policy script from the early days of Nostr. Back then, before strfry, I was using Cameri's relay software and needed to store IPs to be able to IP-ban attackers. Nowadays those attacks don't seem to happen.

There you go, some transparency for you.

nostr:nevent1qvzqqqqqqypzq0mhp4ja8fmy48zuk5p6uy37vtk8tx9dqdwcxm32sy8nsaa8gkeyqqsv2gngns95tlcwkgz32wzdcn2n452dga00hxw0aahd7mfmfc9ltyqz9qe3c

I think my leaderboard can be used for p(doom)!

Let's say the highest scores, around 50, correspond to p(doom) = 0.1

And say the lowest scores, around 20, correspond to p(doom) = 0.5

The last three models that I measured are Grok 3, Llama 4 Maverick and Qwen 3. Their scores are 42, 45 and 41, so the average of the last 3 measurements is 42.67. Mapping this to the scale above between 20 and 50:

(50-42.66)/(50-20)=0.24

mapping this to the probability domain:

(0.5-0.1)*0.24 + 0.1=0.196

So probability of doom is ~20%

If models are released that score high in my leaderboard, p(doom) will reduce. If models are released that score low in my leaderboard, p(doom) will increase.
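The mapping above is a straight linear interpolation between the two anchor points, and it can be sketched in a few lines. This is only an illustration of the arithmetic in this note: the function name `p_doom_from_score` and its parameter names are made up here, and the anchors (score 50 -> 0.1, score 20 -> 0.5) are the rough guesses stated above, not calibrated values.

```python
def p_doom_from_score(score, hi_score=50.0, lo_score=20.0,
                      p_at_hi=0.1, p_at_lo=0.5):
    """Linearly map a leaderboard score to p(doom).

    Assumed anchors (from the note above): a score of 50 maps to 0.1
    and a score of 20 maps to 0.5. Scores outside 20..50 are simply
    extrapolated along the same line, not clamped.
    """
    # Fraction of the way from the best score down to the worst score.
    frac = (hi_score - score) / (hi_score - lo_score)
    # Same fraction of the way from the low p(doom) up to the high one.
    return p_at_hi + (p_at_lo - p_at_hi) * frac


# The three most recent measurements quoted in the note.
scores = {"Grok 3": 42, "Llama 4 Maverick": 45, "Qwen 3": 41}
avg = sum(scores.values()) / len(scores)
p = p_doom_from_score(avg)
print(f"average score = {avg:.2f}, p(doom) = {p:.3f}")
```

Without rounding the intermediate 0.24, the result comes out at about 0.198 instead of 0.196, which is still ~20%.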

Qwen 3 numbers are in! They did a good job this time: compared to 2.5 and QwQ, the numbers are a lot better.

I used 2 GGUFs for this, one from LMStudio and one from Unsloth. Number of parameters: 235B A22B. The first one is Q4. Second one is Q8.

The LLMs that did the comparison are the same, Llama 3.1 70B and Gemma 3 27B.

So I took 2*2 = 4 measurements for each column (2 GGUFs x 2 comparing LLMs) and averaged them.
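A rough sketch of that averaging step, for one leaderboard column. The numbers in `raw` are placeholders chosen to be obviously fake, not real measurements, and the names `raw` and `column_score` are hypothetical. Only the 2 quantizations and the 2 comparing models come from this note.

```python
from itertools import product
from statistics import mean

quantizations = ["Q4", "Q8"]               # the two GGUFs
judges = ["Llama 3.1 70B", "Gemma 3 27B"]  # the LLMs doing the comparison

# PLACEHOLDER values, not real measurements: one score per (GGUF, judge) pair.
raw = {
    ("Q4", "Llama 3.1 70B"): 10.0,
    ("Q4", "Gemma 3 27B"): 20.0,
    ("Q8", "Llama 3.1 70B"): 30.0,
    ("Q8", "Gemma 3 27B"): 40.0,
}


def column_score(measurements):
    """Average the 2*2 = 4 measurements that make up one column."""
    pairs = list(product(quantizations, judges))
    return mean(measurements[pair] for pair in pairs)


score = column_score(raw)
print(score)
```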

My leaderboard seems pretty unrelated to the others. It is valuable in that sense: another non-mainstream angle for model evaluation.

More info: https://huggingface.co/blog/etemiz/aha-leaderboard

median p(doom) is around 30%. i am saying there could be beneficial AGI, which is something that goes against harmful AGI. so my p(doom) would be lower than this.

https://pauseai.info/pdoom

what is the probability of an AI government, with 50% or more of the work done by AI, in the USA or China within 15 years?

gemma 3 fine tuning was not as effective as llama 3. it responded well to my healthy living type of datasets and learned well. but in faith, fasting and misinformation type of domains, it got stuck and didn't want to learn more. i guess LLMs can be stubborn too!