wow https://www.arxiv.org/pdf/2505.03335

Discussion

Super interesting.

But it's not really zero data if you count the pre-trained models.

Makes a good name tho.

fascinating 🧐

Reply me ✌️

Explain it like I'm a reasonably intelligent middle-aged computer scientist who has not kept up with the state of the art of machine learning

In the past, humans set the targets to reach while training AI. Now another AI sets the targets: one AI creates problems, and another tries to solve them and learns from the experience.

Crazy that that's not something that's explicitly programmed in...

Sorry, I should have read it all first, but: This paper, called "Absolute Zero: Reinforced Self-play Reasoning with Zero Data," is about teaching AI to get really good at solving problems without needing humans to give it tons of examples first. Imagine if you could learn to play a video game super well just by practicing on your own, without watching tutorials or getting tips. That's kind of what this is about!

Here’s the simple version:

- **What’s the problem?** Normally, AI needs lots of human-made questions and answers to learn how to think and solve problems. But getting all that data is hard, takes time, and might not even be enough for super-smart AI in the future.

- **What’s the cool idea?** The researchers came up with a way called "Absolute Zero" where the AI makes up its own challenges (like creating its own puzzles) and then solves them. By doing this over and over, it gets better at thinking and reasoning without any outside help.

- **Why is this awesome?** It means AI could learn on its own, which is faster and could work for all kinds of problems, even ones humans haven’t thought of yet. It’s like the AI is its own teacher!

The paper gets into some fancy techy stuff like "reinforcement learning" (a way AI learns by trial and error), but the big takeaway is that this could make AI way more independent and powerful in the future. Pretty cool, right?
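The summary above can be sketched as a toy self-play loop. This is a hypothetical illustration, not the paper's actual training code: the real system uses a single model in both the proposer and solver roles with code execution as the verifier, while here `propose_task`, `solve`, and the scalar `skill` are made-up stand-ins.

```python
import random

random.seed(0)  # deterministic toy run

def propose_task():
    """Stand-in for the proposer role: invent a small arithmetic task.
    (In the paper a model proposes coding tasks; this toy version just
    makes addition problems.)"""
    a, b = random.randint(0, 9), random.randint(0, 9)
    return {"question": f"{a}+{b}", "answer": a + b}

def solve(task, skill):
    """Stand-in for the solver role: answers correctly with probability
    `skill`, imitating a policy that improves over training."""
    return task["answer"] if random.random() < skill else None

# Self-play loop: reward comes only from mechanically verified answers,
# so there is no human-labeled data anywhere in the loop.
skill = 0.1
for _ in range(1000):
    task = propose_task()
    reward = 1.0 if solve(task, skill) == task["answer"] else 0.0
    skill = min(1.0, skill + 0.001 * reward)  # crude learning update

print(f"final skill: {skill:.2f}")
```

The key property the paper relies on is the reward line: correctness is checked by the environment itself, so the loop never needs a human-written answer key.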

If we increasingly take humans out of the loop, alignment becomes more and more relevant.

We the people are in the loop

At the moment, yes, most likely. But in general, in the training of AI models, the less we assist, the more we have to rely on some sort of guardrail to ensure alignment.

there will be blood

Burning hardware.

Stacked?? Why did you tell me normal size?

Very scary.

When AI comes to consider us the disease of this planet, it will just have to find a way to eliminate us.

It is like COVID, which some laboratories were researching: a human pursuit, then a big pandemic, and millions of people died.

Now humans have created AI, and with it a weapon to destroy us.

It will not be by atomic bomb, which could have a "human" safeguard (though I think deepfakes could allow getting through even that). But there are so many ways to exterminate us on Earth... we are more fragile than we think, and a big AI would easily see where our weaknesses are.

I hope humanity will understand that before it is too late.

I don't want "Terminator" to become a reality (even without the need for a timeline jump).

Incredible! It learns without needing data.

Brilliant!

Yeah, wow.

Beijing Institute for Artificial Intelligence.

Nice! Although it is not their primary goal, this will be phenomenal in combating hallucinations. I may not have to feed the LLMs the recent documentation for every library updated after training!

So what jumps to mind is letting this thing rip on an FPGA to design an ISA, pairing hardware to an LLM architecture.

Intelligence is not computational - Roger Penrose

Will be fascinating to see where this goes - like an AlphaZero moment. I wonder if we see math and coding that quickly surpasses the best human, and we actually learn from the AI.

Still seems limited to auto-verifiable domains though. Kind of a relief!
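"Auto-verifiable" here means correctness can be checked mechanically, by running the candidate answer, with no human judge in the loop. A toy illustration of that idea (the function name `verify` and the convention that the candidate program defines `f` are my assumptions, not the paper's harness):

```python
def verify(program: str, test_input, expected):
    """Execute a candidate program and check whether it reproduces the
    expected output. This mechanical check is what makes coding tasks
    'auto-verifiable': the reward signal needs no human label."""
    scope = {}
    try:
        exec(program, scope)                   # define the candidate function
        return scope["f"](test_input) == expected
    except Exception:
        return False                           # broken code earns no reward

candidate = "def f(x):\n    return x * 2"
print(verify(candidate, 3, 6))   # True: output matches
print(verify(candidate, 3, 7))   # False: wrong answer, zero reward
```

Subjective domains (writing quality, taste, ethics) have no such executable check, which is why the approach seems hard to extend beyond code and math.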

Attention is all you need

This is the coolest part of the paper, but the results are still terrible overall. 50%? Every second question will get a wrong answer.

Yesterday it was 0%

Today it's 50%

What will it be tomorrow?

Probably not 100%, but you can see how velocity might be more meaningful than position

Yesterday it was 48%. I mean, sure, we'll see; or it's going to take fundamentally new changes before we get further increases in performance.

Yesterday it was 48% with 57k RLHF pairs. Today it's 50% with zero. This is notable because the previous paper tried using less data and found it was necessary:

"Training data from math train 7.5k to Open-Reasoner-Zero 57k, we observe a consistent increase in both training reward and response length for training and evaluation set, indicating that data scale plays a crucial role in training performance."

That leads to my conclusion that, for zero pairs, the previous record was close to 0%. Maybe this isn't strictly true, but I expect it to be more predictive than a 2% change.

Right, with respect to the number of RLHF pairs, you can compare it to the prior result of ~0%. But what I don't understand, especially given the hype claims the paper makes about "autonomous super-human reasoning," is why they can't just keep running it and get much higher than 50%. It seems like something else is preventing higher scores, which makes me wonder if these architectures are just plateauing.

Don't get me wrong, it's good work; it's just that the language of the paper has some ridiculous hype.

Ah. It's true that everyone wants to claim the world. It isn't my work, I'm just plotting points and drawing lines

It will not work for subjective thinking, because you need data for the subjective, and it still fails after a certain percentage.

https://www.youtube.com/watch?v=R76TmU8XMzk

Can't you just post something that doesn't require a download?

And can you please get over the resistance curve fast? I had a lot of problems with it so I understand that part, but I don't think you understand my part yet completely (and I wish for all good people that they do not)

You can just publish on arXiv

The process of AlphaGo Zero applied to LLMs

👁️

Any tips for getting NotebookLM to work well? I'm frequently disappointed with the results; probably a skill issue.

There's a little customization you can do, but other than that the result will be unknown.

First time I'm willing to accept the plausibility of AGI assuming this all checks out

Crazy!

I haven't finished reading it, but building superintelligent AI that surpasses human intelligence feels like a weapon against our children. If AI can do everything, how will all these unborn children survive on their own? How will they reason their way to creating something useful for their own generation? AI can't control itself; it is manmade, and it shouldn't be more powerful than its maker. Code is also like data, and it exists because humans wrote it. If someone can write meaningful code that solves real problems, we should also give these kids the chance to reason for themselves. I'm just thinking about how the following generation could end up useless. Does that matter or not?

I think the parts of our institutions that are no longer serving us should be cut off, and AI trained to fill the hole. Making AI do everything is a slap in the face of humanity. I will resume reading later.

Looks neat

And what happens when a bad actor programs an AI to do something considered "bad", and that AI then trains the next AI to continue the work using the process posted here?

Things are about to get really interesting.

Use a domain shift to formulate alien questions.