Not from LLM-based models, no.

RLVR models are the new method: reinforcement learning with verifiable rewards, but then also zero-data/zero-knowledge learning.

In other words, they have AI teach AI and become self-aware. Reinforced self-play reasoning with zero data. So basically it starts as an SI, iterates, teaches itself based on its own inputs/outputs, and iterates again, all without any human input (data or prompt instructions).

This new method lets verified rewards be the tool that defines the AI reasoning model.
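
For anyone who wants the loop spelled out, here's a rough toy sketch in Python. It is not any paper's actual method: `propose_task`, `verify`, `ToySolver`, and `self_play` are all hypothetical names I made up, the "model" is just a single skill number, and the update is a crude stand-in for a real policy-gradient step. The only point is the shape of the loop: the system invents its own tasks, a program checks the answers, and that checkable reward is the only training signal.

```python
import random

def propose_task(rng, difficulty):
    """Proposer: invent an addition problem; no human-written data involved."""
    a = rng.randint(0, 10 ** difficulty)
    b = rng.randint(0, 10 ** difficulty)
    return a, b

def verify(task, answer):
    """Verifiable reward: 1.0 if the answer checks out programmatically, else 0.0."""
    a, b = task
    return 1.0 if answer == a + b else 0.0

class ToySolver:
    """Hypothetical stand-in for a policy: answers correctly with probability `skill`."""

    def __init__(self, skill=0.2):
        self.skill = skill

    def solve(self, task, rng):
        a, b = task
        # Correct answer with probability `skill`, otherwise an off-by-one mistake.
        return a + b if rng.random() < self.skill else a + b + 1

    def update(self, reward, lr=0.05):
        # Assumption: rewarded attempts nudge skill upward (capped at 1.0).
        self.skill = min(1.0, self.skill + lr * reward)

def self_play(steps=200, seed=0):
    """Run the propose -> solve -> verify -> update loop with no external data."""
    rng = random.Random(seed)
    solver = ToySolver()
    rewards = []
    for _ in range(steps):
        task = propose_task(rng, difficulty=2)
        answer = solver.solve(task, rng)
        reward = verify(task, answer)
        solver.update(reward)
        rewards.append(reward)
    return solver, rewards

if __name__ == "__main__":
    solver, rewards = self_play()
    print("final skill:", solver.skill)
    print("mean reward:", sum(rewards) / len(rewards))
```

In a real system the solver and the proposer would both be neural networks and the verifier would be something like a code executor or a math checker, but the reward would still come from the check rather than from human labels.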

Discussion

Still just an advanced Google search with an LLM at the wheel.

lol no that's very naive

I mean.... I've trained models before but what do I know 🤷🏻

yeah, I doubt at the level these research papers are at, though?

They are clearly just trying to get funding and squash any competition by fear mongering...

Like it's so obviously clear.