True, but ironically, the Dunning-Kruger effect suggests that we humans do exactly the same thing 😂

I'm starting to think that these massive, single-pass, tokenised models will become redundant very quickly.

We are already seeing small multi-pass models beat the large-parameter models simply by iterating on their own thinking.
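To make the "iterating beats one big pass" intuition concrete, here's a toy sketch (not a real model, and the function names are hypothetical): a single pass produces one rough guess, while a multi-pass loop feeds its own output back in and refines it. The task here is just approximating √2 with Newton's update, standing in for iterative refinement.

```python
# Toy analogy, not an LLM: one cheap step applied once vs. iterated.

def single_pass(x: float) -> float:
    # One-shot guess: halve the input (a crude first approximation of sqrt).
    return x / 2

def multi_pass(x: float, passes: int = 10) -> float:
    # Re-read the previous answer and refine it each pass
    # (Newton's update for sqrt, used purely as an illustration).
    guess = single_pass(x)
    for _ in range(passes):
        guess = 0.5 * (guess + x / guess)
    return guess

print(abs(single_pass(2.0) - 2 ** 0.5))  # large one-shot error
print(abs(multi_pass(2.0) - 2 ** 0.5))   # error shrinks rapidly with iteration
```

The point of the analogy: each pass is cheap, but looping the same small step lets the quality of the answer compound, which is roughly the pitch for multi-pass inference over a single giant forward pass.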


Discussion

I really hope this turns out to be true. I’m opposed to the idea that “scale is all you need”; rather, I believe that “innovation and research are all you need.”

The concern I have is that the scaling strategy can still be applied to multi-pass models, and scaled-up versions would likely outperform smaller ones. This not only increases training costs, but also makes inference more expensive, since each answer requires multiple passes.

That said, I’m not very familiar with these types of architectures, so I’d be happy to read any material you’d recommend.