I really hope this turns out to be true. Iām opposed to the idea that āscale is all you needā, rather, I believe that āinnovation / research are all you need.ā
The concern I have is that the scaling strategy can still be applied to multi-pass models, which would likely outperform smaller ones. This not only increases training costs, but also makes inference more expensive due to the need for multiple actions.
That said, Iām not very familiar with these types of architectures, so Iād be happy to read any material youād recommend.