Also…

“Giant models are slowing us down. In the long run, the best models are the ones which can be iterated upon quickly. We should make small variants more than an afterthought, now that we know what is possible in the <20B parameter regime.”

This 👆 was my #1 realisation back in 2018 when training our interferometry models.

Monolithic models are grotesquely inefficient. It is orders of magnitude better to break things down into chunks and produce many specialised models than one monolithic general model. This is what nature does. The key is to create a model that can bound a domain of specialisation and then allocate the relevant data to training a specialist for that domain. I spent $10m to learn this, so not sure why I’m saying it here for free.

Generalisation is a mirage. Once you see the challenge of AGI at better resolution, as people are now starting to, you realise that the best path toward generalisation is to build a high-resolution mosaic of discrete specialisations. That is, by amassing ever more, ever finer specialised models and calling them appropriately, you converge on generalisation. If you try to drive straight towards monolithic generalisation… Mr Thermodynamics wants to see you now.
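A minimal sketch of the idea, under my own assumptions (all names and the keyword-based routing are hypothetical stand-ins; a real system would use a trained classifier and actual fine-tuned models): a router bounds each query to a domain, then dispatches it to a small specialist rather than one monolithic generalist.

```python
from typing import Callable, Dict

# Hypothetical specialists: stand-ins for small fine-tuned models.
def optics_specialist(query: str) -> str:
    return f"[optics specialist] {query}"

def materials_specialist(query: str) -> str:
    return f"[materials specialist] {query}"

def fallback_generalist(query: str) -> str:
    return f"[generalist] {query}"

# The router "bounds a domain of specialisation" here via keyword
# matching; in practice this would be a small learned classifier.
DOMAIN_ROUTES: Dict[str, Callable[[str], str]] = {
    "interferometry": optics_specialist,
    "fringe": optics_specialist,
    "alloy": materials_specialist,
}

def route(query: str) -> str:
    """Dispatch a query to the matching specialist, else the generalist."""
    q = query.lower()
    for keyword, specialist in DOMAIN_ROUTES.items():
        if keyword in q:
            return specialist(query)
    return fallback_generalist(query)

print(route("Estimate the fringe spacing for this setup"))
print(route("Summarise this news article"))
```

The generalist here is only a fallback: as more specialists are added and the routing gets finer, fewer queries ever reach it, which is the convergence-on-generalisation claim above.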

🍻
