Interesting, I'll take a look at those. I was thinking it might be good to keep the training deliberately slow, i.e. train over a week or two so the LLM can see and train on globally distributed data, similar to how Bitcoin keeps blocks at 10 minutes. What do you think?
I think you need to do some entry-level background research into how AI training works.
Sorry, that came across as sassy. Your idea is interesting, but I don't think it's really achievable with current training methods.
Have you looked at the training methods DeepSeek used for their recent model? They split the training horizontally across nodes with DualPipe. From this link (https://adasci.org/deepseek-v3-explained-optimizing-efficiency-and-scale/), it might work at a more distributed scale, since it achieves close to zero all-to-all communication overhead.
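For intuition, here's a minimal sketch of stage-wise (pipeline) splitting, which is the general idea behind slicing a model horizontally across nodes. This is not DeepSeek's actual DualPipe code; the stage count, layer sizes, and micro-batch counts are made up for illustration, and it just shows why each node only needs point-to-point hand-offs with its neighbours rather than all-to-all communication.

```python
# Toy sketch of pipeline (stage-wise) parallelism. NOT DeepSeek's DualPipe
# implementation -- just an illustration of splitting a model "horizontally"
# across nodes, where only adjacent stages ever need to exchange data.

import numpy as np

rng = np.random.default_rng(0)

class Stage:
    """One slice of the model, as it would live on one node.
    Here it's a single layer; in practice it would be a block of transformer layers."""
    def __init__(self, in_dim, out_dim):
        self.w = rng.normal(scale=0.1, size=(in_dim, out_dim))

    def forward(self, x):
        return np.tanh(x @ self.w)

# Four stages ~ four nodes in the pipeline (sizes are arbitrary).
stages = [Stage(16, 16) for _ in range(4)]

# A batch is split into micro-batches that flow through the pipeline; while
# stage 1 works on micro-batch k, stage 0 can already start micro-batch k+1.
batch = rng.normal(size=(8, 16))
micro_batches = np.split(batch, 4)

outputs = []
for mb in micro_batches:
    act = mb
    for stage in stages:
        # In a real system this hand-off is a point-to-point send/recv
        # between node i and node i+1 -- the only communication needed.
        act = stage.forward(act)
    outputs.append(act)

print(np.concatenate(outputs).shape)  # (8, 16)
```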

I would think it would still run into latency issues. I get that this can be run over multiple nodes at once, as most AI training is, but the assumption is that those nodes can rapidly communicate with each other when they need to. Certainly progress is being made on this, though, and hopefully it's only a matter of time until we have decent distributed low-latency training available. If you are training a public AI model, there are petaflops of free computational power available through BOINC, and I encourage you to explore it. There is also a cryptocurrency (Gridcoin) that incentivizes participation in BOINC and can be used to quickly get you a ton of free compute volunteers.
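To put rough numbers on the latency/bandwidth concern, here's a back-of-the-envelope sketch. All figures (model size, link speeds, latencies) are assumptions for illustration, not measurements, and it ignores overlap, compression, and allreduce tricks that real systems use.

```python
# Rough illustration of why inter-node bandwidth/latency matters for naive
# data-parallel training. All numbers are illustrative assumptions.

model_params    = 7e9   # assume a 7B-parameter model
bytes_per_param = 2     # fp16 gradients
grad_bytes      = model_params * bytes_per_param

def sync_time(bandwidth_bytes_per_s, latency_s):
    """Time to exchange one full set of gradients (no overlap, no compression)."""
    return grad_bytes / bandwidth_bytes_per_s + latency_s

datacenter = sync_time(50e9, 5e-6)     # ~400 Gb/s interconnect, microsecond latency
internet   = sync_time(12.5e6, 50e-3)  # ~100 Mb/s home uplink, ~50 ms latency

print(f"datacenter sync per step: {datacenter:.2f} s")
print(f"internet   sync per step: {internet/60:.0f} min")
```

Under these assumptions, one gradient sync is well under a second on a datacenter interconnect but on the order of 15-20 minutes over a consumer connection, which incidentally is about the timescale of the Bitcoin-style slow training idea above.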
Nice, will look at it. Appreciate the back and forth 🫡