I would think it would still run into latency issues. I get that this can be run over multiple nodes at once (most AI training is), but the assumption is that those nodes can communicate rapidly when they need to. Still, progress is being made on this; hopefully it's only a matter of time until decent low-latency distributed training is available. If you are training a public AI model, there are petaflops of free computational power available through BOINC, and I encourage you to explore it. There is also a cryptocurrency (Gridcoin) that incentivizes BOINC participation and can quickly get you a large pool of volunteer compute.
Have you looked at the training methods DeepSeek used for their recent model? It splits the training horizontally across nodes with DualPipe. From this link (https://adasci.org/deepseek-v3-explained-optimizing-efficiency-and-scale/), it might work at a more distributed scale, since it needs close to zero all-to-all communication.
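To make the "splitting horizontally" idea concrete, here is a toy Python sketch of plain pipeline parallelism: layers are partitioned into stages, and each stage only passes activations to its immediate neighbor, so there is no all-to-all traffic between nodes. This is not DeepSeek's actual DualPipe (which additionally runs the pipeline bidirectionally and overlaps communication with compute); all function names here are made up for illustration.

```python
# Toy pipeline-parallel sketch (illustrative only, NOT DeepSeek's DualPipe code).
# Each "stage" would live on its own node; a stage only talks to its neighbor.

def make_stage(scale):
    """A pipeline stage is just a function over activations (here: scaling)."""
    def stage(x):
        return [v * scale for v in x]
    return stage

def pipeline_forward(stages, micro_batches):
    """Run micro-batches through the stages in a pipelined schedule.

    timeline[t] lists the (stage, micro_batch) pairs that run at tick t;
    in a real system those pairs would execute concurrently on different
    nodes, with only point-to-point activation transfers between neighbors.
    """
    num_stages = len(stages)
    timeline = []
    outputs = {}
    # Classic fill / steady-state / drain schedule:
    for t in range(num_stages + len(micro_batches) - 1):
        active = []
        for s in range(num_stages):
            m = t - s  # micro-batch m reaches stage s at tick s + m
            if 0 <= m < len(micro_batches):
                inp = micro_batches[m] if s == 0 else outputs[(s - 1, m)]
                outputs[(s, m)] = stages[s](inp)
                active.append((s, m))
        timeline.append(active)
    final = [outputs[(num_stages - 1, m)] for m in range(len(micro_batches))]
    return final, timeline
```

For example, two stages (x2 then x3) over two micro-batches finish in 3 ticks instead of 4, because at tick 1 both stages are busy at once; that overlap is the latency win pipelining buys, and DualPipe pushes it further by hiding the communication as well.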

Nice, will look at it. Appreciate the back and forth 🫡