But it all boils down to the hardware. You need H100s or similar high-performance GPUs to train and deploy models at that scale, which is still a significant barrier for most companies regardless of their methodologies or frameworks. For the most part the barrier isn't a lack of knowledge; it's the hardware.
True, and there are still plenty of rumors going around about the actual training hardware behind DeepSeek, but I still think open-sourcing a SOTA model is a huge step forward. One of the things the community has been best at is reducing the compute cost of running various models.
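For example, a rough sketch of one common cost-cutting trick, loading an open-weight model with 4-bit quantization via transformers + bitsandbytes (the model ID here is just a placeholder, not a specific recommendation):

    # Illustrative sketch: 4-bit quantized loading to cut inference memory/compute.
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    model_id = "some-org/some-open-model"  # placeholder, swap in a real checkpoint

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,                  # store weights in 4-bit precision
        bnb_4bit_compute_dtype="bfloat16",  # run the matmuls in bf16
    )

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",                  # spread layers across available GPUs
    )

Tricks like this are a big part of why weights that originally needed datacenter GPUs can often be served on much more modest hardware.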