You can run it on multiple GPUs, a single GPU, or even just a CPU.
The less compute you have, the smaller the model you can run, and the worse the results.
If you want comparable results, you need a lot of compute.
What's different about how this model is trained is that it uses a smaller but higher-quality training data set. For example, they include ChatGPT conversations in the training data. In other words, you can use a more powerful model to train a smaller model that imitates it.
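
To make that last idea concrete, here is a toy sketch in plain Python. It is not how this model was actually trained; everything in it (the `teacher` function, `build_imitation_dataset`, `train_student`, the learning rate and epoch count) is a made-up stand-in for illustration. The point is only that the "student" never sees any ground truth, just the outputs of the stronger "teacher".

```python
import random

def teacher(x):
    """Hypothetical stand-in for the larger, more capable model."""
    return 3.0 * x + 1.0

def build_imitation_dataset(n, seed=0):
    """Sample inputs and label them with the teacher's outputs only."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, teacher(x)) for x in xs]

def train_student(data, epochs=200, lr=0.1):
    """Fit a tiny linear student (w*x + b) to the teacher's outputs with SGD."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y   # student prediction minus teacher output
            w -= lr * err * x       # gradient step on squared error
            b -= lr * err
    return w, b

# A small data set of teacher-labelled examples is enough for the student here.
data = build_imitation_dataset(50)
w, b = train_student(data)
print("student parameters:", w, b)
```

In the real setting the teacher would be a large language model, the data set would be its conversations, and the student would be a smaller network, but the loop has the same shape: collect the stronger model's outputs, then train the smaller one to reproduce them.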