You can run it on multiple GPUs, on a single GPU, or even on just a CPU.

The lower the compute power, the smaller the model, and the worse the result.

If you want similarish results, you need a lot of compute.

Discussion

The difference in how this model is trained is that it uses a smaller but higher-quality training data set. For example, they include ChatGPT conversations in the training data. You can use a more powerful model to train a smaller model that imitates it.
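
To make the "train a smaller imitating model" idea a bit more concrete, here is a rough sketch of the data-collection half only: ask a stronger model for answers, then keep a small, cleaned set of prompt/response pairs that a smaller model could later be fine-tuned on. This is not how this particular model's data was built. `teacher_reply`, `build_imitation_dataset`, `to_jsonl`, and the `min_chars` filter are all made-up names for illustration, and the teacher is a stub so the snippet runs without any API.

```python
import json


def teacher_reply(prompt: str) -> str:
    """Hypothetical stand-in for a call to a stronger model (no real API is used)."""
    return f"[teacher answer to: {prompt}]"


def build_imitation_dataset(prompts, teacher=teacher_reply, min_chars=10):
    """Collect prompt/response pairs from the teacher.

    Skips empty and duplicate prompts and drops very short responses,
    as a crude stand-in for "smaller but higher quality" filtering.
    """
    seen = set()
    records = []
    for prompt in prompts:
        prompt = prompt.strip()
        if not prompt or prompt in seen:
            continue
        seen.add(prompt)
        response = teacher(prompt).strip()
        if len(response) < min_chars:
            continue
        records.append({"prompt": prompt, "response": response})
    return records


def to_jsonl(records):
    """Serialize records one JSON object per line, a common fine-tuning input format."""
    return "\n".join(json.dumps(r) for r in records)


prompts = ["What is a GPU?", "What is a GPU?", "  ", "Explain overfitting."]
dataset = build_imitation_dataset(prompts)
print(to_jsonl(dataset))
```

The actual fine-tuning of the smaller model on these pairs is left out here, since that part depends entirely on which training framework you use.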