nanoGPT speedrun: Nice work from @kellerjordan0 adapting the nanoGPT/llm.c PyTorch training code into a benchmark: train a 124M-parameter Transformer to a fixed validation-loss target. The current SOTA run is 3.8X more token-efficient than the baseline (2.7B vs. 10B training tokens).

Source: x.com/karpathy/status/1846790537262571739
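To make the "fixed validation-loss target" framing concrete, here is a minimal sketch of the benchmark protocol in PyTorch: train, periodically evaluate held-out loss, stop once it crosses the target, and report how many tokens were consumed. Everything here (the tiny model, the toy data, the learning rate, and the loss target) is a placeholder for illustration, not the actual speedrun setup, which trains a 124M-parameter GPT-2-style Transformer on a fixed corpus.

```python
# Sketch of a "train to a fixed validation loss, count tokens" benchmark loop.
# TinyLM, get_batch, and target_val_loss are toy placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
vocab_size, seq_len, batch_size = 256, 64, 32
target_val_loss = 0.5  # placeholder target, not the official one

class TinyLM(nn.Module):
    """Stand-in for the 124M Transformer: embedding + linear next-token head."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, 128)
        self.head = nn.Linear(128, vocab_size)
    def forward(self, idx):
        return self.head(self.embed(idx))

def get_batch():
    # Toy task: the "next token" is just the current token id + 1.
    x = torch.randint(0, vocab_size, (batch_size, seq_len))
    y = (x + 1) % vocab_size
    return x, y

@torch.no_grad()
def val_loss(model, iters=8):
    model.eval()
    losses = []
    for _ in range(iters):
        x, y = get_batch()
        logits = model(x)
        losses.append(F.cross_entropy(logits.reshape(-1, vocab_size), y.reshape(-1)))
    model.train()
    return torch.stack(losses).mean().item()

model = TinyLM()
opt = torch.optim.AdamW(model.parameters(), lr=1e-2)
tokens_seen = 0
for step in range(5_000):
    x, y = get_batch()
    logits = model(x)
    loss = F.cross_entropy(logits.reshape(-1, vocab_size), y.reshape(-1))
    opt.zero_grad(set_to_none=True)
    loss.backward()
    opt.step()
    tokens_seen += x.numel()  # the efficiency metric: tokens consumed so far
    if step % 50 == 0:
        vl = val_loss(model)
        print(f"step {step}  val loss {vl:.3f}  tokens {tokens_seen:,}")
        if vl <= target_val_loss:
            print(f"target reached after {tokens_seen:,} tokens")
            break
```

The point of the protocol is that entries compete on resources (tokens, wall-clock time) needed to reach the same quality bar, so a record like "2.7B vs. 10B tokens" is directly comparable across training recipes.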
