"Even though StoryWriter was fine-tuned with a 65k context length, ALiBi makes it possible for the model to extrapolate to even longer inputs than it was trained on: 68k tokens in the case of The Great Gatsby, and up to 84k tokens in our testing"
"MPT-7B looks to be super competitive across the board, even beats 13B models. This LLM is trained on 1T tokens of text and code curated by MosaicML. The model is fine-tuned to also work with a context length of 65k tokens!"
https://twitter.com/hardmaru/status/1654790008925220866?t=_eXP4ZcjdMd_hpPLMdAVZA&s=19