The Pile: An 800GB dataset of diverse text for language modeling (2020)
Link: https://arxiv.org/abs/2101.00027
Discussion: https://news.ycombinator.com/item?id=36685115