I came to the same conclusion a while ago. A good way to think about an LLM is as a lossy archive of text data. You enter a text input as a path, and it extracts data based on that path. The smaller the model, the larger the loss of data. Models that are too large end up with paths that lead nowhere.
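To make the analogy a bit more concrete, here's a minimal sketch, assuming the Hugging Face transformers library is installed; the model choices (gpt2 vs. gpt2-xl) are just arbitrary stand-ins for a smaller and a larger "archive". The same prompt acts as the "path", and you can compare how much each model recovers along it.

```python
# Illustration of the "lossy archive" framing: same path, two archive sizes.
# Assumes `pip install transformers torch`; gpt2 / gpt2-xl are example models.
from transformers import pipeline

prompt = "The first law of thermodynamics states that"  # the "path" into the archive

for model_name in ["gpt2", "gpt2-xl"]:
    generator = pipeline("text-generation", model=model_name)
    out = generator(prompt, max_new_tokens=40, do_sample=False)
    print(f"--- {model_name} ---")
    print(out[0]["generated_text"])
    # The smaller model typically reconstructs the "archived" fact with more
    # loss (vaguer or wrong continuations) than the larger one.
```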