The parameter count mostly encodes how much information the model can memorize, and that knowledge is mostly breadth-first. You can think of it like using a Fourier series to reconstruct a signal: with few terms you get the whole shape, but smoothed out. The problem with LLMs is that the level of knowledge they have is independent of the level of specificity with which they answer.
This is where hallucinations come from. If what you ask requires a higher resolution than their parameter count allows, then they fabricate the remaining detail.
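To make the analogy concrete, here is a minimal sketch (plain NumPy, nothing model-specific, the term counts are just illustrative) that reconstructs a square wave from a truncated Fourier series. With a handful of terms you recover the overall shape, but the sharp edges, i.e. the fine detail, come out smoothed:

```python
import numpy as np

def fourier_square_wave(t, n_terms):
    # Partial Fourier sum for a square wave: only odd harmonics contribute.
    x = np.zeros_like(t)
    for k in range(1, 2 * n_terms, 2):
        x += (4 / (np.pi * k)) * np.sin(k * t)
    return x

t = np.linspace(0, 2 * np.pi, 1000)
target = np.sign(np.sin(t))          # the "full resolution" signal

for n in (3, 30, 300):               # few vs many "parameters"
    approx = fourier_square_wave(t, n)
    err = np.mean(np.abs(approx - target))
    print(f"{n:>4} terms -> mean error {err:.4f}")
```

The low-term reconstruction still looks like a square wave from a distance; it just cannot commit to the edges. That is the sense in which a small model "knows" a topic without being able to answer a specific question about it.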
A human brain is probably on the order of 100 trillion parameters, so none of these models is going to be super impressive. 70 billion parameters is the minimum I have found useful for actual discussions: at that level hallucinations are no longer a constant thing in general conversation, but they still show up on specifics. Smaller models are progressively more useless. When you get down to single-digit billions the output is coherent but not informative, best used for restructuring specific text or generating entertaining output.
By the way, the 70 billion cut-off is for a monolithic model. Mixture-of-experts is almost always crap. For instance, Llama 4 nearly always underperforms Llama 3.3 even though it has roughly 50% more parameters. The best I can say is that it does run faster.
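The speed difference follows from how MoE counts parameters. Here is a rough back-of-the-envelope sketch (the numbers are made up for illustration, not actual Llama configs): only a few experts fire per token, so compute per token tracks the active parameters, not the headline total, which is also why the total count overstates what the model brings to any single answer.

```python
def param_counts(shared_b, expert_b, n_experts, experts_per_token):
    # Total parameters stored vs. parameters actually used per token.
    total = shared_b + expert_b * n_experts
    active = shared_b + expert_b * experts_per_token
    return total, active

dense_total = 70                      # dense 70B baseline: all params active
moe_total, moe_active = param_counts(
    shared_b=15, expert_b=6, n_experts=16, experts_per_token=2
)
print(f"dense: {dense_total}B total, {dense_total}B active per token")
print(f"MoE:   {moe_total}B total, {moe_active}B active per token")
```

With hypothetical numbers like these the MoE model carries more than the dense 70B in total but routes each token through far fewer weights, so it is cheaper to run while having less capacity behind any one answer.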