There's a common refrain that LLMs are "just predicting the next word". Which is true: that is how we have structured them. But the people saying it often go on to claim that this is a fundamental limitation, one that will prevent LLMs from progressing beyond where they currently are. It's a common argument, and it tends to be strongest among people who have worked in AI and "know" how it works.

I appreciate the reasoning. It often seems like people have forgotten how to think critically about the world, so I get excited whenever someone takes a principled approach to an argument. Still, confusing something's "what" with its "how" is a mistake. "What" an LLM is doing is predicting the next word. That is what we asked of it. In response, it has "learned" how to produce a meaningful answer.
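To make the "what" concrete, here is a minimal sketch of next-word prediction. It assumes the Hugging Face transformers library and the small GPT-2 checkpoint, chosen purely for illustration; the point is just that each step literally picks one next token.

```python
# A minimal sketch of "predicting the next word", assuming the Hugging Face
# transformers library and the publicly available GPT-2 checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of France is"
ids = tokenizer(prompt, return_tensors="pt").input_ids

# Greedy decoding: at each step the model scores every token in its
# vocabulary, and we append the single most likely one.
for _ in range(5):
    with torch.no_grad():
        logits = model(ids).logits       # shape: (1, seq_len, vocab_size)
    next_id = logits[0, -1].argmax()     # "the next word" (token), literally
    ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(ids[0]))
```

Nothing in that loop says anything about *how* the model decided which token was most likely; that part is buried inside the network's weights.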

A casual review of LLM research will lead you to work on the "interpretability" of LLMs. Since nobody programmed the LLM per se, we don't have good visibility into why it chose one word over another. Interpretability research tries to uncover that reasoning: "how" the LLM arrived at its answer. The existence of this field is clear evidence that we don't really know how these models work. The training process has a lot of randomness in it, and the resulting networks aren't organized logically, even to the people who "produced" them.

So, even if we don't really know "how" they work, does their "what" help us reason about what they're ultimately capable of? I don't think so. Every day, humans sit down to take standardized, fill-in-the-blank tests, and I haven't seen anyone arguing that they're just "predicting the next word".

#llm #ai #futurist
