For me, this is all fine so far. The problem is that, on the one hand, we cannot access the probabilities (or logits) associated with the chosen tokens in closed-source models; on the other, a model may be uncertain about an answer and still confidently produce an incorrect response.

It is crucial to have access to the level of confidence behind a model’s answers. Somehow, the uncertainty associated with an output needs to be quantified, and the user should be made aware of it.
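
For open-weights models, the logits are exposed, so a simple confidence proxy is straightforward to compute. Below is a minimal sketch assuming a Hugging Face `transformers` causal LM; the model name, prompt, and mean-probability heuristic are illustrative choices on my part, not a prescribed method.

```python
# Minimal sketch: recover the probability the model assigned to each token
# it actually generated, then average them as a crude confidence proxy.
# Assumes an open-weights model; "gpt2" is a stand-in for any such model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = "The capital of Australia is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=5,
        do_sample=False,
        return_dict_in_generate=True,
        output_scores=True,  # keep the per-step logits
    )

# Tokens the model actually chose, and the probability it gave each one.
gen_tokens = out.sequences[0, inputs["input_ids"].shape[1]:]
token_probs = [
    torch.softmax(score[0], dim=-1)[tok].item()
    for score, tok in zip(out.scores, gen_tokens)
]

# One crude proxy: the mean per-token probability of the chosen tokens.
confidence = sum(token_probs) / len(token_probs)
print(f"answer: {tokenizer.decode(gen_tokens)!r}, confidence ~ {confidence:.2f}")
```

Averaging per-token probabilities is only a rough proxy; capturing sequence-level uncertainty well is harder, which is part of the point above.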

Discussion

True, but ironically, the Dunning-Kruger effect suggests that we humans do exactly the same thing 😂

I'm starting to think that these massive, one-pass, tokenised models will become redundant very quickly.

We are already seeing small multi-pass models beating the large-parameter models simply by iterating on their thinking.
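
Roughly, the loop looks something like this (`generate` and `critique` are placeholders for model calls, not any particular API):

```python
# Sketch of a multi-pass / iterative-refinement loop: draft an answer,
# let the model critique it, and revise until the critic is satisfied.
def multi_pass_answer(generate, critique, question, max_passes=3):
    draft = generate(question)
    for _ in range(max_passes - 1):
        feedback = critique(question, draft)  # model reviews its own draft
        if feedback == "OK":                  # critic is satisfied; stop early
            return draft
        draft = generate(
            f"{question}\nPrevious attempt: {draft}\n"
            f"Feedback: {feedback}\nRevise the answer."
        )
    return draft
```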

I really hope this turns out to be true. I’m opposed to the idea that “scale is all you need”; rather, I believe that “innovation and research are all you need.”

The concern I have is that the scaling strategy can still be applied to multi-pass models, and the scaled-up versions would likely outperform the smaller ones. This not only increases training costs but also makes inference more expensive, since each answer requires multiple passes.

That said, I’m not very familiar with these types of architectures, so I’d be happy to read any material you’d recommend.