So far, this is all fine by me. The problem is that, with closed-source models, we often cannot access the probabilities (or logits) associated with the chosen tokens. A model may be uncertain about an answer and still produce a response, possibly an incorrect one, without giving any sign of that uncertainty.
It is crucial to have access to the level of confidence behind a model’s answers. Somehow, the uncertainty associated with an output needs to be quantified, and the user should be made aware of it.
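To make this concrete, here is a minimal sketch of one simple way to turn token-level log-probabilities into a confidence score, assuming the model exposes them at all. The function name `sequence_confidence` and the example values are hypothetical and chosen only for illustration; the geometric-mean probability is just one common heuristic, not a calibrated measure of correctness.

```python
import math


def sequence_confidence(token_logprobs):
    """Return (joint probability, geometric-mean token probability).

    `token_logprobs` is a list of natural-log probabilities, one per
    generated token. This is a hypothetical helper, not part of any library.
    """
    if not token_logprobs:
        raise ValueError("token_logprobs must not be empty")
    total = sum(token_logprobs)
    # Probability of the whole sequence: product of the token probabilities.
    joint = math.exp(total)
    # Length-normalised score, so long answers are not penalised for length alone.
    mean_prob = math.exp(total / len(token_logprobs))
    return joint, mean_prob


# Made-up log-probabilities for a three-token answer (illustrative values only).
example_logprobs = [math.log(0.9), math.log(0.5), math.log(0.8)]
joint, mean_prob = sequence_confidence(example_logprobs)
print(f"joint probability: {joint:.3f}")
print(f"geometric-mean token probability: {mean_prob:.3f}")
```

A low score here only says the model spread its probability mass over several alternatives at some step. It does not say the answer is wrong, and a high score does not guarantee it is right, which is why the number should be shown to the user as a hint rather than a verdict.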