Replying to Simon Willison

nostr:npub14p3antm8cnvlqx3km7fp4ywyyxq7przhxay0d9gaf6hxkw984cxsm8x6r5 Practice! The more time I spend with different models, the better my intuition for whether they're going to give me a good answer or not

GPT-4 and ChatGPT are far, far more reliable than the models that I can run locally on my laptop

I'm only just beginning to build that intuition for Llama 2; it'll take a while

I spoke about that a bit in https://simonwillison.net/2023/Aug/3/weird-world-of-llms/#tips-for-using-them

nostr:npub14p3antm8cnvlqx3km7fp4ywyyxq7przhxay0d9gaf6hxkw984cxsm8x6r5 I found that particular paper very unconvincing once I started digging into the data behind it: they were marking answers as "incorrect" for pretty weak reasons, in my opinion


Discussion

nostr:npub14p3antm8cnvlqx3km7fp4ywyyxq7przhxay0d9gaf6hxkw984cxsm8x6r5 generating little jq and bash scripts is an ideal application for untrustworthy LLMs because hallucinated code won't work, so you can spot any hallucination problems pretty fast!
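
As an illustration of the kind of throwaway script being described (the file name and the .state field here are hypothetical, not from the thread):

```bash
#!/usr/bin/env bash
# Sketch of a typical one-off task you might ask an LLM to write:
# count how many items in a JSON array have each value of "state".
# "issues.json" and ".state" are made-up placeholders for this example.
jq -r '.[].state' issues.json | sort | uniq -c
```

The point of the claim is that if the model hallucinates a jq builtin or flag that doesn't exist, jq fails immediately with an error on the first run, so the hallucination is obvious rather than silently producing wrong output.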