Replying to Simon Willison

nostr:npub14p3antm8cnvlqx3km7fp4ywyyxq7przhxay0d9gaf6hxkw984cxsm8x6r5 I found that particular paper very unconvincing once I started digging into the data behind it: they were marking answers as "incorrect" for pretty weak reasons, in my opinion.

nostr:npub14p3antm8cnvlqx3km7fp4ywyyxq7przhxay0d9gaf6hxkw984cxsm8x6r5 generating little jq and bash scripts is an ideal application for untrustworthy LLMs because hallucinated code won't work, so you can spot any hallucination problems pretty fast!