Step 4 is incorrect
Discussion
Oh yeah, I should look at it more closely. Lesson learned.
I feel like LLMs are pretty bad at this. I'm guessing DeepSeek or the most recent models would do better, but I'm not sure.
I am pretty bad at checking the result…
They're very good at hallucinating results that look right at first glance.
Yeah, it got me.
With some back and forth, it managed 7 steps. It's interesting to analyze the mistakes it made along the way.
https://video.nostr.build/df687880d92d3f5ff42698e5324f946df9dd2c8fa41967478fa35b751eee2609.mp4