Yeah, the one I linked does that. I just wonder whether it actually helps.
Discussion
Even if it takes a screenshot, wouldn't it only be getting the text ripped out of the image?
My understanding is that the inputs are always reduced to a string of tokens. But some feedback would be better than nothing.
You can feed the Cursor agent images.
I'm sure it can help; people are already feeding screenshots to LLMs as prompts. Ask nostr:nprofile1qqsq6myr3rwtqjdcm48u357ccwae8h3a4y96s28y7zwg458ngeyg5vcpz4mhxue69uhk2er9dchxummnw3ezumrpdejqzyrhwden5te0xy6rqtnxxaazu6t0qy28wumn8ghj7un9d3shjtnyv9kh2uewd9hs4808mz