There are already some MCP servers that should do the job: take a screenshot of a browser and return it.

Discussion

Yeah, the one I linked does that. I just wonder whether it actually helps.

Even if it takes a screenshot, wouldn't it only be getting the text ripped out of the image?

My understanding is that the inputs are always reduced to a string of tokens. But some feedback would be better than nothing.

You can feed the Cursor agent images.
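
On the "text ripped out of the image" question: as far as I understand the MCP spec, a tool result can carry an image content item (base64 data plus a MIME type) rather than extracted text. Whether the model then actually looks at the pixels depends on the client and model being multimodal. A minimal sketch of what such a result might look like; `image_tool_result` and the placeholder bytes are made up for illustration and are not taken from any particular server:

```python
import base64
import json


def image_tool_result(png_bytes: bytes) -> dict:
    """Wrap raw PNG bytes as an MCP-style tool result with one image item."""
    return {
        "content": [
            {
                "type": "image",
                # Binary image data travels as a base64 string.
                "data": base64.b64encode(png_bytes).decode("ascii"),
                "mimeType": "image/png",
            }
        ]
    }


# Placeholder bytes standing in for a real screenshot (just the PNG signature).
fake_png = b"\x89PNG\r\n\x1a\n"
result = image_tool_result(fake_png)
print(json.dumps(result, indent=2))
```

So the screenshot itself is passed along; no OCR step is implied by the format.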