There are already some MCP servers that should do the job: take a screenshot of a browser and return it.
Whenever I vibecode UIs it feels like the agent is flying blind because it can't see what's being rendered.
Have any of you tried integrating a browser-automation MCP like this one? Seems like this could really help the agent QA its work.
https://github.com/modelcontextprotocol/servers/tree/main/src/puppeteer
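For anyone who wants to try it: the servers repo documents wiring this up through the client's MCP config (e.g. `claude_desktop_config.json`). A sketch based on that README, assuming `npx` is available:

```json
{
  "mcpServers": {
    "puppeteer": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-puppeteer"]
    }
  }
}
```

Once registered, the agent can call the server's navigate/screenshot tools itself, so it can look at what it just rendered instead of flying blind.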
Discussion
Yeah, the one I linked does that. I just wonder whether it actually helps.
Even if it takes a screenshot, wouldn't the model only be getting the text ripped out of the image?
My understanding is that the input always gets reduced to a string of tokens. But some feedback would be better than nothing.
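It's not quite OCR, though. Vision-language models typically split the image into fixed-size patches and embed each patch as a token, so layout and visuals survive, not just the text. A rough sketch of the arithmetic (the 14-px patch size is an assumption; it's common for ViT-based encoders but varies by model):

```python
# Rough sketch: vision-language models don't OCR a screenshot; the image
# encoder splits it into a grid of fixed-size patches and each patch becomes
# one token. Patch size is model-specific; 14 px is assumed here.
def image_token_count(width: int, height: int, patch: int = 14) -> int:
    """Number of patch tokens for a width x height image (ignoring resizing)."""
    return (width // patch) * (height // patch)

# A 1024x768 browser screenshot costs a few thousand tokens of context.
print(image_token_count(1024, 768))
```

So the screenshot does land in the context as tokens, but they're patch embeddings carrying the rendered pixels, which is exactly the feedback a UI agent is missing.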
You can feed the Cursor agent images.
I'm sure it can help; people are already feeding screenshots to LLMs to prompt things, ask nostr:nprofile1qqsq6myr3rwtqjdcm48u357ccwae8h3a4y96s28y7zwg458ngeyg5vcpz4mhxue69uhk2er9dchxummnw3ezumrpdejqzyrhwden5te0xy6rqtnxxaazu6t0qy28wumn8ghj7un9d3shjtnyv9kh2uewd9hs4808mz
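Programmatically it's the same idea: the screenshot goes into the message as a base64 image block next to the text. A sketch of the payload shape (field names follow the public Anthropic Messages API docs; no request is actually sent here, and the PNG bytes are a placeholder):

```python
import base64

# Sketch: build a Messages-API-style user message that pairs a screenshot
# with a question, the same pattern an agent loop would use for UI QA.
def build_image_message(png_bytes: bytes, question: str) -> dict:
    return {
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/png",
                    "data": base64.b64encode(png_bytes).decode("ascii"),
                },
            },
            {"type": "text", "text": question},
        ],
    }

msg = build_image_message(b"\x89PNG...", "Does the rendered page match the mock?")
print(msg["content"][1]["text"])
```

An agent loop would just do this after every render: screenshot via the MCP tool, attach it like above, and ask the model to critique its own output.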