i wonder how complex certain ui interaction patterns really are

let's consider a basic interaction pattern:

- go to site xyz, create a new account with name "anon123"

breakdown:

- go to site ---> open firefox

- xyz ---> type xyz into the address bar

- create a new account ---> find a link or button that says something similar to "new account" or "create account", or one that gets closer to that target, such as "getting started" or "join"

- after clicking the correct sequence of links or buttons, a form is expected. fill "anon123" into the field labeled "name", "username" or similar

- if required fields are marked, those are expected to be filled as well

- click submit

- if an error is displayed, correct the fields that have errors and submit again

(possible email verification or similar procedure here)

- if either a success message is shown or it is noticed that the user is logged in, the task is complete
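the breakdown above can be sketched as a small state machine. this is only a sketch under assumptions: the label names (`create_account`, `signup_form`, `field_error` and so on) are hypothetical outputs of some detector, and the actions are plain strings rather than real mouse or keyboard events. email verification is left out.

```python
from enum import Enum, auto

class Step(Enum):
    FIND_SIGNUP = auto()   # looking for "create account" or a link that leads closer
    SUBMIT = auto()        # fields are filled, submit not yet clicked
    CHECK_RESULT = auto()  # waiting for success, error, or logged-in indicator
    DONE = auto()

# hypothetical detector labels, ordered from most to least direct
SIGNUP_TARGETS = ("create_account", "join", "getting_started")

def next_action(step, labels):
    """Map (current step, labels visible in this frame) to (action, next step)."""
    if step is Step.FIND_SIGNUP:
        if "signup_form" in labels:
            # fill "anon123" and any required fields
            return "fill_fields", Step.SUBMIT
        for target in SIGNUP_TARGETS:
            if target in labels:
                return "click:" + target, Step.FIND_SIGNUP
        return "wait", Step.FIND_SIGNUP
    if step is Step.SUBMIT:
        return "click:submit", Step.CHECK_RESULT
    if step is Step.CHECK_RESULT:
        if "success_message" in labels or "logged_in_indicator" in labels:
            return "stop", Step.DONE
        if "field_error" in labels:
            return "fix_fields", Step.SUBMIT
        return "wait", Step.CHECK_RESULT
    return "stop", Step.DONE

def run(frames):
    """Walk through per-frame label sets and collect the chosen actions."""
    step, actions = Step.FIND_SIGNUP, []
    for labels in frames:
        action, step = next_action(step, labels)
        actions.append(action)
        if step is Step.DONE:
            break
    return actions, step
```

the whole flow is a handful of states and label checks, which suggests the "complexity" of this pattern sits mostly in recognizing the elements, not in deciding what to do next.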

what can a real-time object detection model such as yolo learn?

an object is typically a shape, but classification as a task does not necessarily rely on shape alone. can a classification model classify based on context?

in fact, we should ask, what is "context"? what makes context different from simple shape?

is it as simple as this: if the background is blue and there is a red circle on it, then the context is the blue background and the actual object or shape is the red circle? maybe.
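a toy illustration of that split, assuming a tiny grid of rgb tuples stands in for an image and a red square patch stands in for the red circle. all names here are made up for the example. the crop of the object is identical in both images, and only the ring of pixels around it tells them apart.

```python
RED, BLUE, GREEN = (255, 0, 0), (0, 0, 255), (0, 255, 0)
BOX = (3, 3, 5, 5)  # (x0, y0, x1, y1), half-open, where the red patch sits

def make_image(background, size=8, box=BOX):
    """Toy image: a red patch on a flat background color."""
    x0, y0, x1, y1 = box
    return [[RED if x0 <= x < x1 and y0 <= y < y1 else background
             for x in range(size)] for y in range(size)]

def crop(image, box):
    """Pixels inside the box only: the 'shape' without its surroundings."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]

def dominant_context(image, box, margin=2):
    """Most common color in a ring of `margin` pixels around the box."""
    x0, y0, x1, y1 = box
    counts = {}
    for y in range(max(0, y0 - margin), min(len(image), y1 + margin)):
        for x in range(max(0, x0 - margin), min(len(image[0]), x1 + margin)):
            if not (x0 <= x < x1 and y0 <= y < y1):
                counts[image[y][x]] = counts.get(image[y][x], 0) + 1
    return max(counts, key=counts.get)

on_blue, on_green = make_image(BLUE), make_image(GREEN)
same_shape = crop(on_blue, BOX) == crop(on_green, BOX)
contexts = (dominant_context(on_blue, BOX), dominant_context(on_green, BOX))
```

here `same_shape` is true while `contexts` differ, so anything that distinguishes the two images has to be looking outside the object's own pixels.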

if this is the case, then yolo certainly can learn context-based object detection. why is this a big deal?

if context can be used by highly performant object detection models, we can perform intelligent ui automation tasks with these very lightweight models, which can do real-time recognition on video frames.
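as a rough sketch of the glue between detection and action: given one frame's detections, pick the most confident box for the wanted label and click its center. the `Detection` tuple and the 0.5 confidence threshold are assumptions for illustration, not the output format of any particular yolo library.

```python
from typing import NamedTuple, Optional, Sequence, Tuple

class Detection(NamedTuple):
    label: str                       # hypothetical class name, e.g. "submit"
    confidence: float                # detector score in [0, 1]
    box: Tuple[int, int, int, int]   # (x0, y0, x1, y1) in screen pixels

def click_point(detections: Sequence[Detection], wanted: str,
                min_conf: float = 0.5) -> Optional[Tuple[int, int]]:
    """Center of the most confident box labeled `wanted`, or None if absent."""
    candidates = [d for d in detections
                  if d.label == wanted and d.confidence >= min_conf]
    if not candidates:
        return None
    best = max(candidates, key=lambda d: d.confidence)
    x0, y0, x1, y1 = best.box
    return ((x0 + x1) // 2, (y0 + y1) // 2)

# one frame's worth of made-up detections
frame = [
    Detection("submit", 0.40, (0, 0, 10, 10)),
    Detection("submit", 0.90, (100, 200, 180, 240)),
    Detection("join", 0.95, (0, 0, 4, 4)),
]
target = click_point(frame, "submit")
```

per frame this is a filter and a max over a short list, so the cost of the automation logic is negligible next to the detector itself.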

compare this to vlms.

you may not really need a language model to automate ui interaction if the interaction flow is known well enough. it does partially depend on whether the interaction involves highly complex tasks, such as solving complex captcha systems that expect pow level intelligence rather than fast-paced action.

#ai #llm #vlm #yolo #automation
