is online interaction more of a pow intelligence, or quick, ant-like, action-oriented behaviour? which is closer to describing the interaction patterns found in different interfaces?
i would say both, but in what proportion? at least one can say that if someone writes a piece of text, a yolo model cannot do it; a language model is required. however, what if it's possible to combine a very performant yolo model with a slow language model?
a coordination system would be required. can a lightweight coordinator decide when we can just go with the low-effort action model, and when we need a more complex model for reading or writing text?
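one way to picture such a coordinator is a plain rule-based router that sits between the detector and the language model. this is only a minimal sketch under stated assumptions: the `Detection` type, the label names in `TEXT_LABELS`, the `min_conf` threshold and the `route` function are all hypothetical, and a real coordinator could just as well be a small learned classifier. the idea is to stay on the fast path by default and escalate to the slow model when text has to be read or written, or when the fast model is unsure.

```python
from dataclasses import dataclass

# hypothetical labels a yolo-style ui detector might emit for text-heavy elements
TEXT_LABELS = {"text_input", "paragraph", "dialog_text"}


@dataclass
class Detection:
    label: str         # class predicted by the (hypothetical) detector
    confidence: float  # detector score in [0, 1]


def route(detection: Detection, task_needs_text: bool, min_conf: float = 0.6) -> str:
    """return 'fast' for the action model, 'slow' for the language model."""
    if task_needs_text:
        # writing or reading text is out of reach for the action model
        return "slow"
    if detection.label in TEXT_LABELS:
        # the target element itself is text, so hand it to the language model
        return "slow"
    if detection.confidence < min_conf:
        # escalate when the fast model is unsure about what it sees
        return "slow"
    # clicks, scrolls and similar "walking" actions stay on the fast path
    return "fast"


print(route(Detection("button", 0.93), task_needs_text=False))
print(route(Detection("text_input", 0.95), task_needs_text=False))
```

the two calls print `fast` and `slow`. keeping the router this cheap matters: if the coordinator itself were slow, it would eat the latency the fast path is supposed to save.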
if a lot of online interaction can be thought of as analogous to moving legs to walk or moving hands to grab something, i think these interactions can be done with relatively lightweight models, compared to full-blown pow models such as large language models.
maybe it is enough that the ui interaction model is at the level of a rat, while the more intelligent large language model handles more complex operations, such as reading or writing tasks.
with this separation, if the theory has any basis, it could be possible to build ui automation agents that work in real time, faster than humans on basic interaction flows, while still retaining the ability to think slow when needed.
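to make the "fast legs, slow thinking" split a bit more concrete, here is a minimal asyncio sketch, assuming both models can be called asynchronously. `fast_action_model` and `slow_language_model` are hypothetical stand-ins that only sleep to simulate latency; no real yolo or llm is called. the point is that the slow call is started in the background and the fast action loop keeps running instead of waiting for it.

```python
import asyncio


async def slow_language_model(prompt: str) -> str:
    # hypothetical stand-in for a slow llm call; the sleep simulates its latency
    await asyncio.sleep(0.05)
    return f"text for: {prompt}"


async def fast_action_model(step: int) -> str:
    # hypothetical stand-in for a fast detector-driven ui action
    await asyncio.sleep(0.001)
    return f"click {step}"


async def run(steps: int) -> tuple[list[str], str]:
    log: list[str] = []
    # start the slow "thinking" task without blocking the action loop
    slow = asyncio.create_task(slow_language_model("reply to message"))
    for step in range(steps):
        # basic interaction flow keeps moving at the fast model's pace
        log.append(await fast_action_model(step))
    # only wait for the language model once its output is actually needed
    text = await slow
    log.append("typed")
    return log, text


log, text = asyncio.run(run(3))
print(log, text)
```

this prints `['click 0', 'click 1', 'click 2', 'typed'] text for: reply to message`. whether real ui flows allow this much overlap depends on the task; often the agent would have to pause at the point where the text is needed.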
#ai #llm #vlm #yolo #automation