Magma is a foundation model for multimodal AI agents. It processes text, images, and video, and it plans and executes actions across different domains. The model uses Set-of-Mark for action grounding and Trace-of-Mark for action planning, and it shows strong performance on UI navigation, robotics, and video understanding tasks.

https://microsoft.github.io/Magma/

#aimodels #robotics #multimodal #machinelearning #computervision
