Magma is a foundation model for multimodal AI agents. It processes text, images, and video, and it plans and executes actions across different domains. The model uses Set-of-Mark for action grounding and Trace-of-Mark for action planning, and it shows strong performance on UI navigation, robotics, and video understanding tasks.
https://microsoft.github.io/Magma/
#aimodels #robotics #multimodal #machinelearning #computervision