LLaVA (Large Language-and-Vision Assistant) is an open-source multimodal model that aims for GPT-4-level vision-and-language capabilities. Developed by Haotian Liu and team and hosted on GitHub, it introduces the concept of visual instruction tuning and builds large language-and-vision models on top of it. The model is trained on several datasets, and the project includes a Gradio web UI for running the model locally and showcasing what it can do. If you're interested in AI and machine learning, this project is worth checking out.

https://github.com/haotian-liu/LLaVA


Discussion

It's now confirmed that LLaVA can't be used on the V100 GPU, mainly because the V100 (a Volta-architecture card) isn't supported by flash_attn.

https://github.com/haotian-liu/LLaVA/issues/290
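
For anyone who wants to check their own card before installing, here is a minimal sketch of a compatibility check. It assumes that flash_attn (FlashAttention-2) needs an Ampere-or-newer GPU, i.e. CUDA compute capability 8.0 or higher, while the V100 reports 7.0. The helper names `supports_flash_attn` and `local_gpu_capability` and the constant `MIN_FLASH_ATTN_CAPABILITY` are made up for illustration and are not part of LLaVA or flash_attn. Check the flash_attn README for the exact requirements of the version you install.

```python
from typing import Optional, Tuple

# Assumption: FlashAttention-2 targets Ampere-or-newer GPUs,
# i.e. CUDA compute capability (8, 0) and above. The V100 (Volta) is (7, 0).
MIN_FLASH_ATTN_CAPABILITY: Tuple[int, int] = (8, 0)


def supports_flash_attn(capability: Tuple[int, int]) -> bool:
    """Return True if a (major, minor) compute capability meets the assumed minimum."""
    return tuple(capability) >= MIN_FLASH_ATTN_CAPABILITY


def local_gpu_capability() -> Optional[Tuple[int, int]]:
    """Return the compute capability of GPU 0, or None if it cannot be queried."""
    try:
        import torch  # optional dependency; only needed to query a real GPU
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


if __name__ == "__main__":
    cap = local_gpu_capability()
    if cap is None:
        print("No CUDA GPU detected (or PyTorch is not installed).")
    elif supports_flash_attn(cap):
        print(f"Compute capability {cap}: meets the assumed flash_attn minimum.")
    else:
        print(f"Compute capability {cap}: below the assumed flash_attn minimum.")
```

On a V100 this should take the last branch, which matches what the issue above reports.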