Alibaba Unveils QVQ-72B: Breakthrough Open-Source AI for Visual Reasoning
December 25, 2024
The author of one source article plans to experiment with running QVQ-72B locally, expecting it to fit within 64GB of RAM on an M2 laptop.
QVQ-72B-Preview's open-source code and model weights are publicly available, along with a demo hosted on Hugging Face for users to test its capabilities.
Benchmark results indicate that QVQ-72B-Preview significantly outperforms its predecessor, Qwen2-VL, achieving accuracy levels comparable to closed-source models like OpenAI's o1.
The model may struggle with maintaining focus during multi-step visual reasoning, which can lead to inaccuracies in its outputs.
Alibaba's AI research team, Qwen, has launched QVQ-72B-Preview, an open-source model specifically designed for visual reasoning tasks.
A recent article highlights the capabilities of QVQ-72B, a cutting-edge visual reasoning large language model developed by the Qwen team.
Titled 'QVQ-72B: The Ultimate Visual Reasoning AI You Can Run Locally', the article discusses the model's features and its potential applications.
Despite its advancements, the model has limitations, including unexpected language switching and occasional hallucinations during complex tasks.
In testing, the model counted objects in images accurately, for example correctly identifying four pelicans in one image.
The model scored 70.3% on the Massive Multi-discipline Multimodal Understanding (MMMU) benchmark, which tests multimodal reasoning across many subject areas.
This model allows users to run high-performance AI tasks locally, enhancing data privacy and eliminating reliance on cloud services.
QVQ-72B features 72 billion parameters and integrates visual processing with text comprehension, enabling it to tackle complex queries effectively.
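As a rough check on the expectation that the model will fit in 64GB of RAM, the arithmetic below estimates the memory needed for the weights alone at a few precisions. This is a back-of-the-envelope sketch, not a measurement: the helper name `weight_memory_gb` and the choice of 16-, 8-, and 4-bit precisions are assumptions made for illustration, and the estimate ignores activations, the KV cache, and runtime overhead.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


PARAMS = 72e9  # 72 billion parameters, as stated in the summary
RAM_GB = 64    # laptop RAM mentioned in the summary

# Illustrative precisions (assumed, not taken from the sources)
for bits in (16, 8, 4):
    size = weight_memory_gb(PARAMS, bits)
    verdict = "fits" if size < RAM_GB else "does not fit"
    print(f"{bits}-bit weights: ~{size:.0f} GB ({verdict} in {RAM_GB} GB)")
```

Under these assumptions, 16-bit weights need about 144 GB and 8-bit weights about 72 GB, so only a quantization near 4 bits per parameter (about 36 GB) leaves headroom on a 64GB machine.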
Summary based on 7 sources
Sources
GIGAZINE • Dec 25, 2024
Alibaba's Qwen team releases 'QVQ', an open-source AI model that can recognize images
• Dec 24, 2024
Trying out QvQ—Qwen’s new visual reasoning model
Gadgets 360 • Dec 26, 2024
Alibaba Releases Another AI Model, This One Specialises in Vision
THE DECODER • Dec 25, 2024
Qwen's open-source QVQ rivals OpenAI and Google's best models in visual reasoning