Alibaba presents new AI model with advanced multimodal capabilities

Chinese tech giant Alibaba unveiled a new multimodal artificial intelligence (AI) model capable of processing and generating responses from various types of media, including text, images, audio, and video, the Hong Kong-based South China Morning Post reported.
The company introduced Qwen2.5-Omni-7B on Thursday as the newest member of its Qwen model family, as it aims to strengthen its position in the generative AI field. The multimodal Qwen2.5-Omni-7B model brings advanced AI capabilities closer to everyday users.

According to a statement from Alibaba, the model can process various types of inputs and generate real-time responses in text or audio. Additionally, the company has made the model open-source.
Qwen gains widespread recognition around the world
The company emphasized potential use cases like offering real-time audio descriptions for visually impaired users and providing step-by-step cooking instructions by analyzing ingredients.
Alibaba’s foundational Qwen models have become popular choices for AI developers to build on, positioning them as one of the few major alternatives to DeepSeek’s V3 and R1 models in China.
Alibaba introduced the Qwen2.5 model in September 2024 and followed up with the release of Qwen2.5-Max in January. The Qwen2.5-Max model quickly gained recognition, ranking 7th on Chatbot Arena, a platform known for evaluating large language models (LLMs).