Overview
Alibaba has released Qwen 3.5, a new multimodal AI series spanning both open-source and proprietary models that accept text and vision inputs. The key innovation is a hybrid architecture that combines linear attention with a sparse mixture-of-experts design, targeting efficient inference without sacrificing capability.
The Breakdown
- The open-source Qwen3.5-397B-A17B uses a mixture-of-experts architecture that activates only 17B of its 397B parameters on each forward pass, dramatically reducing compute per token while retaining the capacity of the full model
- Both models feature native multimodal support for vision tasks, demonstrated through informal drawing tests such as rendering a pelican riding a bicycle
- The architecture pairs linear attention, implemented via Gated Delta Networks, with sparse mixture-of-experts layers, a novel combination for balancing model size against inference cost
- The proprietary Qwen3.5 Plus extends the context window to 1M tokens and ships with integrated search and code-interpreter tools, making it suited to complex multi-step reasoning tasks
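The announcement does not publish Qwen's routing code, but the "17B of 397B parameters" figure follows the standard top-k expert-routing pattern, sketched minimally below. All names, shapes, and the choice of `top_k=2` are illustrative assumptions, not details from the release:

```python
import numpy as np

def moe_forward(x, gate_w, experts, top_k=2):
    """Sparse mixture-of-experts layer (sketch).

    x: (d,) token vector; gate_w: (n_experts, d) router weights;
    experts: list of callables mapping (d,) -> (d,).
    Only the top_k highest-scoring experts run per token, which is
    how a model can hold far more parameters than it activates on
    any single forward pass.
    """
    logits = gate_w @ x
    top = np.argsort(logits)[-top_k:]           # indices of the chosen experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                     # softmax over selected experts only
    # Weighted sum of the selected experts' outputs; the rest stay idle.
    return sum(w * experts[i](x) for w, i in zip(weights, top))
```

The per-token cost scales with `top_k`, not with the total expert count, which is the source of the "397B parameters, 17B active" arithmetic.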
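The release names Gated Delta Networks as the linear-attention component but gives no implementation details. The sketch below shows the general gated delta-rule recurrence that family of models is built on: a fixed-size state matrix that is decayed by a gate α, has the old value at key k erased, and has the new value v written in with strength β. Variable names and shapes are assumptions for illustration:

```python
import numpy as np

def gated_delta_step(S, q, k, v, alpha, beta):
    """One recurrent step of a gated delta rule (sketch).

    S: (d_v, d_k) state matrix carried across the sequence.
    alpha: scalar decay gate in (0, 1]; beta: write strength in (0, 1].
    """
    k = k / np.linalg.norm(k)  # unit-norm key
    # Decay the state, erase the old component along k, write v at k:
    # S <- alpha * (S - beta * (S k) k^T) + beta * v k^T
    S = alpha * (S - beta * np.outer(S @ k, k)) + beta * np.outer(v, k)
    o = S @ q                  # linear-attention-style read-out for query q
    return S, o
```

Because the state is a fixed-size matrix rather than a growing key-value cache, per-token compute and memory stay constant with sequence length, which is the efficiency argument for mixing layers like this into a large model.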