Qwen3 LLM: The new model family excels in coding, math, and general tasks. It offers diverse model sizes, multilingual support, hybrid thinking modes, and improved agent capabilities, all open-sourced with strong efficiency gains.
- Introducing Qwen3: The latest large language model family from Qwen.
- Flagship Model Qwen3-235B-A22B: Achieves competitive performance against top models like DeepSeek-R1, Grok-3, and Gemini-2.5-Pro in coding, math, and general capabilities.
- Efficient Smaller Models: Qwen3-30B-A3B outperforms models with 10x its activated parameters; Qwen3-4B rivals Qwen2.5-72B-Instruct.
- Open-Weight Models: Two MoE models (Qwen3-235B-A22B and Qwen3-30B-A3B) and six dense models (Qwen3-32B, 14B, 8B, 4B, 1.7B, 0.6B) are open-sourced under Apache 2.0.
- Hybrid Thinking Modes: Qwen3 supports both a step-by-step "Thinking Mode" for complex problems and a rapid "Non-Thinking Mode" for simpler tasks, allowing users to control the "thinking budget" (a minimal toggle sketch follows this list).
- Extensive Multilingual Support: Models support 119 languages and dialects.
- Improved Agentic Capabilities: Optimized for coding and agent tasks, with enhanced support for MCP (Model Context Protocol); see the agent sketch after this list.
- Massive Pre-training Dataset: Trained on approximately 36 trillion tokens (twice that of Qwen2.5) from web and PDF documents across 119 languages, including synthetic data for math and code.
- Efficient Training Gains: Qwen3 dense base models match or outperform larger Qwen2.5 base models, while Qwen3 MoE base models reach similar performance to Qwen2.5 dense models with only about 10% of the activated parameters, leading to significant cost savings.
- Advanced Post-training Pipeline: A four-stage process comprising long chain-of-thought cold start, reasoning-based RL, thinking-mode fusion, and general RL to develop hybrid capabilities.
- Easy Deployment and Local Usage: Available on Hugging Face, ModelScope, and Kaggle. Recommended for deployment with SGLang and vLLM, and for local use with Ollama, LMStudio, MLX, llama.cpp, and KTransformers (a client sketch follows this list).
- Dynamic Thinking Mode Control: Users can append `/think` and `/no_think` tags to prompts in multi-turn conversations to dynamically switch thinking modes (sketched after this list).
- Enhanced Tool Calling: Excels in tool calling capabilities; recommended for use with Qwen-Agent (see the agent sketch after this list).
- Future Outlook: Focus on scaling data, model size, context length, modalities, and advancing RL with environmental feedback for long-horizon reasoning, moving towards training agents.
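
To make the hybrid thinking modes concrete, here is a minimal sketch using Hugging Face Transformers. The `enable_thinking` flag on `apply_chat_template` is part of the Qwen3 release; the specific checkpoint, prompt, and generation settings below are illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any Qwen3 checkpoint should work; the 30B-A3B MoE model is used here for illustration.
model_name = "Qwen/Qwen3-30B-A3B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "How many prime numbers are there below 30?"}]

# enable_thinking=True makes the model emit a <think>...</think> reasoning block
# before its answer; enable_thinking=False skips it for a fast, direct reply.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)

inputs = tokenizer([text], return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=2048)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```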
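The `/think` and `/no_think` soft switches apply per turn in multi-turn conversations. A short sketch continuing the chat above (the assistant turn is illustrative filler for the history):

```python
# Appending /no_think (or /think) to a user turn overrides the mode for that turn.
messages += [
    {"role": "assistant", "content": "There are 10 primes below 30."},
    {"role": "user", "content": "Now just list them, no explanation. /no_think"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
```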
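For deployment, both vLLM and SGLang expose an OpenAI-compatible endpoint, so a served Qwen3 model can be queried with the standard `openai` client. A minimal sketch, assuming a local server was started with something like `vllm serve Qwen/Qwen3-30B-A3B` (port 8000 is vLLM's default):

```python
from openai import OpenAI

# Point the client at the local OpenAI-compatible server; the API key is unused.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen3-30B-A3B",
    messages=[{"role": "user", "content": "Summarize the Qwen3 release in one sentence."}],
)
print(response.choices[0].message.content)
```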
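On the agentic side, Qwen-Agent can wire MCP servers in as tools. A sketch following the release's usage pattern; the local endpoint and the `mcp-server-time` example are assumptions, and any MCP server config slots in the same way:

```python
from qwen_agent.agents import Assistant

# LLM endpoint config; reuses the local OpenAI-compatible server from above (assumption).
llm_cfg = {
    "model": "Qwen3-30B-A3B",
    "model_server": "http://localhost:8000/v1",
    "api_key": "EMPTY",
}

# Tool list: one MCP server definition plus the built-in code interpreter.
tools = [
    {
        "mcpServers": {
            "time": {
                "command": "uvx",
                "args": ["mcp-server-time", "--local-timezone=Asia/Shanghai"],
            }
        }
    },
    "code_interpreter",
]

bot = Assistant(llm=llm_cfg, function_list=tools)

messages = [{"role": "user", "content": "What time is it in Shanghai right now?"}]
for responses in bot.run(messages=messages):  # run() streams incremental responses
    pass
print(responses)  # final accumulated message list
```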