Qwen3 LLM: The new model family excels in coding, math, and general tasks. It offers diverse model sizes, multilingual support, hybrid thinking modes, and improved agent capabilities, all open-sourced with strong efficiency gains.
- Introducing Qwen3: The latest large language model family from Qwen.
- Flagship Model Qwen3-235B-A22B: Achieves competitive performance against top models like DeepSeek-R1, Grok-3, and Gemini-2.5-Pro in coding, math, and general capabilities.
- Efficient Smaller Models: Qwen3-30B-A3B outperforms models with 10x its activated parameters; Qwen3-4B rivals Qwen2.5-72B-Instruct.
- Open-Weight Models: Two MoE models (Qwen3-235B-A22B and Qwen3-30B-A3B) and six dense models (Qwen3-32B, 14B, 8B, 4B, 1.7B, 0.6B) are open-sourced under Apache 2.0.
- Hybrid Thinking Modes: Qwen3 supports both a step-by-step "Thinking Mode" for complex problems and a rapid "Non-Thinking Mode" for simpler tasks, allowing users to control the "thinking budget" (a minimal toggle sketch follows this list).
- Extensive Multilingual Support: Models support 119 languages and dialects.
- Improved Agentic Capabilities: Optimized for coding and agent tasks, with enhanced support for MCP (Model Context Protocol); see the agent sketch after this list.
- Massive Pre-training Dataset: Trained on approximately 36 trillion tokens (twice that of Qwen2.5) from web and PDF documents across 119 languages, including synthetic data for math and code.
- Efficient Training Gains: Qwen3 dense base models match or outperform larger Qwen2.5 base models, while Qwen3 MoE base models reach similar performance to Qwen2.5 dense models with only about 10% of the activated parameters, leading to significant cost savings.
- Advanced Post-training Pipeline: A four-stage process comprising long chain-of-thought cold start, reasoning-based RL, thinking-mode fusion, and general RL to develop hybrid capabilities.
- Easy Deployment and Local Usage: Available on Hugging Face, ModelScope, and Kaggle. Recommended for deployment with SGLang and vLLM, and for local use with Ollama, LMStudio, MLX, llama.cpp, and KTransformers (a client sketch follows this list).
- Dynamic Thinking Mode Control: Users can append `/think` and `/no_think` tags to prompts in multi-turn conversations to dynamically switch thinking modes (sketched after this list).
- Enhanced Tool Calling: Excels in tool calling capabilities; recommended for use with Qwen-Agent (see the agent sketch after this list).
- Future Outlook: Focus on scaling data, model size, context length, modalities, and advancing RL with environmental feedback for long-horizon reasoning, moving towards training agents.
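
To make the hybrid thinking modes concrete, here is a minimal sketch using Hugging Face Transformers. The `enable_thinking` flag on `apply_chat_template` is part of the Qwen3 release; the specific checkpoint, prompt, and generation settings below are illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any Qwen3 checkpoint should work; the 30B-A3B MoE model is used here for illustration.
model_name = "Qwen/Qwen3-30B-A3B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "How many prime numbers are there below 30?"}]

# enable_thinking=True makes the model emit a <think>...</think> reasoning block
# before its answer; enable_thinking=False skips it for a fast, direct reply.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)

inputs = tokenizer([text], return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=2048)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```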
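The `/think` and `/no_think` soft switches apply per turn in multi-turn conversations. A short sketch continuing the chat above (the assistant turn is illustrative filler for the history):

```python
# Appending /no_think (or /think) to a user turn overrides the mode for that turn.
messages += [
    {"role": "assistant", "content": "There are 10 primes below 30."},
    {"role": "user", "content": "Now just list them, no explanation. /no_think"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
```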
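For deployment, both vLLM and SGLang expose an OpenAI-compatible endpoint, so a served Qwen3 model can be queried with the standard `openai` client. A minimal sketch, assuming a local server was started with something like `vllm serve Qwen/Qwen3-30B-A3B` (port 8000 is vLLM's default):

```python
from openai import OpenAI

# Point the client at the local OpenAI-compatible server; the API key is unused.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen3-30B-A3B",
    messages=[{"role": "user", "content": "Summarize the Qwen3 release in one sentence."}],
)
print(response.choices[0].message.content)
```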
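On the agentic side, Qwen-Agent can wire MCP servers in as tools. A sketch following the release's usage pattern; the local endpoint and the `mcp-server-time` example are assumptions, and any MCP server config slots in the same way:

```python
from qwen_agent.agents import Assistant

# LLM endpoint config; reuses the local OpenAI-compatible server from above (assumption).
llm_cfg = {
    "model": "Qwen3-30B-A3B",
    "model_server": "http://localhost:8000/v1",
    "api_key": "EMPTY",
}

# Tool list: one MCP server definition plus the built-in code interpreter.
tools = [
    {
        "mcpServers": {
            "time": {
                "command": "uvx",
                "args": ["mcp-server-time", "--local-timezone=Asia/Shanghai"],
            }
        }
    },
    "code_interpreter",
]

bot = Assistant(llm=llm_cfg, function_list=tools)

messages = [{"role": "user", "content": "What time is it in Shanghai right now?"}]
for responses in bot.run(messages=messages):  # run() streams incremental responses
    pass
print(responses)  # final accumulated message list
```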