MiMo-7B is a series of 7B models whose pre- and post-training are optimized for reasoning; it matches OpenAI o1-mini on math and code benchmarks, using rule-based RL rewards and Multiple-Token Prediction (MTP).
Unlocking the Reasoning Potential of Language Models: MiMo-7B
I. Introduction
- MiMo-7B: A series of 7B models trained from scratch specifically for reasoning tasks.
- Outperforms much larger 32B models in reasoning.
- Achieves performance matching OpenAI o1-mini on math and code reasoning.
- Focus on both pre-training and post-training strategies tailored to reasoning.
🌟 Highlights
- Pre-Training: Base Model Born for Reasoning
- Optimized data preprocessing pipeline to increase reasoning pattern density.
- Employs multi-dimensional data filtering.
- Uses multiple strategies to generate diverse synthetic reasoning data.
- Three-stage data mixture strategy.
- Trained on approximately 25 trillion tokens.
- Incorporates Multiple-Token Prediction (MTP) as an additional objective to enhance performance and accelerate inference (see the sketch after this list).
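
As a rough illustration of the MTP idea, here is a minimal, hypothetical depth-1 sketch in PyTorch: one extra head predicts the token two steps ahead of each position, and its loss is added to the standard next-token loss with a small weight. The module shape, loss weight, and names are assumptions for illustration, not MiMo's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MTPHead(nn.Module):
    """Hypothetical depth-1 MTP head: from the hidden state at position t,
    predict the token at t+2 (one step beyond the usual next-token target)."""
    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.proj = nn.Linear(hidden_size, hidden_size)
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        return self.lm_head(torch.tanh(self.proj(hidden)))

def mtp_loss(hidden: torch.Tensor, input_ids: torch.Tensor,
             head: MTPHead, weight: float = 0.1) -> torch.Tensor:
    """Auxiliary loss added to the main cross-entropy; `weight` is a guess."""
    logits = head(hidden[:, :-2, :])   # positions 0 .. S-3
    targets = input_ids[:, 2:]         # tokens 2 .. S-1 (two steps ahead)
    ce = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                         targets.reshape(-1))
    return weight * ce
```

At inference time, such an extra head can also draft tokens for speculative decoding, which is one way an MTP objective can accelerate generation.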
- Post-Training Recipe: Pioneering Reasoning Model
- Curates 130K math and code problems as RL training data, each verifiable by rule-based verifiers.
- Uses only rule-based accuracy rewards to avoid reward hacking.
- Introduces a test-difficulty-driven code reward to mitigate the sparse-reward issue on challenging code problems (see the sketch after this list).
- Re-samples easy problems to improve rollout sampling efficiency and stabilize policy updates.
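
The sketch below illustrates these two reward ideas under stated assumptions: a binary rule-based accuracy reward for math, and a difficulty-weighted partial reward for code. The matching rules, difficulty scores, and function names are hypothetical simplifications of the report's scheme.

```python
from fractions import Fraction

def math_reward(answer: str, gold: str) -> float:
    """Binary rule-based accuracy reward: 1.0 iff the extracted final answer
    equals the reference, else 0.0. No learned reward model is involved,
    so there is no reward-model surface to hack."""
    try:
        return float(Fraction(answer) == Fraction(gold))
    except (ValueError, ZeroDivisionError):
        return float(answer.strip() == gold.strip())

def code_reward(passed: list[bool], difficulty: list[float]) -> float:
    """Test-difficulty-driven code reward (simplified): weight each unit test
    by an assumed difficulty score, so partially solving a hard problem still
    yields a nonzero reward instead of an all-or-nothing sparse signal."""
    total = sum(difficulty)
    earned = sum(d for ok, d in zip(passed, difficulty) if ok)
    return earned / total if total > 0 else 0.0

# Example: 2 of 3 tests pass; the failed test is the hardest one.
# code_reward([True, True, False], [1.0, 1.0, 3.0]) -> 0.4
```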
- RL Infrastructures
- Developed a Seamless Rollout Engine that accelerates RL training by 2.29× and validation by 1.96× (a toy illustration follows this list).
- Supports MTP in vLLM and improves the robustness of the inference engine within the RL system.
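
The section does not spell out the engine's internals; the toy below only illustrates the general idea of continuous rollout, keeping the inference engine busy by dispatching the next prompt the moment a worker frees up. `FakeEngine`, the worker count, and all names are invented for the illustration.

```python
import asyncio

class FakeEngine:
    """Stand-in for an async inference engine (the real system uses vLLM)."""
    async def generate(self, prompt: str) -> str:
        await asyncio.sleep(0.01)  # simulate decoding latency
        return prompt + " -> rollout"

async def seamless_rollout(engine, prompts, num_workers: int = 8):
    """Toy continuous-rollout loop: each worker pulls its next prompt as soon
    as it finishes one generation, so the engine never idles between batches."""
    queue: asyncio.Queue = asyncio.Queue()
    for p in prompts:
        queue.put_nowait(p)
    for _ in range(num_workers):
        queue.put_nowait(None)  # one stop sentinel per worker

    results = []

    async def worker():
        while (prompt := await queue.get()) is not None:
            results.append((prompt, await engine.generate(prompt)))

    await asyncio.gather(*(worker() for _ in range(num_workers)))
    return results

# asyncio.run(seamless_rollout(FakeEngine(), [f"q{i}" for i in range(32)]))
```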
II. Model Details
- Models available at: https://huggingface.co/XiaomiMiMo
| Model | Description | Download |
| --- | --- | --- |
| MiMo-7B-Base | Base model with extraordinary reasoning potential | [🤗 XiaomiMiMo/MiMo-7B-Base](https://huggingface.co/XiaomiMiMo/MiMo-7B-Base) |
| MiMo-7B-RL-Zero | RL model trained from the base model | [🤗 XiaomiMiMo/MiMo-7B-RL-Zero](https://huggingface.co/XiaomiMiMo/MiMo-7B-RL-Zero) |
| MiMo-7B-SFT | SFT model trained from the base model | [🤗 XiaomiMiMo/MiMo-7B-SFT](https://huggingface.co/XiaomiMiMo/MiMo-7B-SFT) |
| MiMo-7B-RL | RL model trained from the SFT model, matching OpenAI o1-mini | [🤗 XiaomiMiMo/MiMo-7B-RL](https://huggingface.co/XiaomiMiMo/MiMo-7B-RL) |
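
For a quick start with any of these checkpoints, a standard Hugging Face Transformers loading snippet should work along these lines. The prompt is arbitrary, and `trust_remote_code=True` is an assumption in case the repo ships custom modeling code:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "XiaomiMiMo/MiMo-7B-RL"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype="auto", device_map="auto"
)

prompt = "Prove that the sum of two even integers is even."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```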
III. Evaluation Results
- MiMo-7B-RL achieves strong performance, competing with larger models and even matching OpenAI's o1-mini.
| Benchmark | MiMo-7B-RL |
| --- | --- |
| **Mathematics** | |
| MATH500 (Pass@1) | 95.8 |
| AIME 2024 (Pass@1) | 68.2 |
| AIME 2025 (Pass@1) | 55.4 |
| **Code** | |
| LiveCodeBench v5 (Pass@1) | 57.8 |
| LiveCodeBench v6 (Pass@1) | 49.3 |
IV. Deployment
- Recommended to use Xiaomi's fork of vLLM.
- Recommended to use an empty system prompt (see the example below).
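
A minimal deployment sketch, assuming the stock vLLM Python API (Xiaomi's fork may add MTP-specific options not shown here), with the recommended empty system prompt; the sampling settings are illustrative:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="XiaomiMiMo/MiMo-7B-RL", trust_remote_code=True)
sampling = SamplingParams(temperature=0.6, max_tokens=4096)

# Empty system prompt, per the recommendation above.
conversation = [
    {"role": "system", "content": ""},
    {"role": "user", "content": "What is the integral of x^2 from 0 to 3?"},
]
outputs = llm.chat(conversation, sampling)
print(outputs[0].outputs[0].text)
```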