MiMo-7B Reasoning Language Model Outperforms

MiMo-7B, a 7B model series, excels in reasoning via optimized pre and post-training, matching OpenAI o1-mini in math/code. It uses rule-based RL and MTP for enhanced performance.

Unlocking the Reasoning Potential of Language Models: MiMo-7B

I. Introduction

MiMo-7B: A series of 7B models trained from scratch specifically for reasoning tasks.
Outperforms much larger 32B models in reasoning.
Achieves performance matching OpenAI o1-mini on math and code reasoning.
Focus on both pre-training and post-training strategies tailored to reasoning.

🌟 Highlights

Pre-Training: Base Model Born for Reasoning
- Optimized data preprocessing pipeline to increase reasoning pattern density.
- Employs multi-dimensional data filtering.
- Uses multiple strategies to generate diverse synthetic reasoning data.
- Three-stage data mixture strategy.
- Trained on approximately 25 trillion tokens.
- Incorporates Multiple-Token Prediction (MTP) to enhance performance and accelerate inference.
Post-Training Recipe: Pioneering Reasoning Model
- 130K math and code problems as RL training data, verified by rule-based verifiers.
- Uses only rule-based accuracy rewards to avoid reward hacking.
- Introduces a test difficulty driven code reward to mitigate sparse reward issue for challenging code problems.
- Data re-sampling strategy for easy problems to enhance rollout sampling efficiency and stabilize policy updates.
RL Infrastructures
- Developed a Seamless Rollout Engine to accelerate RL training and validation (2.29x faster training, 1.96x faster validation).
- Supports MTP in vLLM and enhances the robustness of the inference engine in the RL system.

II. Model Details

Models available at: https://huggingface.co/XiaomiMiMo

Model	Description	Download
MiMo-7B-Base	Base model with extraordinary reasoning potential	🤗 XiaomiMiMo/MiMo-7B-Base
MiMo-7B-RL-Zero	RL model trained from base model	🤗 XiaomiMiMo/MiMo-7B-RL-Zero
MiMo-7B-SFT	SFT model trained from base model	🤗 XiaomiMiMo/MiMo-7B-SFT
MiMo-7B-RL	RL model trained from SFT model, matching OpenAI o1-mini	🤗 XiaomiMiMo/MiMo-7B-RL

III. Evaluation Results

MiMo-7B-RL achieves strong performance, competing with larger models and even matching OpenAI's o1-mini.

Benchmark	MiMo-7B-RL
Mathematics
MATH500 (Pass@1)	95.8
AIME 2024 (Pass@1)	68.2
AIME 2025 (Pass@1)	55.4
Code
LiveCodeBench v5	57.8
LiveCodeBench v6	49.3

IV. Deployment

Recommended to use Xiaomi's fork of vLLM.
Recommended to use empty system prompt