## Model Description
**DeepMath** is a 4B parameter mathematical reasoning model that combines a fine-tuned LLM with a sandboxed Python executor. Built on [Qwen3-4B Thinking](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507) and trained with **GRPO (Group Relative Policy Optimization)**, DeepMath generates concise Python snippets for computational steps instead of verbose text explanations, significantly reducing errors and output length.
- **Developed by:** Intel AI Labs
- **Model type:** Causal language model with agent capabilities
- **Language:** English
- **Base model:** [Qwen3-4B Thinking](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507)
- **License:** Apache 2.0
- **Blog:** 🔗
Figure 1: The vLLM client and server were modified so that candidate generation runs through the DeepMath agent while still using the vLLM backend.
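GRPO scores each candidate relative to the other candidates sampled for the same prompt, so no separate value network is needed. A minimal sketch of that group-relative advantage computation (illustrative only; the actual training code and reward shaping are not shown in this card):

```python
import statistics

def grpo_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Standardize each candidate's reward against the mean and std of
    its own sampled group, yielding per-candidate advantages."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Four candidates for one prompt: two correct (reward 1), two wrong (reward 0).
print(grpo_advantages([1.0, 0.0, 1.0, 0.0]))
```

Candidates with above-average reward in their group get positive advantage and are reinforced; below-average ones are penalized.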
**Key Findings:**
- **Accuracy:** Improved performance on challenging datasets (AIME, HMMT, HLE)
- **Efficiency:** Up to **66% reduction** in output length
- **Robustness:** Consistent improvements when combining agent + GRPO training
### Evaluation Datasets
- **MATH500:** A 500-problem subset of the MATH dataset
- **AIME:** American Invitational Mathematics Examination problems
- **HMMT:** Harvard-MIT Mathematics Tournament problems
- **HLE:** Humanity's Last Exam problems
Figure 2: Example output where Python code is generated, evaluated, and the result is inserted into the reasoning trace.
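The pattern in Figure 2, splicing the executor's output back into the reasoning trace right after the code that produced it, can be sketched like this. The `<result>` tag is a hypothetical marker chosen for illustration; the model card does not specify the actual delimiter format.

```python
import re

def splice_result(trace: str, result: str) -> str:
    """Append the executor's output immediately after the first fenced
    Python block in the reasoning trace (tag name is illustrative)."""
    return re.sub(
        r"(```python\n.*?```)",
        lambda m: m.group(1) + f"\n<result>{result}</result>",
        trace, count=1, flags=re.DOTALL,
    )

trace = (
    "The sum is computed below.\n"
    "```python\nprint(2 + 3)\n```\n"
    "So the answer is 5."
)
print(splice_result(trace, "5"))
```

After splicing, the model continues generating conditioned on the computed result rather than on its own arithmetic.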