Microsoft's new Phi-4-reasoning matches the performance of the controversial DeepSeek-R1
The models are trained on English text and have a 32k-token context window
Microsoft has expanded its Phi AI model family with the release of two new models, Phi-4-reasoning and Phi-4-reasoning-plus. Both reasoning models are built on a 14-billion-parameter architecture and are designed for reasoning-heavy tasks.
Microsoft announces Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning models
Microsoft says it trained the core Phi-4-reasoning model on a curated set of “teachable” prompts with reasoning demonstrations generated by OpenAI’s o3-mini model. Per the company’s whitepaper, the models outperform larger open-weight alternatives such as DeepSeek-R1-Distill-Llama-70B.
On several key reasoning benchmarks, Phi-4-reasoning even beats the full DeepSeek-R1 model, despite the latter’s far larger size. Microsoft’s published benchmark comparison also shows its Phi reasoning models ahead of several popular models from its rivals.
The company notes that the new models outperform Anthropic’s Claude 3.7 Sonnet and Google’s Gemini 2 Flash Thinking on most tasks, though they lag behind on GPQA and Calendar Planning.
The enhanced Phi-4-reasoning-plus model builds on the core Phi-4-reasoning model. Microsoft adds that the new releases support reasoning-intensive applications and are particularly useful in settings where memory, compute, or latency constraints are a concern.
Also read: Microsoft reports strong FY25 Q3 earnings driven by cloud and AI growth
Microsoft also introduced the Phi-4-mini-reasoning model, designed to meet the demand for a compact reasoning model. The company in the announcement blog notes, “This transformer-based language model is optimized for mathematical reasoning, providing high-quality, step-by-step problem solving in environments with constrained computing or latency.”
Limitations
Despite their capabilities, the Phi-4-reasoning models come with certain limitations. They are trained primarily on English text, their coding data focuses heavily on Python and standard coding packages, and they have a 32k-token context window.
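For readers who want to experiment within those limits, here is a minimal sketch of loading the model with Hugging Face’s transformers library. The repo ID microsoft/Phi-4-reasoning and the sample prompt are assumptions for illustration; check the official model card before running this.

```python
# Minimal sketch: running Phi-4-reasoning locally via transformers.
# The repo ID below is an assumption; verify it on the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-4-reasoning"  # assumed Hugging Face repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# An English prompt, since the model is trained primarily on English text.
messages = [{"role": "user", "content": "If 3x + 5 = 20, what is x? Reason step by step."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Keep the prompt plus generated tokens within the 32k-token context window.
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```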
Also read: Microsoft’s CEO says AI has written 20% to 30% of the company’s code
These reasoning-focused models are part of Microsoft’s effort to advance scalable and efficient language model research. As highlighted in a tweet by researcher Ahmed Awadallah, they represent a step forward in combining supervised fine-tuning and reinforcement learning to improve performance on reasoning benchmarks.