Microsoft's new Phi-4-reasoning matches the performance of the controversial DeepSeek-R1

These models are trained in English and have a 32k-token context window

Microsoft announces new Phi-4-reasoning models

Microsoft has expanded its Phi AI model family with the release of two new models, Phi-4-reasoning and Phi-4-reasoning-plus. Both reasoning models are built on a 14-billion-parameter architecture designed for reasoning-heavy tasks.

Microsoft announces Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning models

Microsoft says that it trained the core Phi-4-reasoning model on a curated set of “teachable” prompts with reasoning demonstrations generated by OpenAI’s o3-mini model. Per the company’s whitepaper, these models outperform larger open-weight alternatives such as DeepSeek-R1-Distill-Llama-70B.

Phi-4-reasoning comparison with other models
Image: Microsoft

On several key reasoning benchmarks, this Phi-4 model even beats the full DeepSeek-R1 model, despite the latter’s much larger size. The benchmark comparison above also shows that Microsoft’s Phi reasoning models are ahead of some popular models from its rivals.

The company notes that the new models outperform Anthropic’s Claude 3.7 Sonnet and Google’s Gemini 2.0 Flash Thinking on most tasks. However, they lag behind on the GPQA and Calendar Planning benchmarks.

The enhanced Phi-4-reasoning-plus model builds on the core Phi-4-reasoning model. Microsoft adds that the new releases support reasoning-intensive applications and are particularly useful in settings where memory, compute, or latency constraints are a concern.

Also read: Microsoft reports strong FY25 Q3 earnings driven by cloud and AI growth

Microsoft also introduced the Phi-4-mini-reasoning model, designed to meet the demand for a compact reasoning model. In the announcement blog, the company notes, “This transformer-based language model is optimized for mathematical reasoning, providing high-quality, step-by-step problem solving in environments with constrained computing or latency.”
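For readers who want to try the compact model, here is a minimal sketch of how it could be run with the Hugging Face transformers library. The model identifier microsoft/Phi-4-mini-reasoning and the sample prompt are illustrative assumptions, so check Microsoft’s Hugging Face page for the exact release names. Loading the weights in bfloat16 with device_map="auto" is one way to fit the model on the kind of constrained hardware Microsoft describes.

```python
# Minimal sketch: step-by-step math reasoning with Phi-4-mini-reasoning.
# The model ID below is an assumption based on Microsoft's naming convention;
# verify the exact identifier on Hugging Face before running.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-4-mini-reasoning"  # assumed identifier

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half-precision weights for constrained memory
    device_map="auto",           # place layers on the available GPU/CPU
)

# Ask for an explicit step-by-step solution, the workload the model targets.
messages = [{"role": "user", "content": "Solve step by step: 12 * (7 + 5) - 9"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```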

Limitations

Despite their capabilities, the Microsoft Phi-4-reasoning models come with certain limitations. They are trained primarily on English data, their coding training focuses heavily on Python and standard packages, and they have a 32k-token context window.

Also read: Microsoft’s CEO says AI has written 20% to 30% of the company’s code

These reasoning-focused models are part of Microsoft’s effort to advance scalable and efficient language model research. As highlighted in a tweet by researcher Ahmed Awadallah, the models represent a step forward in combining supervised fine-tuning and reinforcement learning to improve performance on reasoning benchmarks.
