- cross-posted to:
- technology@hexbear.net
- technology
- hackernews@lemmy.bestiver.se
We evaluated Devstral 2 against DeepSeek V3.2 and Claude Sonnet 4.5 using human evaluations conducted by an independent annotation provider, with tasks scaffolded through Cline. Devstral 2 shows a clear advantage over DeepSeek V3.2, with a 42.8% win rate versus 28.6% loss rate. However, Claude Sonnet 4.5 remains significantly preferred, indicating a gap with closed-source models persists.
Thank you for being honest about performance



