| Kimi K2.7 Code |
Not exposed in the OpenRouter model metadata at report time. |
Cloudflare Workers AI docs list 1T total parameters, 262.1k context,
vision input, tool calling, reasoning, and structured outputs. Kimi's release note reports
+21.8% on Kimi Code Bench v2 over K2.6, but that is a vendor-family delta rather than a
cross-model benchmark.
|
Newest and transparent on parameter scale. Benchmark coverage is thinner than Gemini/MiniMax in the public metadata.
|
| MiniMax M3 |
Intelligence 54.7
Coding 43.4
Agentic 68.6
|
MiniMax reports SWE-Bench Pro 59.0%, Terminal-Bench 2.1 66.0%, SWE-fficiency 34.8%,
KernelBench Hard 28.8%, and MCP Atlas 74.2%.
|
Best cost/context profile here, and the OpenRouter agentic index is better than Qwen Max's 66.6 baseline.
|
| Gemini 3 Flash Preview |
Intelligence 46.4
Coding 42.6
Agentic 49.7
|
Google's Gemini 3 developer guide lists 1M / 64k context and $0.50 / $3 pricing.
The Gemini 3.5 model card gives Gemini 3 Flash baseline scores including Terminal-bench 58.0%,
SWE-Bench Pro 49.6%, MCP Atlas 62.0%, CharXiv Reasoning 80.3%, and MMMU-Pro 81.2%.
|
Capable and cheap, but mostly superseded by 3.5 Flash for this shortlist.
|
| Gemini 3.5 Flash |
Intelligence 55.3
Coding 45.0
Agentic 70.3
|
Google model card reports Terminal-bench 2.1 76.2%, SWE-Bench Pro 55.1%, MCP Atlas 83.6%,
OSWorld-Verified 78.4%, CharXiv Reasoning 84.2%, and MMMU-Pro 83.6%.
|
Best quality signal of the four. The tradeoff is price: output is 7.5x MiniMax M3 and 2.57x Kimi K2.7 Code.
|