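Arcada Labs Code Categories Arena: Code Capability Leaderboard
The latest ranking of large AI models by coding ability, based on anonymous user votes in the Arcada Labs Code Categories Arena. A Bradley-Terry model produces a combined score and rank across code subcategories such as Website, UI Component, Game Dev, and Data Visualization.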
Top model: Claude Opus 4.6
Highest score: 1346.00
Number of models: 127
Data version: May 31, 2026
Data source: Arcada Labs
Overall rankings
| Rank | Model | Score | 95% CI | Votes | Organization | License |
|---|---|---|---|---|---|---|
| 1 | Claude Opus 4.6 | 1346.00 | — | 16,089 | Anthropic | Proprietary |
| 2 | Claude Opus 4.7 (Thinking) | 1344.00 | — | 7,755 | Anthropic | Proprietary |
| 3 | Claude Opus 4.6 (Thinking) | 1341.00 | — | 13,540 | Anthropic | Proprietary |
| 4 | Kimi K2.6 | 1337.00 | — | 15,535 | Moonshot AI | Open Source |
| 5 | GLM 5.1 | 1336.00 | — | 5,197 | Zhipu AI | Open Source |
| 6 | Opus 4.7 | 1330.00 | — | 11,025 | Anthropic | Proprietary |
| 7 | Claude Sonnet 4.6 | 1329.00 | — | 15,336 | Anthropic | Proprietary |
| 8 | GLM 5 Turbo | 1329.00 | — | 14,085 | Zhipu AI | Proprietary |
| 9 | MiMo-V2.5-Pro | 1327.00 | — | 3,587 | Xiaomi | Open Source |
| 10 | Qwen3.7 Max | 1314.00 | — | 7,534 | Alibaba | Proprietary |
| 11 | MiMo-V2.5 | 1309.00 | — | 15,671 | Xiaomi | Open Source |
| 12 | Muse Spark | 1307.00 | — | 4,248 | Facebook AI Research | Proprietary |
| 13 | DeepSeek-V4-Pro | 1306.00 | — | 9,410 | DeepSeek | Open Source |
| 14 | Gemini 3.5 Flash | 1302.00 | — | 6,073 | Google | Proprietary |
| 15 | GLM 5 | 1302.00 | — | 30,971 | Zhipu AI | Open Source |
| 16 | GPT-5.5 | 1302.00 | — | 8,045 | OpenAI | Proprietary |
| 17 | Opus 4.5 | 1296.00 | — | 28,169 | Anthropic | Proprietary |
| 18 | Gemini 3.1 Pro Preview | 1296.00 | — | 23,948 | Google DeepMind | Proprietary |
| 19 | Kimi K2.5 (Thinking) | 1294.00 | — | 30,129 | Moonshot AI | Open Source |
| 20 | MiniMax M2.7 | 1286.00 | — | 24,347 | MiniMax | Open Source |
| 21 | GLM-5V-Turbo | 1286.00 | — | 19,033 | Zhipu AI | Proprietary |
| 22 | Gemini 3.1 Pro Preview | 1283.00 | — | 25,876 | Google DeepMind | Proprietary |
| 23 | Qwen 3.6 Plus Preview | 1283.00 | — | 16,861 | Alibaba | Proprietary |
| 24 | Claude Opus 4.8 | 1282.00 | — | 6,131 | Anthropic | Proprietary |
| 25 | GLM 4.7 | 1275.00 | — | 38,816 | Zhipu AI | Open Source |
| 26 | | 1272.00 | — | 17,718 | xAI | Proprietary |
| 27 | DeepSeek-V4-Flash | 1270.00 | — | 15,684 | DeepSeek | Open Source |
| 28 | GPT-5.4 (Design Skill, Medium) | 1269.00 | — | 6,369 | OpenAI | Proprietary |
| 29 | GPT-5.4 (Medium) | 1267.00 | — | 13,383 | OpenAI | Proprietary |
| 30 | MiniMax M2.5 | 1262.00 | — | 11,504 | MiniMax | Open Source |
| 31 | | 1262.00 | — | 12,334 | xAI | Proprietary |
| 32 | | 1253.00 | — | 18,535 | xAI | Proprietary |
| 33 | Gemini 3 Flash Preview | 1245.00 | — | 4,446 | Google | Proprietary |
| 34 | MiniMax M2.1 | 1245.00 | — | 20,892 | MiniMax | Open Source |
| 35 | Claude Sonnet 4.5 (Thinking) | 1238.00 | — | 32,348 | Anthropic | Proprietary |
| 36 | Claude Sonnet 4.5 | 1237.00 | — | 33,156 | Anthropic | Proprietary |
| 37 | Qwen3.5-397B-A17B | 1235.00 | — | 8,129 | Alibaba | Open Source |
| 38 | GPT-5.4 (Low) | 1234.00 | — | 14,824 | OpenAI | Proprietary |
| 39 | GPT-5.4 (None) | 1234.00 | — | 16,608 | OpenAI | Proprietary |
| 40 | GLM 4.7 Flash | 1233.00 | — | 11,706 | Zhipu AI | Open Source |
| 41 | Claude 3.7 Sonnet | 1232.00 | — | 15,317 | Anthropic | Proprietary |
| 42 | DeepSeek-V3.1 (Thinking) | 1231.00 | — | 16,327 | DeepSeek | Open Source |
| 43 | Claude Opus 4.1 (Thinking) | 1226.00 | — | 15,778 | Anthropic | Proprietary |
| 44 | DeepSeek V3.2-Exp | 1226.00 | — | 19,549 | DeepSeek-AI | Open Source |
| 45 | GPT-5.1 (high) | 1226.00 | — | 16,146 | OpenAI | Proprietary |
| 46 | GPT-5.2 (None) | 1225.00 | — | 24,334 | OpenAI | Proprietary |
| 47 | GPT-5.2 (medium) | 1225.00 | — | 23,098 | OpenAI | Proprietary |
| 48 | GPT-5 (high) | 1224.00 | — | 13,476 | OpenAI | Proprietary |
| 49 | Qwen3.5 Plus 02-15 | 1223.00 | — | 17,272 | Alibaba | Proprietary |
| 50 | DeepSeek V3.2 | 1222.00 | — | 24,178 | DeepSeek-AI | Open Source |
| 51 | GPT-5.2 (Low) | 1222.00 | — | 24,599 | OpenAI | Proprietary |
| 52 | Claude Opus 4.1 | 1221.00 | — | 32,495 | Anthropic | Proprietary |
| 53 | GLM 4.6 | 1221.00 | — | 16,997 | Zhipu AI | Open Source |
| 54 | GLM 4.5 | 1220.00 | — | 19,727 | Zhipu AI | Open Source |
| 55 | GPT-5 (Minimal) | 1219.00 | — | 31,838 | OpenAI | Proprietary |
| 56 | GPT-5.1 (Medium) | 1217.00 | — | 21,393 | OpenAI | Proprietary |
| 57 | Claude Opus 4 | 1216.00 | — | 16,750 | Anthropic | Proprietary |
| 58 | Step 3.7 Flash | 1216.00 | — | 3,137 | StepFun | Open Source |
| 59 | GPT-5.1 (Low) | 1211.00 | — | 22,262 | OpenAI | Proprietary |
| 60 | MiMo-V2-Flash | 1211.00 | — | 32,252 | Xiaomi | Open Source |
| 61 | Gemini 2.5-Pro | 1209.00 | — | 7,044 | Google DeepMind | Proprietary |
| 62 | GPT-5.1 Codex | 1206.00 | — | 1,807 | OpenAI | Proprietary |
| 63 | GPT-5.1 (None) | 1206.00 | — | 22,399 | OpenAI | Proprietary |
| 64 | GPT-5.2 (High) | 1205.00 | — | 4,167 | OpenAI | Proprietary |
| 65 | GPT-5.3 Codex | 1200.00 | — | 15,763 | OpenAI | Proprietary |
| 66 | Qwen3 Coder 480B A35B Instruct | 1198.00 | — | 1,958 | Alibaba | Open Source |
| 67 | Claude Sonnet 4 | 1197.00 | — | 17,619 | Anthropic | Proprietary |
| 68 | Mistral Large 3 (2512) | 1197.00 | — | 29,272 | Mistral | Open Source |
| 69 | DeepSeek-R1-0528 | 1194.00 | — | 18,052 | DeepSeek-AI | Open Source |
| 70 | GLM 4.5 Air | 1193.00 | — | 17,361 | Zhipu AI | Open Source |
| 71 | Claude Sonnet 4 (Thinking) | 1192.00 | — | 16,301 | Anthropic | Proprietary |
| 72 | MiniMax M2 Stable | 1190.00 | — | 10,933 | MiniMax | Open Source |
| 73 | AesCoder-4B | 1182.00 | — | 37,423 | DesignFlow | Open Source |
| 74 | Mistral Medium 3.5 | 1178.00 | — | 8,390 | Mistral | Open Source |
| 75 | Mistral Medium 3.1 (2508) | 1176.00 | — | 26,826 | Mistral | Proprietary |
| 76 | Trinity Large Thinking | 1174.00 | — | 10,557 | Arcee AI | Open Source |
| 77 | GPT-5 mini (Default) | 1171.00 | — | 31,116 | OpenAI | Proprietary |
| 78 | Claude Haiku 4.5 | 1170.00 | — | 34,519 | Anthropic | Proprietary |
| 79 | DeepSeek-V3.1 | 1167.00 | — | 20,375 | DeepSeek-AI | Open Source |
| 80 | Qwen3 Max | 1167.00 | — | 32,079 | Alibaba | Proprietary |
| 81 | DeepSeek-V3-0324 | 1163.00 | — | 19,366 | DeepSeek-AI | Open Source |
| 82 | Prime Intellect: INTELLECT-3 | 1162.00 | — | 29,267 | Prime Intellect | Open Source |
| 83 | Gemini 2.5 Flash Preview 09-2025 | 1159.00 | — | 19,439 | Google | Proprietary |
| 84 | Kimi K2 0905 Preview | 1153.00 | — | 1,504 | Moonshot AI | Open Source |
| 85 | GPT-5.1 Codex Mini | 1151.00 | — | 31,457 | OpenAI | Proprietary |
| 86 | | 1151.00 | — | 35,534 | xAI | Proprietary |
| 87 | | 1148.00 | — | 34,086 | xAI | Proprietary |
| 88 | | 1144.00 | — | 31,697 | xAI | Proprietary |
| 89 | GPT-5 nano (Default) | 1140.00 | — | 6,710 | OpenAI | Proprietary |
| 90 | Kimi K2 Turbo Preview | 1139.00 | — | 2,096 | Moonshot AI | Open Source |
| 91 | Gemini 2.5 Flash Lite Preview 09-2025 | 1136.00 | — | 6,860 | Google | Proprietary |
| 92 | Gemini 3.1 Flash-Lite Preview | 1128.00 | — | 20,906 | Google | Proprietary |
| 93 | Mistral Medium 3 (2505) | 1124.00 | — | 6,396 | Mistral | Proprietary |
| 94 | Ministral 3 14B (2512) | 1120.00 | — | 2,379 | Mistral | Open Source |
| 95 | Gemini 2.5 Flash | 1114.00 | — | 6,960 | Google | Proprietary |
| 96 | v0-1.5-md | 1112.00 | — | 11,086 | Vercel | Proprietary |
| 97 | | 1108.00 | — | 26,957 | xAI | Proprietary |
| 98 | Ministral 3 8B (2512) | 1108.00 | — | 2,427 | Mistral | Open Source |
| 99 | | 1101.00 | — | 36,078 | xAI | Proprietary |
| 100 | Qwen3-235B-A22B-2507 | 1094.00 | — | 6,932 | Alibaba | Open Source |
| 101 | Kimi K2 | 1089.00 | — | 1,352 | Moonshot AI (Legacy) | Open Source |
| 102 | Magistral Medium 1.2 (2509) | 1089.00 | — | 5,851 | Mistral | Proprietary |
| 103 | Qwen3-235B-A22B-Thinking-2507 | 1088.00 | — | 6,169 | Alibaba | Open Source |
| 104 | GPT-4.1 | 1081.00 | — | 1,747 | OpenAI | Proprietary |
| 105 | OpenAI o3 | 1075.00 | — | 1,365 | OpenAI | Proprietary |
| 106 | | 1072.00 | — | 24,117 | xAI | Proprietary |
| 107 | Devstral Medium | 1068.00 | — | 7,158 | Mistral | Proprietary |
| 108 | Ministral 3 3B (2512) | 1065.00 | — | 2,852 | Mistral | Open Source |
| 109 | Codestral 2508 | 1062.00 | — | 6,746 | Mistral | Proprietary |
| 110 | Qwen3-235B-A22B | 1057.00 | — | 5,154 | Alibaba | Open Source |
| 111 | | 1054.00 | — | 4,296 | xAI | Proprietary |
| 112 | GPT-4.1 mini | 1049.00 | — | 1,566 | OpenAI | Proprietary |
| 113 | Magistral Small 1.2 (2509) | 1041.00 | — | 6,448 | Mistral | Open Source |
| 114 | o4-mini | 1031.00 | — | 2,011 | OpenAI | Proprietary |
| 115 | Olmo 3.1 32B Think | 1030.00 | — | 16,219 | Allen AI | Open Source |
| 116 | GPT-4.1 nano | 1018.00 | — | 1,901 | OpenAI | Proprietary |
| 117 | GPT OSS 120B | 1018.00 | — | 5,268 | OpenAI | Open Source |
| 118 | Qwen3 30B-A3B | 997.00 | — | 2,575 | Alibaba | Open Source |
| 119 | | 985.00 | — | 7,626 | xAI | Proprietary |
| 120 | Llama 3.1 Nemotron Ultra 253B | 984.00 | — | 3,172 | NVIDIA | Open Source |
| 121 | Mistral Small 3.2 | 962.00 | — | 1,243 | Mistral | Open Source |
| 122 | Llama 4 Maverick | 935.00 | — | 1,678 | Facebook AI Research | Open Source |
| 123 | Mistral Large 2.1 (2411) | 918.00 | — | 1,317 | Mistral | Proprietary |
| 124 | GPT-4o | 916.00 | — | 1,780 | OpenAI | Proprietary |
| 125 | Codestral 2 (2501) | 889.00 | — | 1,444 | Mistral | Open Source |
| 126 | Devstral Small 1.1 | 862.00 | — | 1,250 | Mistral | Open Source |
| 127 | Llama 4 Scout | 845.00 | — | 1,275 | Facebook AI Research | Open Source |
Data is for reference only; official sources take precedence. Links next to model names lead to the DataLearner model detail pages.
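About this leaderboard
The data for this leaderboard comes from Design Arena, a crowdsourced, anonymous head-to-head platform developed by the Y Combinator-backed Arcada Labs that evaluates how well AI models generate design code.
Unlike LMArena, which evaluates general text and programming ability, Design Arena's code leaderboard specifically tests a model's ability to generate front-end code with a visual result. The platform divides code tasks into subcategories including Website, UI Component, Game Dev, Data Visualization, SVG, Web App, and Mobile, each with its own leaderboard.
This page shows the combined Code Categories leaderboard: user votes from all subcategories are pooled, and a single Bradley-Terry model (an Elo-like algorithm) computes the overall ranking. Every vote carries equal weight and subcategories are not weighted, so subcategories with more votes (such as Website) have more influence on the combined score. A higher score indicates a stronger overall human preference for the model in design code generation.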
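To make the scoring idea concrete, here is a minimal sketch of fitting a Bradley-Terry model to pooled pairwise votes with the classic minorization-maximization update. It is an illustration only, not Arcada Labs' actual pipeline: the function names `fit_bradley_terry` and `to_elo_scale`, the `(winner, loser)` vote format, and the Elo-style scaling (base 1000, 400 points per factor of ten in strength) are all assumptions made for this example. The sketch also assumes every model has at least one win and one loss, and it ignores ties.

```python
import math
from collections import defaultdict


def fit_bradley_terry(votes, iters=500, tol=1e-10):
    """Estimate Bradley-Terry strengths from (winner, loser) pairs.

    Hypothetical helper: uses the MM update p_i = W_i / sum_j n_ij / (p_i + p_j).
    """
    wins = defaultdict(float)    # total wins per model
    games = defaultdict(float)   # comparisons per unordered pair of models
    for winner, loser in votes:
        if winner == loser:
            continue             # a model cannot be compared with itself
        wins[winner] += 1.0
        wins[loser] += 0.0       # make sure the loser is registered too
        games[frozenset((winner, loser))] += 1.0
    models = sorted(wins)
    if not models:
        return {}
    strength = {m: 1.0 for m in models}
    for _ in range(iters):
        updated = {}
        for m in models:
            denom = 0.0
            for pair, n in games.items():
                if m in pair:
                    (other,) = pair - {m}
                    denom += n / (strength[m] + strength[other])
            updated[m] = wins[m] / denom
        # Strengths are only defined up to a constant factor, so rescale
        # them to sum to the number of models after every pass.
        total = sum(updated.values())
        updated = {m: v * len(models) / total for m, v in updated.items()}
        delta = max(abs(updated[m] - strength[m]) for m in models)
        strength = updated
        if delta < tol:
            break
    return strength


def to_elo_scale(strength, base=1000.0, scale=400.0):
    """Map strengths to Elo-like scores (assumed scaling, centred on `base`)."""
    log_mean = sum(math.log10(v) for v in strength.values()) / len(strength)
    return {m: base + scale * (math.log10(v) - log_mean)
            for m, v in strength.items()}


if __name__ == "__main__":
    # Toy data: model "A" beats model "B" in 3 of 4 anonymous votes.
    toy_votes = [("A", "B")] * 3 + [("B", "A")]
    strengths = fit_bradley_terry(toy_votes)
    print(to_elo_scale(strengths))
```

With the toy votes, the fitted strength ratio of A to B is 3, which corresponds to a modeled 75% chance that A wins a head-to-head vote. Because all votes go into one fit, no step in this sketch distinguishes which subcategory a vote came from.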
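Frequently asked questions (FAQ)
What is the Arcada Labs Code Categories Arena?
The Arcada Labs Code Categories Arena is an anonymous evaluation platform focused on design code generation. It covers code generation subcategories such as Website, UI Component, Game Dev, and Data Visualization, and pools the votes into a combined leaderboard.
How does the Arcada Code Arena differ from the LMArena Coding Arena?
The LMArena Coding Arena mainly evaluates general programming ability, such as code generation, debugging, and algorithm implementation. The Arcada Code Arena focuses on front-end design code with a visual result, such as HTML pages, interactive UIs, charts, SVGs, and prototypes.
What is the ranking methodology?
Arcada Labs pools the raw votes from all code subcategories and then fits a Bradley-Terry model. Every vote carries equal weight and subcategories are not weighted separately, so subcategories with more votes have more influence on the combined score.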
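The effect of equal per-vote weighting can be seen with a small calculation. The sketch below compares a pooled win rate with a per-subcategory average for one model. The vote counts are invented for illustration and the function names are hypothetical; none of this is Arcada Labs data or code.

```python
def pooled_win_rate(results):
    """Win rate when every vote counts equally, regardless of subcategory."""
    wins = sum(w for w, _ in results.values())
    total = sum(n for _, n in results.values())
    return wins / total


def category_averaged_win_rate(results):
    """Win rate when every subcategory counts equally, regardless of size."""
    return sum(w / n for w, n in results.values()) / len(results)


# Hypothetical (wins, votes) for one model in two subcategories.
results = {
    "Website": (540, 900),  # 60% win rate, many votes
    "SVG": (30, 100),       # 30% win rate, few votes
}

print(f"{pooled_win_rate(results):.2f}")             # prints 0.57
print(f"{category_averaged_win_rate(results):.2f}")  # prints 0.45
```

The pooled figure sits close to the Website win rate because Website supplies 90% of the votes in this made-up example, which is the same mechanism by which high-volume subcategories dominate a combined score.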
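Which kinds of models perform better on design code tasks?
Large models with strong visual understanding and front-end code generation ability usually perform better. Specialized models optimized for UI and code generation may also stand out on layout, interaction, and visual detail tasks.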









