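Artificial Analysis Intelligence Index: AI Model Intelligence Leaderboard
Artificial Analysis Intelligence Index v4.0 combines 10 established benchmarks (including GDPval-AA, Terminal-Bench, GPQA Diamond, and SciCode) to evaluate and rank AI models across mathematics, science, coding, reasoning, and other dimensions.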
Top model: GPT-5.5 (xhigh)
Highest score: 60
Number of models: 203
Data version: April 25, 2026
Data source: Artificial Analysis
Overall Rankings
| Rank | Model | Intelligence Index | Organization |
|---|---|---|---|
| 1 | GPT-5.5 (xhigh) | 60 | OpenAI |
| 2 | GPT-5.5 (high) | 59 | OpenAI |
| 3 | Opus 4.7 (max) | 57 | Anthropic |
| 4 | Gemini 3.1 Pro Preview | 57 | Google Deep Mind |
| 5 | GPT-5.4 (xhigh) | 57 | OpenAI |
| 6 | GPT-5.5 (medium) | 57 | OpenAI |
| 7 | Kimi K2.6 | 54 | Moonshot AI |
| 8 | MiMo-V2.5-Pro | 54 | Xiaomi |
| 9 | GPT-5.3 Codex (xhigh) | 54 | OpenAI |
| 10 | Muse Spark | 52 | Facebook AI Research |
| 11 | Opus 4.7 (Non-reasoning, high) | 52 | Anthropic |
| 12 | Qwen3.6-Max-Preview | 52 | Alibaba |
| 13 | Claude Sonnet 4.6 (max) | 52 | Anthropic |
| 14 | DeepSeek-V4-Pro (Max) | 52 | DeepSeek-AI |
| 15 | GLM 5.1 | 51 | Zhipu AI |
| 16 | GPT-5.5 (low) | 51 | OpenAI |
| 17 | Qwen 3.6 Plus Preview | 50 | Alibaba |
| 18 | DeepSeek-V4-Pro (High) | 50 | DeepSeek-AI |
| 19 | GLM-5 | 50 | Zhipu AI |
| 20 | MiniMax-M2.7 | 50 | MiniMaxAI |
| 21 | Grok 4.20 0309 v2 | 49 | xAI |
| 22 | MiMo-V2-Pro | 49 | Xiaomi |
| 23 | GPT-5.4 mini (xhigh) | 49 | OpenAI |
| 24 | GLM-5-Turbo | 47 | Zhipu AI |
| 25 | DeepSeek-V4-Flash (Max) | 47 | DeepSeek-AI |
| 26 | Gemini 3.0 Flash | 46 | Google Deep Mind |
| 27 | Qwen3.6-27B | 46 | Alibaba |
| 28 | Qwen3.5-397B-A17B | 45 | Alibaba |
| 29 | Nova 2 Omni(Preview) | 45 | Amazon |
| 30 | DeepSeek-V4-Flash (High) | 45 | DeepSeek-AI |
| 31 | Claude Sonnet 4.6 (Non-reasoning) | 44 | Anthropic |
| 32 | GPT-5.4 nano (xhigh) | 44 | OpenAI |
| 33 | GLM 5.1 | 44 | Zhipu AI |
| 34 | Qwen3.6-35B-A3B | 43 | Alibaba |
| 35 | MiMo-V2-Omni | 43 | Xiaomi |
| 36 | GLM-5V-Turbo | 43 | Zhipu AI |
| 37 | Claude Sonnet 4.6 (Non-reasoning, Low Effort) | 43 | Anthropic |
| 38 | DeepSeek V3.2 | 42 | DeepSeek-AI |
| 39 | Qwen3.5-122B-A10B | 42 | Alibaba |
| 40 | Gemini 2.0 Flash Experimental | 41 | DeepMind |
| 41 | Gemini 3.1 Pro Preview (low) | 41 | Google Deep Mind |
| 42 | GPT-5.5 (Non-reasoning) | 41 | OpenAI |
| 43 | GLM-5 | 41 | Zhipu AI |
| 44 | Qwen3.5-397B-A17B | 40 | Alibaba |
| 45 | Gemma 4 31B | 39 | DeepMind |
| 46 | Qwen3.5-Omni-Plus | 39 | Alibaba |
| 47 | Grok 4.1 Fast | 39 | xAI |
| 48 | Step 3.5 Flash | 38 | StepFunAI |
| 49 | OpenAI o3 | 38 | OpenAI |
| 50 | GPT-5.4 nano | 38 | OpenAI |
| 51 | GPT-5.4 mini (medium) | 38 | OpenAI |
| 52 | Kimi K2.5 | 37 | Moonshot AI |
| 53 | Qwen3.6-27B | 37 | Alibaba |
| 54 | Haiku 4.5 | 37 | Anthropic |
| 55 | NVIDIA Nemotron 3 Super | 36 | NVIDIA |
| 56 | Qwen3.5-122B-A10B | 36 | Alibaba |
| 57 | Nova 2 Pro(Preview) (medium) | 36 | Amazon |
| 58 | GPT-5.4 (Non-reasoning) | 35 | OpenAI |
| 59 | Gemini 3.0 Flash | 35 | Google Deep Mind |
| 60 | Gemini 2.5-Pro | 35 | Google Deep Mind |
| 61 | Nova 2 Lite (high) | 35 | Amazon |
| 62 | Ling-2.6-1T | 34 | InclusionAI |
| 63 | Gemini 3.1 Flash-Lite Preview | 34 | |
| 64 | Doubao Seed Code | 34 | ByteDance Seed |
| 65 | GPT OSS 120B (high) | 33 | OpenAI |
| 66 | Mercury 2 | 33 | Inception |
| 67 | Qwen3.5-9B-Instruct | 32 | Alibaba |
| 68 | Gemma 4 31B | 32 | DeepMind |
| 69 | K-EXAONE | 32 | LG AI Research |
| 70 | DeepSeek V3.2 | 32 | DeepSeek-AI |
| 71 | Grok-3 mini - Reasoning (high) | 32 | xAI |
| 72 | Nova 2 Pro(Preview) (low) | 32 | Amazon |
| 73 | Trinity Large Thinking | 32 | Arcee AI |
| 74 | Qwen3.6-35B-A3B | 32 | Alibaba |
| 75 | Gemma 4 26B A4B | 31 | DeepMind |
| 76 | Haiku 4.5 | 31 | Anthropic |
| 77 | Qwen3.5-35B-A3B | 31 | Alibaba |
| 78 | MiMo-V2-Flash | 30 | Xiaomi |
| 79 | Nova 2 Lite (medium) | 30 | Amazon |
| 80 | DeepSeek V3.2 Speciale | 29 | DeepSeek-AI |
| 81 | ERNIE 5.0 | 29 | Baidu |
| 82 | Grok 4.20 0309 v2 | 29 | xAI |
| 83 | Grok Code Fast 1 | 29 | xAI |
| 84 | Nemotron Cascade 2 30B A3B | 28 | NVIDIA |
| 85 | Qwen3-Coder-Next | 28 | Alibaba |
| 86 | Nova 2 Omni(Preview) (medium) | 28 | Amazon |
| 87 | Mistral Small 4 | 28 | Mistral |
| 88 | Qwen3.5-9B-Instruct | 27 | Alibaba |
| 89 | Magistral Medium 1.2 | 27 | Mistral |
| 90 | Gemma 4 26B A4B | 27 | DeepMind |
| 91 | Qwen3.5 4B | 27 | Alibaba |
| 92 | DeepSeek-R1-0528 | 27 | DeepSeek-AI |
| 93 | Qwen3-Next | 27 | Alibaba |
| 94 | Ling 2.6 Flash | 26 | InclusionAI |
| 95 | Solar Pro 3 | 26 | Upstage |
| 96 | Qwen3.5-Omni-Flash | 26 | Alibaba |
| 97 | JT-MINI | 25 | China Mobile |
| 98 | Nova 2 Lite (low) | 25 | Amazon |
| 99 | GPT OSS 20B (high) | 24 | OpenAI |
| 100 | GPT OSS 120B (low) | 24 | OpenAI |
| 101 | GPT-5.4 nano | 24 | OpenAI |
| 102 | NVIDIA Nemotron 3 Nano | 24 | NVIDIA |
| 103 | LongCat Flash Lite | 24 | LongCat |
| 104 | Grok 4.1 Fast | 24 | xAI |
| 105 | K-EXAONE | 23 | LG AI Research |
| 106 | GPT-5.4 mini | 23 | OpenAI |
| 107 | Nova 2 Omni(Preview) (low) | 23 | Amazon |
| 108 | Nova 2 Pro(Preview) | 23 | Amazon |
| 109 | Mi:dm K 2.5 Pro | 23 | Korea Telecom |
| 110 | Mistral Large 3 | 23 | MistralAI |
| 111 | Ring-1T | 23 | InclusionAI |
| 112 | Qwen3.5 4B | 23 | Alibaba |
| 113 | INTELLECT-3 | 22 | Prime Intellect |
| 114 | Devstral 2 | 22 | Mistral |
| 115 | Solar Open 100B | 22 | Upstage |
| 116 | Gemini 2.5 Flash-Lite-Preview-09-2025 | 22 | Google Deep Mind |
| 117 | Mistral Medium 3.1 | 21 | Mistral |
| 118 | GPT OSS 20B (low) | 21 | OpenAI |
| 119 | Qwen3-Next | 20 | Alibaba |
| 120 | Devstral Small 2 | 19 | Mistral |
| 121 | Gemini 2.5 Flash-Lite-Preview-09-2025 | 19 | Google Deep Mind |
| 122 | Motif-2-12.7B | 19 | Motif Technologies |
| 123 | Ling-1T | 19 | InclusionAI |
| 124 | Nova Premier | 19 | Amazon |
| 125 | Gemma 4 E4B | 19 | DeepMind |
| 126 | Llama Nemotron Super 49B v1.5 | 19 | Meta |
| 127 | Mistral Small 4 | 19 | Mistral |
| 128 | Llama 3.3 Nemotron Super 49B | 18 | Meta |
| 129 | Llama 4 Maverick | 18 | Facebook AI Research |
| 130 | Magistral Small 1.2 | 18 | Mistral |
| 131 | Sarvam 105B (high) | 18 | Sarvam |
| 132 | Nova 2 Lite | 18 | Amazon |
| 133 | Llama3.1-405B | 17 | Facebook AI Research |
| 134 | EXAONE 4.0 32B | 17 | LG AI Research |
| 135 | Nova 2 Omni(Preview) | 17 | Amazon |
| 136 | Qwen3.5 2B | 16 | Alibaba |
| 137 | Nanbeige4.1-3B | 16 | Nanbeige |
| 138 | Ministral 3 14B | 16 | MistralAI |
| 139 | DeepSeek-R1-Distill-Llama-70B | 16 | DeepSeek-AI |
| 140 | Falcon-H1R-7B | 16 | TII UAE |
| 141 | Ling-flash-2.0 | 16 | InclusionAI |
| 142 | Qwen3-Omni-30B-A3B | 16 | Alibaba |
| 143 | Step3 VL 10B | 15 | StepFun |
| 144 | Gemma 4 E2B | 15 | DeepMind |
| 145 | Llama Nemotron Ultra | 15 | NVIDIA |
| 146 | ERNIE-4.5-300B-A47B | 15 | Baidu |
| 147 | Solar Pro 2 | 15 | Upstage |
| 148 | NVIDIA Nemotron Nano 12B v2 VL | 15 | NVIDIA |
| 149 | Ministral 3 8B | 15 | MistralAI |
| 150 | Gemma 4 E4B | 15 | DeepMind |
| 151 | NVIDIA Nemotron Nano 9B V2 | 15 | NVIDIA |
| 152 | NVIDIA Nemotron 3 Nano 4B | 15 | NVIDIA |
| 153 | Qwen3.5 2B | 15 | Alibaba |
| 154 | Llama Nemotron Super 49B v1.5 | 15 | Meta |
| 155 | Llama3.3-70B-Instruct | 14 | Facebook AI Research |
| 156 | Llama 3.1 Nemotron Nano 4B v1.1 | 14 | Meta |
| 157 | Kimi Linear 48B A3B Instruct | 14 | Kimi |
| 158 | Llama 3.3 Nemotron Super 49B | 14 | Meta |
| 159 | Ring-flash-2.0 | 14 | InclusionAI |
| 160 | Solar Pro 2 | 14 | Upstage |
| 161 | Llama 4 Scout | 14 | Facebook AI Research |
| 162 | C4AI Command A (202503) | 13 | CohereAI |
| 163 | Llama 3.1 Nemotron 70B | 13 | NVIDIA |
| 164 | NVIDIA Nemotron 3 Nano | 13 | NVIDIA |
| 165 | NVIDIA Nemotron Nano 9B V2 | 13 | NVIDIA |
| 166 | Sarvam 30B (high) | 12 | Sarvam |
| 167 | Gemma 4 E2B | 12 | DeepMind |
| 168 | R1 1776 | 12 | Perplexity |
| 169 | Llama 3.2-Vision-90B | 12 | Facebook AI Research |
| 170 | EXAONE 4.0 32B | 12 | LG AI Research |
| 171 | Ministral 3 3B | 11 | Mistral |
| 172 | Jamba 1.7 Large | 11 | AI21 Labs |
| 173 | Granite 4.0 H Small | 11 | IBM |
| 174 | Qwen3-Omni-30B-A3B | 11 | Alibaba |
| 175 | Qwen3.5 0.8B | 11 | Alibaba |
| 176 | LFM2 24B A2B | 10 | Liquid AI |
| 177 | Phi 4 - 14B | 10 | Microsoft Azure |
| 178 | Amazon Nova Micro | 10 | Amazon |
| 179 | NVIDIA Nemotron Nano 12B v2 VL | 10 | NVIDIA |
| 180 | Phi-4-multimodal-instruct | 10 | Microsoft Azure |
| 181 | Qwen3.5 0.8B | 10 | Alibaba |
| 182 | Jamba Reasoning 3B | 10 | AI21 Labs |
| 183 | Gemini 3.0 Flash | 10 | Google Deep Mind |
| 184 | Ling-mini-2.0 | 9 | InclusionAI |
| 185 | Llama 3.2-Vision-11B | 9 | Facebook AI Research |
| 186 | Phi-4-mini-instruct (3.8B) | 8 | Microsoft Azure |
| 187 | Exaone 4.0 1.2B | 8 | LG AI Research |
| 188 | Exaone 4.0 1.2B | 8 | LG AI Research |
| 189 | LFM2.5-1.2B-Thinking | 8 | Liquid AI |
| 190 | Jamba 1.7 Mini | 8 | AI21 Labs |
| 191 | LFM2.5-1.2B-Instruct | 8 | Liquid AI |
| 192 | LFM2 2.6B | 8 | Liquid AI |
| 193 | Granite 4.0 H 1B | 8 | IBM |
| 194 | Gemma 3-270M | 8 | Google Deep Mind |
| 195 | Apertus 70B Instruct | 8 | Swiss AI |
| 196 | Granite 4.0 Micro | 8 | IBM |
| 197 | Granite 4.0 1B | 7 | IBM |
| 198 | LFM2 8B A1B | 7 | Liquid AI |
| 199 | LFM2.5-VL-1.6B | 6 | Liquid AI |
| 200 | Granite 4.0 350M | 6 | IBM |
| 201 | Apertus 8B Instruct | 6 | Swiss AI |
| 202 | Granite 4.0 H 350M | 5 | IBM |
| 203 | Tiny Aya Global | 5 | Cohere |
Data is for reference only; official sources take precedence. Links next to model names lead to the DataLearner model detail pages.
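
For readers who want to work with the ranking table programmatically, the following is a minimal sketch of parsing its pipe-delimited rows and finding the best-scoring entry per organization. The function names `parse_rows` and `best_per_org` are hypothetical helpers introduced for illustration, and the sample rows are copied from the table above. Rows whose first cell is not a number (header, separator, malformed rows) are skipped, and an empty organization cell is reported as `(unknown)`.

```python
# A few rows copied from the ranking table, in the same pipe-delimited layout.
SAMPLE = """\
| 4 | Gemini 3.1 Pro Preview | 57 | Google Deep Mind |
| 5 | GPT-5.4 (xhigh) | 57 | OpenAI |
| 7 | Kimi K2.6 | 54 | Moonshot AI |
| 9 | GPT-5.3 Codex (xhigh) | 54 | OpenAI |
| 63 | Gemini 3.1 Flash-Lite Preview | 34 | |
"""


def parse_rows(text):
    """Yield (rank, model, score, org) tuples from pipe-delimited rows."""
    for line in text.splitlines():
        # Drop the outer pipes, then split the remaining cells.
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 4 or not cells[0].isdigit():
            continue  # header, separator, or malformed row
        rank, model, score, org = cells
        yield int(rank), model, int(score), org or "(unknown)"


def best_per_org(rows):
    """Map each organization to its highest-scoring (model, score) pair."""
    best = {}
    for _rank, model, score, org in rows:
        # Strict '>' keeps the earlier (higher-ranked) row on ties.
        if org not in best or score > best[org][1]:
            best[org] = (model, score)
    return best


if __name__ == "__main__":
    for org, (model, score) in best_per_org(parse_rows(SAMPLE)).items():
        print(f"{org}: {model} ({score})")
```

Because several models share the same score (for example, four entries at 57), the rank column reflects listing order rather than a strict ordering by score; the sketch keeps the first-listed model when scores tie.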
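Benchmark Composition (Intelligence Index v4.0)
The Intelligence Index combines 10 rigorous benchmarks to measure AI model capability broadly and to avoid overfitting to any single dimension.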
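| Benchmark | What it measures |
|---|---|
| GDPval-AA | Agentic real-world tasks |
| τ²-Bench | Agentic tool use |
| Terminal-Bench | Agentic coding |
| SciCode | Coding |
| AA-LCR | Long-context reasoning |
| AA-Omniscience | Knowledge and hallucination detection |
| IFBench | Instruction following |
| Humanity's Last Exam | Reasoning and knowledge |
| GPQA Diamond | Scientific reasoning |
| CritPt | Physics reasoning |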
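Frequently Asked Questions (FAQ)
What is the Artificial Analysis Intelligence Index?
Artificial Analysis Intelligence Index v4.0 is a composite index that aggregates 10 challenging evaluations, covering mathematics, science, coding, agentic tasks, and reasoning, to measure AI capability broadly. It is designed to discourage overfitting to any single dimension and to provide one unified score for tracking model progress.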
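How is the Intelligence Index calculated?
The index combines scores from 10 evaluations: GDPval-AA (agentic real-world tasks), τ²-Bench (tool use), Terminal-Bench Hard (agentic coding), SciCode (coding), AA-LCR (long-context reasoning), AA-Omniscience (knowledge and hallucination detection), IFBench (instruction following), Humanity's Last Exam (reasoning), GPQA Diamond (scientific reasoning), and CritPt (physics reasoning). Artificial Analysis runs all tests independently on standardized hardware.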
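
As a rough illustration of how ten benchmark scores could be folded into one number, here is a minimal sketch of a weighted mean. This page does not state the actual weighting or normalization used by Artificial Analysis, so the equal weights below are an assumption, and `composite_index` is a hypothetical helper. The example scores are placeholders, not real results.

```python
BENCHMARKS = [
    "GDPval-AA", "τ²-Bench", "Terminal-Bench Hard", "SciCode", "AA-LCR",
    "AA-Omniscience", "IFBench", "Humanity's Last Exam", "GPQA Diamond",
    "CritPt",
]


def composite_index(scores, weights=None):
    """Weighted mean of per-benchmark scores (each assumed on a 0-100 scale)."""
    missing = [b for b in BENCHMARKS if b not in scores]
    if missing:
        raise ValueError(f"missing benchmark scores: {missing}")
    if weights is None:
        # ASSUMPTION: equal weights; the real index may weight differently.
        weights = {b: 1.0 for b in BENCHMARKS}
    total_weight = sum(weights[b] for b in BENCHMARKS)
    return sum(scores[b] * weights[b] for b in BENCHMARKS) / total_weight


if __name__ == "__main__":
    # Placeholder scores for illustration only.
    placeholder = dict.fromkeys(BENCHMARKS, 50.0)
    placeholder["GPQA Diamond"] = 100.0
    print(composite_index(placeholder))
```

Requiring all ten scores up front makes a missing benchmark an explicit error rather than a silently inflated or deflated average.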
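How does this differ from the LMArena leaderboard?
LMArena rankings are based on crowdsourced user votes (Elo ratings from blind A/B comparisons) and reflect subjective human preference. The Artificial Analysis Intelligence Index instead scores models objectively on standardized automated benchmarks that measure technical capability in specific domains. Both are useful: LMArena captures real-world user experience, while the AA Intelligence Index provides reproducible technical measurements.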
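
To make the contrast concrete, the following is a minimal sketch of the classic Elo update applied to one blind A/B vote. It shows the general technique only and is not necessarily LMArena's exact procedure; the K-factor of 32 and the function names are assumptions chosen for illustration.

```python
def expected_score(r_a, r_b):
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_update(r_a, r_b, outcome, k=32.0):
    """Return updated (r_a, r_b) after one comparison.

    outcome is 1.0 if A wins, 0.0 if B wins, 0.5 for a tie.
    k is the update step size (ASSUMPTION: 32, a common default).
    """
    delta = k * (outcome - expected_score(r_a, r_b))
    # Whatever A gains, B loses, so the total rating is conserved.
    return r_a + delta, r_b - delta


if __name__ == "__main__":
    # Two equally rated models; a user votes for model A.
    print(elo_update(1000.0, 1000.0, 1.0))
```

Ratings of this kind depend on which pairs users happen to compare and how they vote, whereas a benchmark score depends only on a fixed test set and grader.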
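Where can I find the original data?
The original leaderboard and detailed methodology are available at artificialanalysis.ai. The Intelligence Index methodology is described on the Intelligence Index page.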