What is the Artificial Analysis Intelligence Index?

The Artificial Analysis Intelligence Index v4.0 is a composite benchmark that aggregates performance across 10 evaluations spanning mathematics, science, coding, agentic tasks, and reasoning to measure AI capabilities holistically.

How is the Intelligence Index calculated?

The index aggregates scores from 10 benchmarks: GDPval-AA, τ²-Bench, Terminal-Bench Hard, SciCode, AA-LCR, AA-Omniscience, IFBench, Humanity's Last Exam, GPQA Diamond, and CritPt. All evaluations are run independently by Artificial Analysis on standardized hardware.
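The answer above lists the ten components but does not say how their scores are combined. As a rough illustration only, the sketch below assumes each benchmark score is already on a 0-100 scale and combines them with a weighted mean that defaults to equal weights. The function name `composite_index` and the equal weighting are hypothetical; the official weighting is described on the Artificial Analysis methodology page and may differ.

```python
from typing import Dict, Optional

# The ten v4.0 components, as named in the FAQ answer.
BENCHMARKS = [
    "GDPval-AA", "τ²-Bench", "Terminal-Bench Hard", "SciCode", "AA-LCR",
    "AA-Omniscience", "IFBench", "Humanity's Last Exam", "GPQA Diamond", "CritPt",
]


def composite_index(
    scores: Dict[str, float], weights: Optional[Dict[str, float]] = None
) -> float:
    """Hypothetical composite: weighted mean of per-benchmark scores.

    ASSUMPTIONS: every score is already normalized to 0-100, and when no
    weights are passed all ten benchmarks count equally. This is NOT the
    documented Artificial Analysis formula.
    """
    missing = [b for b in BENCHMARKS if b not in scores]
    if missing:
        raise ValueError(f"missing scores for: {missing}")
    if weights is None:
        weights = {b: 1.0 for b in BENCHMARKS}  # assumed equal weighting
    total_weight = sum(weights[b] for b in BENCHMARKS)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[b] * weights[b] for b in BENCHMARKS) / total_weight


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not real model results.
    demo = {b: 50.0 for b in BENCHMARKS}
    demo["CritPt"] = 10.0
    print(round(composite_index(demo), 1))
```

Passing a custom `weights` dictionary lets the same function express a category-based scheme if the official methodology groups benchmarks differently.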

How does this differ from LMArena?

LMArena ranks models with Elo-style ratings derived from crowdsourced user votes in blind A/B comparisons, so its rankings reflect subjective human preference. The AA Intelligence Index instead uses standardized, automated benchmarks with objective scoring across specific technical domains.
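To make the contrast concrete, here is the classic Elo update for a single pairwise vote, the rating scheme the answer refers to. It is illustrative only: the K-factor of 32 and the 1500 starting rating are conventional assumptions, and LMArena's published methodology fits ratings over all votes at once with a Bradley-Terry model rather than applying this update vote by vote.

```python
from typing import Tuple


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_update(
    r_a: float, r_b: float, outcome_a: float, k: float = 32.0
) -> Tuple[float, float]:
    """Return updated (r_a, r_b) after one blind A/B vote.

    outcome_a is 1.0 if voters preferred A, 0.0 if they preferred B,
    and 0.5 for a tie. k=32 is an assumed, conventional K-factor.
    """
    if outcome_a not in (0.0, 0.5, 1.0):
        raise ValueError("outcome_a must be 0.0, 0.5, or 1.0")
    delta = k * (outcome_a - expected_score(r_a, r_b))
    # Zero-sum: whatever A gains, B loses.
    return r_a + delta, r_b - delta


if __name__ == "__main__":
    a, b = 1500.0, 1500.0  # both models start at an assumed baseline of 1500
    a, b = elo_update(a, b, 1.0)  # one vote preferring model A
    print(a, b)  # 1516.0 1484.0
```

Because each update depends on the order in which votes arrive, batch fits such as Bradley-Terry are generally preferred for a public leaderboard; a benchmark index avoids the question entirely by scoring each model against fixed test sets.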

Where can I find the original data?

The original leaderboard is available at artificialanalysis.ai/leaderboards/models and the methodology at artificialanalysis.ai/evaluations/artificial-analysis-intelligence-index.

Artificial Analysis Intelligence Index: AI Model Intelligence Leaderboard

Name: Artificial Analysis Intelligence Index: AI Model Intelligence Leaderboard
Creator: DataLearner
License: https://creativecommons.org/licenses/by/4.0/

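The Artificial Analysis Intelligence Index v4.0 combines 10 established benchmarks (including GDPval-AA, Terminal-Bench Hard, GPQA Diamond, and SciCode) to evaluate and rank AI models across mathematics, science, coding, reasoning, and other dimensions.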

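Top model: Qwen3.7 Max
Highest score: 57
Number of models: 201
Data version: May 31, 2026
Data source: Artificial Analysis
Scope: all Chinese-developed models. Ranks are positions in the full 201-model leaderboard, so they are not consecutive.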

Full rankings

Rank	Model	Intelligence Index	Organization
7	Qwen3.7 Max	57	Alibaba
10	Kimi K2.6	54	Moonshot AI
17	DeepSeek-V4-Pro (max)	52	DeepSeek-AI
21	DeepSeek-V4-Pro (high)	50	DeepSeek-AI
22	MiniMax-M2.7	50	MiniMaxAI
27	DeepSeek-V4-Flash (max)	47	DeepSeek-AI
28	DeepSeek-V4-Flash (high)	46	DeepSeek-AI
39	Kimi K2.6	43	Moonshot AI
42	Hy3-preview	42	Tencent
48	DeepSeek-V4-Pro	39	DeepSeek-AI
52	Step 3.5 Flash	38	StepFunAI
60	DeepSeek-V4-Flash	36	DeepSeek-AI
68	Hy3-preview	34	Tencent
70	Doubao Seed Code	34	ByteDance Seed
95	Qwen3.5 4B	27	Alibaba
113	Qwen3.5 4B	23	Alibaba
134	Qwen3.5 2B	16	Alibaba
139	Step3 VL 10B	15	StepFun
150	Qwen3.5 2B	15	Alibaba
153	Kimi Linear 48B A3B Instruct	14	Kimi
172	Qwen3.5 0.8B	11	Alibaba
178	Qwen3.5 0.8B	10	Alibaba

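Data are for reference only; the official source takes precedence. Some model names appear more than once with different scores; these are separate leaderboard entries whose variant labels are not shown here.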

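Benchmark composition (Intelligence Index v4.0)

The Intelligence Index combines 10 rigorous benchmarks to measure AI model capabilities comprehensively and to avoid overfitting to any single dimension.

GDPval-AA: agentic real-world tasks
τ²-Bench: agentic tool use
Terminal-Bench Hard: agentic coding
SciCode: coding
AA-LCR: long-context reasoning
AA-Omniscience: knowledge and hallucination detection
IFBench: instruction following
Humanity's Last Exam: reasoning and knowledge
GPQA Diamond: scientific reasoning
CritPt: physics reasoning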

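Frequently Asked Questions (FAQ)

What is the Artificial Analysis Intelligence Index?

The Artificial Analysis Intelligence Index v4.0 is a composite index that aggregates 10 challenging evaluations spanning mathematics, science, coding, agentic tasks, and reasoning to measure AI capabilities holistically. It is designed to prevent overfitting to any single dimension and to provide a single score for tracking model progress.

How is the Intelligence Index calculated?

The index combines scores from 10 evaluations: GDPval-AA (agentic real-world tasks), τ²-Bench (tool use), Terminal-Bench Hard (agentic coding), SciCode (coding), AA-LCR (long-context reasoning), AA-Omniscience (knowledge and hallucination detection), IFBench (instruction following), Humanity's Last Exam (reasoning), GPQA Diamond (scientific reasoning), and CritPt (physics reasoning). All tests are run independently by Artificial Analysis on standardized hardware.

How does this differ from the LMArena leaderboard?

LMArena rankings are based on crowdsourced user votes (Elo ratings from blind A/B comparisons) and reflect subjective human preference. The Artificial Analysis Intelligence Index instead scores models objectively with standardized, automated benchmarks that measure technical capability in specific domains. Both are useful: LMArena captures real user experience, while the AA Intelligence Index provides reproducible technical measurements.

Where can I find the original data?

The original leaderboard and detailed methodology are available at artificialanalysis.ai; the Intelligence Index methodology is described on the Intelligence Index page.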
