My Notes on State of AI Report from Artificial Analysis

State of AI Report Q3 - Artificial Analysis

Big Tech companies have been spending a lot of money on data centers
Most of NVIDIA's revenue comes from data centers ($41 billion out of $45 billion)
Compute demand only increases
The argument that better-engineered, more efficient systems will let us use less energy goes out the window: as models become better and more integrated, demand just keeps increasing (with things like agents working in parallel...)
The range runs from smaller models that use about 1/10th of the compute to agents that act as a roughly 20x multiplier on compute (in requests/use)
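To make the arithmetic behind that point concrete, here is a minimal sketch. It assumes, as a simplification of mine and not something the report states, that the two figures simply multiply: a model that needs 1/10th of the compute per request, used by agents that issue 20x the requests. The function name `net_compute_multiplier` is hypothetical.

```python
def net_compute_multiplier(efficiency_factor: float, usage_multiplier: float) -> float:
    """Net change in total compute when per-request cost and request volume both change.

    Assumption (not from the report): the two effects combine by simple multiplication.
    """
    return efficiency_factor * usage_multiplier


# Figures from the notes: ~1/10th compute per request, ~20x requests from agents.
net = net_compute_multiplier(1 / 10, 20)
print(f"Net compute demand: {net:.1f}x the baseline")
```

Under that assumption, total demand still doubles even though each request became ten times cheaper.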
Top spots in the Artificial Analysis Intelligence Index belong to:
- GPT-5 high
- Grok-4
- Claude 4.5 Sonnet (+reasoning)
- Gemini 2.5 Pro
- DeepSeek V3.1 (+reasoning)
- Llama 4 Maverick
The demand for model families is being led by OpenAI, Google Gemini & Anthropic
Inference pricing by intelligence class continues to drop
Open Source models near the intelligence frontier:
- GPT-OSS-120B
- DeepSeek V3.1 Terminus
- Qwen3 235B
- GLM4.6
- DeepSeek R1 0528
- Apriel-v1.5
- GPT-OSS-20B
Models are being trained and optimized with reinforcement learning for tool use and agentic task execution
Players in the agent space:
Coding:
- Claude Code, GitHub Copilot, OpenAI Codex, Replit, Cursor, Amp, Kiro
Deep Research:
- Gemini, Perplexity, Grok, Claude, Mistral AI, Qwen3
Computer use:
- Gemini, Grok, Comet, SURFER, Browser Use, Cursor Neon
Chat applications are expanding to integrate tool availability and enable multi-step workflows (agent builder, n8n workflows, the ChatGPT apps...)
Video models saw rapid quality gains and leaderboard churn, with proprietary systems pulling ahead while open-weight video models lag
Instruction-based image editing gained popularity and models are becoming more generalized (open-weights remain competitive)
Chinese and US labs remain roughly even on image generation (Seedream 4.0 leads text-to-image, Gemini 2.5 Flash leads editing), while China leads in video generation. Text-to-image improvements were incremental, and open-weight text-to-image progress slowed.
Smaller companies use media generation to compete with larger companies
Google's Chirp 2 (didn't know this model!) leads in transcription accuracy
