My Notes on State of AI Report from Artificial Analysis

State of AI Report Q3 - Artificial Analysis

  • Big Tech companies have been spending a lot of money on data centers
  • Most of NVIDIA's revenue comes from data centers ($41 billion out of $45 billion)
  • Compute demand only increases
  • The argument that better-engineered, more efficient systems will use less energy goes out the window: as models become better and more integrated, demand just keeps increasing (with things like agents working in parallel...)
  • Compute per use ranges from smaller models that use about 1/10th of the compute to agents that act as roughly a 20x compute multiplier (in requests/use)
  • Top spots in the Artificial Analysis Intelligence Index belong to:
    • GPT-5 high
    • Grok-4
    • Claude 4.5 Sonnet (+reasoning)
    • Gemini 2.5 Pro
    • DeepSeek V3.1 (+reasoning)
    • Llama 4 Maverick
  • The demand for model families is being led by OpenAI, Google Gemini & Anthropic
  • Inference pricing by intelligence class continues to drop
  • Open Source models near the intelligence frontier:
    • GPT-OSS-120B
    • DeepSeek V3.1 Terminus
    • Qwen3 235B
    • GLM4.6
    • DeepSeek R1 0528
    • Apriel-v1.5
    • GPT-OSS-20B
  • Models are being trained and optimized with reinforcement learning for tool use and agentic task execution
  • Players in the agent space:
    • Coding:
      • Claude Code, GitHub Copilot, OpenAI Codex, Replit, Cursor, Amp, Kiro
    • Deep Research:
      • Gemini, Perplexity, Grok, Claude, Mistral AI, Qwen3
    • Computer use:
      • Gemini, Grok, Comet, SURFER, Browser Use, Cursor Neon
  • Chat applications are expanding to integrate tool availability and enable multi-step workflows (agent builder, n8n workflows, the ChatGPT apps...)
  • Video models saw rapid quality gains and leaderboard churn, with proprietary systems pulling ahead while open-weight video models lag
  • Instruction-based image editing gained popularity and models are becoming more generalized (open-weights remain competitive)
  • Chinese and US labs remain roughly even on image generation (Seedream 4.0 leads text-to-image, Gemini 2.5 Flash leads editing), while China leads in video generation. Text-to-image improvements were incremental as open-weight text-to-image progress slowed.
  • Smaller companies use media generation to compete with larger companies
  • Google's Chirp 2 (didn't know this model!) leads in transcription accuracy