Platform Comparison / Model Selection Reference · Updated June 2026

Which AI?

Models are listed cheapest-first within each brand. 

Claude (Anthropic): Haiku 4.5 · Sonnet 4.6 · Opus 4.8 · Fable 5
ChatGPT (OpenAI): Instant · Thinking · Pro
Gemini (Google): 3.1 Flash-Lite · 3.5 Flash · 3.1 Pro
Copilot (Microsoft): Smart · Think Deeper · Deep Research
Grok (xAI): Grok Build · Grok 4.3 · 4.20 Multi-Agent
Badge border: Cheapest model · Mid model · Expensive model · Flagship (premium)
Cell border: Brand pick · Overall best
Overall best = green cell background + BEST label

Columns: Task · Best Overall · one column per model, grouped by brand
Claude (Anthropic): Haiku 4.5 (Cheapest) · Sonnet 4.6 (Balanced) · Opus 4.8 (Heavy) · Fable 5 (Flagship)
ChatGPT (OpenAI): Instant (Default) · Thinking (Reasoning) · Pro (Heaviest)
Gemini (Google): 3.1 Flash-Lite (Cheapest) · 3.5 Flash (Balanced) · 3.1 Pro (Heaviest)
Copilot (Microsoft): Smart (Default) · Think Deeper (Reasoning) · Deep Research (Heaviest)
Grok (xAI): Grok Build (Cheapest) · Grok 4.3 (Balanced) · 4.20 Multi-Agent (Heaviest)
Quick code check · Debug / syntax / review
Best overall: Claude Sonnet 4.6 · Catches real bugs without flagship pricing
Claude (Anthropic)
  Haiku 4.5 · Skip · Misses the subtler logic bugs
  Sonnet 4.6 · ★ · Finds logic errors and explains the fix in plain language
  Opus 4.8 · Skip · More than a spot check needs
  Fable 5 · Skip · Never for quick checks
ChatGPT (OpenAI)
  Instant · ★ · Fast, free, and fine for syntax and small logic checks
  Thinking · Skip · Save it for harder problems
  Pro · Skip · Not for spot checks
Gemini (Google)
  3.1 Flash-Lite · Skip · 3.5 Flash reviews code better
  3.5 Flash · ★ · Good review quality for the price
  3.1 Pro · Skip · Overkill
Copilot (Microsoft)
  Smart · ★ · Decent quick reviews inside the tools you already use
  Think Deeper · Skip · Overkill for spot checks
  Deep Research · Skip · It's a research agent, not a reviewer
Grok (xAI)
  Grok Build · ★ · A coding model that costs very little
  Grok 4.3 · Fallback · Works, but Build is cheaper for this
  4.20 Multi-Agent · Skip · Overkill for quick checks

Full app build · Multi-file, architecture
Best overall: Claude Opus 4.8 · Tops the coding benchmarks; keeps big projects coherent
Claude (Anthropic)
  Haiku 4.5 · Skip · Loses track across files
  Sonnet 4.6 · Fallback · Good for scoped modules
  Opus 4.8 · ★ · 88.6% on SWE-bench Verified; the most reliable builder here
  Fable 5 · Upgrade if · You run long unattended builds and the 2x price is fine
ChatGPT (OpenAI)
  Instant · Skip · Too light for this
  Thinking · ★ · GPT-5.5 is a close second, and the best at terminal work
  Pro · Upgrade if · The architecture is genuinely hard
Gemini (Google)
  3.1 Flash-Lite · Skip · Wrong tier
  3.5 Flash · Fallback · Capable, but expect some re-prompting on deep logic
  3.1 Pro · ★ · Handles big codebases with its 1M-token window
Copilot (Microsoft)
  Smart · Fallback · Scoped builds only
  Think Deeper · ★ · GPT-5.5 Thinking with your repo and docs in context
  Deep Research · Skip · Research agent, not a builder
Grok (xAI)
  Grok Build · Fallback · Fine for small, scoped builds
  Grok 4.3 · ★ · Grok's best option, though it trails the leaders on big builds
  4.20 Multi-Agent · Skip · More agents don't close the coding gap

Full web build · HTML / CSS / JS / layout
Best overall: Claude Opus 4.8 · Layout, CSS, and accessibility handled in one pass
Claude (Anthropic)
  Haiku 4.5 · Skip · Not reliable for responsive layouts
  Sonnet 4.6 · Fallback · Good for single components
  Opus 4.8 · ★ · The strongest design instincts of any model here
  Fable 5 · Skip · Opus already does this well
ChatGPT (OpenAI)
  Instant · Skip · Too light
  Thinking · ★ · Ties front end and back end together well
  Pro · Skip · Rarely needed for web work
Gemini (Google)
  3.1 Flash-Lite · Skip · Too light
  3.5 Flash · ★ · Fast and surprisingly good at front-end code
  3.1 Pro · Upgrade if · The app is large or visually complex
Copilot (Microsoft)
  Smart · Fallback · Component-level work only
  Think Deeper · ★ · Solid full-stack reasoning
  Deep Research · Skip · Wrong tool
Grok (xAI)
  Grok Build · Fallback · Scoped components only
  Grok 4.3 · ★ · Capable, less polished on CSS details
  4.20 Multi-Agent · Skip · Overkill

Quick social posts · Short copy, captions
Best overall: Gemini 3.1 Flash-Lite · Costs almost nothing; short copy doesn't need more
Claude (Anthropic)
  Haiku 4.5 · ★ · Fast and easy on your quota
  Sonnet 4.6 · Skip · More than you need
  Opus 4.8 · Skip · Wasteful
  Fable 5 · Skip · Never for captions
ChatGPT (OpenAI)
  Instant · ★ · The free default handles short posts fine
  Thinking · Skip · Overkill
  Pro · Skip · Not for captions
Gemini (Google)
  3.1 Flash-Lite · ★ · The cheapest option on this page, and captions don't need more
  3.5 Flash · Skip · Spends more than the job is worth
  3.1 Pro · Skip · Never for captions
Copilot (Microsoft)
  Smart · ★ · Quick drafts in the apps you already use
  Think Deeper · Skip · Overkill for captions
  Deep Research · Skip · Wrong tool
Grok (xAI)
  Grok Build · Skip · It's a coding model
  Grok 4.3 · ★ · Cheap, and it knows what's trending on X right now
  4.20 Multi-Agent · Skip · Never for captions

Long-form blog · 1,000 to 3,000+ words
Best overall: Claude Sonnet 4.6 · Holds your voice across thousands of words
Claude (Anthropic)
  Haiku 4.5 · Skip · Voice drifts over long pieces
  Sonnet 4.6 · ★ · Matches your voice and keeps the thread from intro to close
  Opus 4.8 · Upgrade if · The post needs expert-level synthesis
  Fable 5 · Skip · Save the money for harder work
ChatGPT (OpenAI)
  Instant · Skip · Not reliable past a few hundred words
  Thinking · ★ · GPT-5.5 writes noticeably better than earlier versions
  Pro · Skip · Not worth it for blogging
Gemini (Google)
  3.1 Flash-Lite · Skip · Loses coherence
  3.5 Flash · ★ · Decent drafts that tend to run long; trim after
  3.1 Pro · Upgrade if · You want research woven in as it writes
Copilot (Microsoft)
  Smart · ★ · Fine if you draft in Word anyway
  Think Deeper · Upgrade if · The piece is research-dense
  Deep Research · Skip · Wrong tool
Grok (xAI)
  Grok Build · Skip · It's a coding model
  Grok 4.3 · ★ · Solid writing, and current on live topics
  4.20 Multi-Agent · Skip · Overkill for blogging

Article research · Source synthesis, fact-finding
Best overall: Gemini 3.5 Flash · Google Search built in, at low cost
Claude (Anthropic)
  Haiku 4.5 · Skip · Too shallow for source work
  Sonnet 4.6 · ★ · Good with web search on, and honest about what it can't verify
  Opus 4.8 · Upgrade if · Accuracy is high-stakes
  Fable 5 · Upgrade if · It's a long, multi-source dig worth the premium
ChatGPT (OpenAI)
  Instant · Skip · Too light for research
  Thinking · ★ · Deep Research mode does the legwork for you
  Pro · Upgrade if · The research feeds a big decision
Gemini (Google)
  3.1 Flash-Lite · Skip · Not enough depth
  3.5 Flash · ★ · Searches Google as it works and costs very little
  3.1 Pro · Upgrade if · You need academic-grade cross-referencing
Copilot (Microsoft)
  Smart · Fallback · Quick lookups only
  Think Deeper · Skip · Deep Research is the better Copilot tool here
  Deep Research · ★ · A dedicated research agent; strongest when your sources live in M365
Grok (xAI)
  Grok Build · Skip · Wrong tool
  Grok 4.3 · ★ · Live X and web data; good for fast-moving stories
  4.20 Multi-Agent · Upgrade if · It's a large multi-source job

Prompt engineering · Build prompts for AI tools
Best overall: Claude Opus 4.8 · Reasons about model behavior better than anything else
Claude (Anthropic)
  Haiku 4.5 · Skip · Lacks the meta-reasoning
  Sonnet 4.6 · Fallback · Fine for simpler prompts
  Opus 4.8 · ★ · Catches the failure modes you'd otherwise hit in production
  Fable 5 · Upgrade if · The prompt drives an autonomous agent
ChatGPT (OpenAI)
  Instant · Skip · Not suited
  Thinking · ★ · Precise instruction building
  Pro · Upgrade if · The prompt runs inside agentic workflows
Gemini (Google)
  3.1 Flash-Lite · Skip · Wrong tier
  3.5 Flash · Fallback · Adequate for basic prompts
  3.1 Pro · ★ · Works through prompt logic step by step
Copilot (Microsoft)
  Smart · Fallback · Basic prompt drafting
  Think Deeper · ★ · Breaks down prompt structure and failure paths
  Deep Research · Skip · Wrong tool
Grok (xAI)
  Grok Build · Skip · Not suited
  Grok 4.3 · ★ · Decent, less sharp on edge cases
  4.20 Multi-Agent · Fallback · Several agents on one prompt, rarely worth it

Email drafting · Client, internal, outreach
Best overall: Claude Haiku 4.5 · Clean, professional, and cheap
Claude (Anthropic)
  Haiku 4.5 · ★ · Clean professional drafts at the lowest Claude price
  Sonnet 4.6 · Upgrade if · The email is tone-sensitive or high-stakes
  Opus 4.8 · Skip · No reason to
  Fable 5 · Skip · Definitely not
ChatGPT (OpenAI)
  Instant · ★ · Reliable tone control on the free default
  Thinking · Upgrade if · It's a tricky negotiation
  Pro · Skip · Never needed
Gemini (Google)
  3.1 Flash-Lite · ★ · Cheap and fine for routine mail
  3.5 Flash · Upgrade if · You're drafting in bulk with nuance
  3.1 Pro · Skip · Never for email
Copilot (Microsoft)
  Smart · ★ · Drafts right inside Outlook
  Think Deeper · Upgrade if · The email needs context from your files and meetings
  Deep Research · Skip · Wrong tool
Grok (xAI)
  Grok Build · Skip · It's a coding model
  Grok 4.3 · ★ · Capable, nothing special for email
  4.20 Multi-Agent · Skip · Never for email

Data analysis · CSV, metrics, pattern-finding
Best overall: Gemini 3.1 Pro · Reads huge datasets and runs its own analysis code
Claude (Anthropic)
  Haiku 4.5 · Skip · Not enough for real inference
  Sonnet 4.6 · Fallback · Surface-level patterns only
  Opus 4.8 · ★ · Strong on messy, ambiguous data
  Fable 5 · Upgrade if · You want it to build and check a whole analysis on its own
ChatGPT (OpenAI)
  Instant · Skip · Not suited
  Thinking · ★ · Handles big CSVs and multi-step interpretation
  Pro · Upgrade if · The analysis is genuinely hard
Gemini (Google)
  3.1 Flash-Lite · Skip · Not suited
  3.5 Flash · Fallback · Quick structured summaries
  3.1 Pro · ★ · Top reasoning scores, a huge context window, and it charts the results
Copilot (Microsoft)
  Smart · Fallback · OK when the data lives in Excel
  Think Deeper · ★ · Excel agent mode with Python, and it can run Claude under the hood
  Deep Research · Skip · Wrong tool
Grok (xAI)
  Grok Build · Skip · Not suited
  Grok 4.3 · Fallback · Capable on clean data
  4.20 Multi-Agent · ★ · Several agents on one dataset, but rivals reason better

Document summary · PDFs, contracts, reports
Best overall: Gemini 3.5 Flash · Reads up to a million tokens and summarizes fast
Claude (Anthropic)
  Haiku 4.5 · Skip · Misses nuance in dense documents
  Sonnet 4.6 · ★ · Reliable extraction with a 1M-token window
  Opus 4.8 · Upgrade if · It's a legal document needing judgment calls
  Fable 5 · Skip · Summaries don't need a flagship
ChatGPT (OpenAI)
  Instant · Skip · Not suited
  Thinking · ★ · Good at pulling what matters from long files
  Pro · Upgrade if · Legal interpretation required
Gemini (Google)
  3.1 Flash-Lite · Fallback · Simple documents, nearly free
  3.5 Flash · ★ · A million tokens of context at a mid-tier price
  3.1 Pro · Upgrade if · You're synthesizing several documents at once
Copilot (Microsoft)
  Smart · ★ · Summarizes your own Word and SharePoint files where they live
  Think Deeper · Upgrade if · The summary needs interpretation, not just extraction
  Deep Research · Skip · Wrong tool for single documents
Grok (xAI)
  Grok Build · Skip · Wrong tool
  Grok 4.3 · ★ · 1M-token window, handles long reports
  4.20 Multi-Agent · Upgrade if · Huge document sets; it reads 2M tokens

Brainstorming · Ideas, concepts, angles
Best overall: ChatGPT Instant · Fast, varied, and free
Claude (Anthropic)
  Haiku 4.5 · Fallback · Ideas start repeating sooner
  Sonnet 4.6 · ★ · More variety, fewer clichés
  Opus 4.8 · Skip · Depth isn't what brainstorming needs
  Fable 5 · Skip · Definitely not
ChatGPT (OpenAI)
  Instant · ★ · Rapid-fire ideas on the free default
  Thinking · Upgrade if · You want strategy baked into the ideas
  Pro · Skip · Overthinks it
Gemini (Google)
  3.1 Flash-Lite · ★ · Quick concept lists at almost no cost
  3.5 Flash · Upgrade if · The brainstorm needs research behind it
  3.1 Pro · Skip · Overkill
Copilot (Microsoft)
  Smart · ★ · Handy when ideas should draw on your work files
  Think Deeper · Skip · Overthinks it
  Deep Research · Skip · Wrong tool
Grok (xAI)
  Grok Build · Skip · It's a coding model
  Grok 4.3 · ★ · Live X data helps with trend-driven ideas
  4.20 Multi-Agent · Skip · Never for ideation

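SEO and metadata · Title tags, meta, alt text
Best overall: Gemini 3.1 Flash-Lite · The cheapest way to do metadata at scale
Claude (Anthropic)
  Haiku 4.5 · ★ · Good with character limits
  Sonnet 4.6 · Skip · Unnecessary
  Opus 4.8 · Skip · Never for metadata
  Fable 5 · Skip · Never for metadata
ChatGPT (OpenAI)
  Instant · ★ · Handles bulk metadata fine
  Thinking · Skip · Not needed
  Pro · Skip · Wasteful
Gemini (Google)
  3.1 Flash-Lite · ★ · Costs the least and follows structure rules reliably
  3.5 Flash · Skip · Overkill
  3.1 Pro · Skip · Never for metadata
Copilot (Microsoft)
  Smart · ★ · Fast and fine for drafts
  Think Deeper · Skip · Not needed
  Deep Research · Skip · Wrong tool
Grok (xAI)
  Grok Build · Fallback · Cheap, and structured output suits it
  Grok 4.3 · ★ · Low cost, handles structured output well
  4.20 Multi-Agent · Skip · Never for metadata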

Strategic planning · Business decisions, proposals
Best overall: Claude Opus 4.8 · The best read on risks, tradeoffs, and edge cases
Claude (Anthropic)
  Haiku 4.5 · Skip · Can't weigh competing variables
  Sonnet 4.6 · Fallback · Lower-stakes planning
  Opus 4.8 · ★ · Surfaces the risks and edge cases others gloss over
  Fable 5 · Upgrade if · The decision is big enough to justify the premium
ChatGPT (OpenAI)
  Instant · Skip · Not suited
  Thinking · ★ · Strong tradeoff analysis
  Pro · Upgrade if · It's the hardest kind of strategy problem
Gemini (Google)
  3.1 Flash-Lite · Skip · Not suited
  3.5 Flash · Fallback · Tactical planning only
  3.1 Pro · ★ · Structured decision analysis with plenty of context room
Copilot (Microsoft)
  Smart · Fallback · When the plan draws on your org's data
  Think Deeper · ★ · Deep reasoning with your company context loaded
  Deep Research · Upgrade if · The plan needs market research first
Grok (xAI)
  Grok Build · Skip · Not suited
  Grok 4.3 · Fallback · Capable, less thorough on edge cases
  4.20 Multi-Agent · ★ · Several agents argue it out, but the leaders reason better solo

Automation workflows · APIs, tools, logic chains
Best overall: Claude Opus 4.8 · Runs long multi-step jobs without losing the plot
Claude (Anthropic)
  Haiku 4.5 · Skip · Not for conditional logic at scale
  Sonnet 4.6 · Fallback · Simple automations
  Opus 4.8 · ★ · The agentic workhorse; built for API chains and tool use
  Fable 5 · Upgrade if · Agents run for hours or days unattended
ChatGPT (OpenAI)
  Instant · Skip · Not suited
  Thinking · ★ · Conditional flows, API chains, error handling
  Pro · Upgrade if · Failure is expensive
Gemini (Google)
  3.1 Flash-Lite · Skip · Not suited
  3.5 Flash · ★ · Quick and agentic, with code execution built in
  3.1 Pro · Upgrade if · The logic is deeply nested
Copilot (Microsoft)
  Smart · Fallback · Power Automate basics
  Think Deeper · ★ · Best for M365 agents; the new Cowork agent runs multi-step jobs
  Deep Research · Skip · Wrong tool
Grok (xAI)
  Grok Build · Fallback · Good for API glue code
  Grok 4.3 · Fallback · Capable, less proven for agents
  4.20 Multi-Agent · ★ · Actually built as a team of agents

Claude: Haiku 4.5 $1/$5 · Sonnet 4.6 $3/$15 · Opus 4.8 $5/$25 · Fable 5 $10/$50 per M tokens in/out. Fable 5 launched June 9, 2026 and sits above Opus for the hardest reasoning and longest agent runs. Claude.ai Pro is $20/mo with 5-hour rolling sessions.

ChatGPT: Free tier runs GPT-5.5 Instant. Plans: Go $8 · Plus $20 · Pro $100 or $200/mo. The model picker was redesigned in June 2026; Thinking and Pro tiers map to GPT-5.5 Thinking and GPT-5.5 Pro. API: GPT-5.5 $5/$30 per M.

Gemini: 3.1 Flash-Lite $0.25/$1.50 · 3.5 Flash $1.50/$9 · 3.1 Pro $2/$12 per M (Pro rates double past 200k context). All three read 1M tokens. Gemini 3.5 Pro (2M context, Deep Think) is expected late June 2026. Plans: AI Plus $7.99 · Pro $19.99 · Ultra from $99.99/mo.

Copilot: Free tier with usage caps. M365 Premium $19.99/mo. Copilot Business $18/user/mo promo ($21 list from Jul 2026). M365 Copilot $30/user/mo. Smart = GPT-5.5 auto-routing; Think Deeper = GPT-5.5 Thinking. M365 Copilot can also run Claude Opus 4.8.

Grok: Build $1/$2 · Grok 4.3 $1.25/$2.50 · 4.20 Multi-Agent $1.25/$2.50 per M. SuperGrok $30/mo; SuperGrok Heavy $300/mo. Grok 4 and 4.1 Fast were retired May 15, 2026; Grok 5 is still in training.

★ = brand pick  |  Fallback = acceptable if pick unavailable  |  White border = overall best for that task
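
To make the per-million-token API prices above easier to compare for a specific job, here is a minimal Python sketch of the arithmetic. The rates are copied from the notes above; the function names (`estimate_cost`, `rank_by_cost`), the brand prefixes on the model labels, and the example token counts are illustrative assumptions, not part of any vendor's API. The sketch does not model subscription plans or the note that Gemini 3.1 Pro rates double past 200k context.

```python
# USD per million tokens as (input, output), copied from the pricing notes above.
# Labels are hypothetical dictionary keys, not official API model IDs.
PRICES = {
    "Claude Haiku 4.5": (1.00, 5.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Opus 4.8": (5.00, 25.00),
    "Claude Fable 5": (10.00, 50.00),
    "GPT-5.5": (5.00, 30.00),
    "Gemini 3.1 Flash-Lite": (0.25, 1.50),
    "Gemini 3.5 Flash": (1.50, 9.00),
    "Gemini 3.1 Pro": (2.00, 12.00),  # base rate only; >200k doubling not modeled
    "Grok Build": (1.00, 2.00),
    "Grok 4.3": (1.25, 2.50),
    "Grok 4.20 Multi-Agent": (1.25, 2.50),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated API cost in USD for one job on the given model."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


def rank_by_cost(input_tokens: int, output_tokens: int) -> list[str]:
    """Return model labels sorted from cheapest to most expensive for this job."""
    return sorted(
        PRICES, key=lambda m: estimate_cost(m, input_tokens, output_tokens)
    )


if __name__ == "__main__":
    # Example job: 100k tokens in, 20k tokens out (arbitrary illustrative sizes).
    for model in rank_by_cost(100_000, 20_000):
        print(f"{model:<24} ${estimate_cost(model, 100_000, 20_000):.2f}")
```

For a job of that size, Sonnet 4.6 works out to 100,000 × $3 / 1M + 20,000 × $15 / 1M = $0.60. Because output tokens cost several times more than input tokens on every listed model, the ranking can shift for output-heavy work.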