AI Model Context Windows: 2026 Comparison

Context window size—the total number of tokens a model can process in a single request—has become a key differentiator among frontier AI models. This comparison provides an objective overview of token limits across major providers as of February 5, 2026.

Current Context Windows by Model

Model                          | Provider        | Context Window (Input) | Max Output Tokens
Gemini 3 Pro                   | Google DeepMind | 2,000,000              | 8,192
Grok 4.1 Fast                  | xAI             | 2,000,000              | ~16,000
Gemini 2.5 Flash               | Google DeepMind | 1,048,576              | 65,535
GPT-4.1 (API)                  | OpenAI          | 1,000,000              | ~32,000
Claude Sonnet 4.5 (Enterprise) | Anthropic       | 500,000                | 64,000
GPT-5                          | OpenAI          | 400,000                | 128,000
Grok 4                         | xAI             | 256,000                | ~16,000
Claude Opus 4.6                | Anthropic       | 200,000 (1M beta)      | 128,000
Claude Sonnet 4.5 (Standard)   | Anthropic       | 200,000                | 64,000
GPT-4o                         | OpenAI          | 128,000                | 4,096
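Limits like these are easiest to work with programmatically. The sketch below encodes the table's input windows in a lookup dict and pre-checks whether a prompt will fit; the model identifiers and the ~4-characters-per-token heuristic are illustrative assumptions, not official values — real token counts require each provider's own tokenizer, and advertised limits change frequently.

```python
# Illustrative input limits taken from the table above. Model names
# and exact numbers are assumptions; verify against each provider's
# current documentation before relying on them.
CONTEXT_LIMITS = {
    "gemini-3-pro": 2_000_000,
    "grok-4.1-fast": 2_000_000,
    "gemini-2.5-flash": 1_048_576,
    "gpt-4.1": 1_000_000,
    "gpt-5": 400_000,
    "grok-4": 256_000,
    "claude-opus-4.6": 200_000,
    "claude-sonnet-4.5": 200_000,
    "gpt-4o": 128_000,
}


def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token for English text.
    A real pipeline would use the provider's tokenizer instead."""
    return max(1, len(text) // 4)


def fits_context(model: str, text: str, reserved_output: int = 0) -> bool:
    """Check whether `text`, plus room reserved for the reply, fits
    within the model's advertised input window."""
    limit = CONTEXT_LIMITS.get(model)
    if limit is None:
        raise KeyError(f"unknown model: {model}")
    return estimate_tokens(text) + reserved_output <= limit
```

A pre-check like this is cheaper than letting the API reject an oversized request, though the heuristic undercounts for code and non-English text.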

Key Observations

Context Window Distribution

Current models fall into several distinct tiers based on input capacity:

  • 2M+ tokens: Gemini 3 Pro, Grok 4.1 Fast
  • 1M tokens: Gemini 2.5 Flash, GPT-4.1 (API), Claude Opus 4.6 (beta)
  • 200k-500k tokens: Claude Opus 4.6 (standard), Claude Sonnet 4.5 (Standard and Enterprise), GPT-5
  • 128k-256k tokens: Grok 4, GPT-4o

Output Token Limits

While input capacity has grown significantly, output limits remain more constrained. Claude Opus 4.6 and GPT-5 both offer the highest output capacity at 128,000 tokens. Claude Sonnet 4.5 supports up to 64,000 output tokens, and Gemini 2.5 Flash is comparable at 65,535. Most other models cap output between 4,096 and ~32,000 tokens.
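Because output caps vary so widely, a common defensive pattern is to clamp the requested completion length before sending a call. A minimal sketch, assuming the table's advertised caps (the actual request parameter name also varies by provider: `max_tokens`, `max_output_tokens`, `max_completion_tokens`, etc.):

```python
# Advertised max-output limits from the table above (assumed values;
# check provider documentation for current caps).
MAX_OUTPUT = {
    "gpt-5": 128_000,
    "claude-opus-4.6": 128_000,
    "gemini-2.5-flash": 65_535,
    "claude-sonnet-4.5": 64_000,
    "gpt-4o": 4_096,
}


def clamp_output_request(model: str, requested: int) -> int:
    """Clamp a requested completion length to the model's cap so the
    API does not reject the call for exceeding its output limit.
    Unknown models fall back to a conservative 4,096."""
    return min(requested, MAX_OUTPUT.get(model, 4_096))
```

For example, asking GPT-4o for 10,000 output tokens would be clamped to its 4,096 cap, while the same request to GPT-5 would pass through unchanged.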

Access Tiers

Several providers offer different context windows based on subscription level:

  • OpenAI: Free tier offers 8,192 tokens; Plus/Team gets 32,000; Enterprise/API reaches 1,000,000
  • Anthropic: Standard plans provide 200,000 tokens; Enterprise increases to 500,000; 1M token context available in beta for high-tier organizations
  • Google: Free tier around 32,000 tokens; Gemini Advanced unlocks full capacity

Use Case Considerations

Token limits directly impact practical applications:

  • Document analysis: Models with 1M+ tokens can process entire books or large codebases in a single request
  • Conversational AI: Models with 128k-200k tokens handle extended multi-turn conversations effectively
  • Content generation: Higher output limits (Claude Opus 4.6, GPT-5, Gemini 2.5 Flash) enable generation of longer-form content without pagination
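When a document exceeds even the largest window, the usual workaround is chunking. The helper below is a deliberately naive sketch using the same chars-per-token heuristic as above; a production pipeline would split on the provider's tokenizer and on semantic boundaries (paragraphs, sections) rather than fixed character offsets.

```python
def chunk_for_window(text: str, token_budget: int,
                     chars_per_token: int = 4) -> list[str]:
    """Split a long document into pieces that each fit within a token
    budget, using a rough chars-per-token heuristic. Fixed-offset
    splitting is illustrative only; real chunkers respect paragraph
    and sentence boundaries."""
    max_chars = token_budget * chars_per_token
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
```

Each chunk can then be processed independently (or with overlapping windows) and the results merged, trading the convenience of a single 1M-token request for compatibility with smaller-window models.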

Note: Token counts are approximate and represent maximum advertised limits. Actual usable context may vary based on API tier, model version, and specific implementation.