Complete Pricing Index for AI Agents

AI Agent Pricing Guide
AI Agent Pricing Guide

Choosing the right AI agent platform isn’t just about model quality it’s about understanding the real cost of running it at scale. This complete pricing index breaks down up-to-date rates for OpenAI, Anthropic, Amazon Bedrock AgentCore, and open-source stacks, including total cost of ownership (TCO) and ROI scenarios. Whether you’re deploying a high-reasoning GPT-5 agent or optimizing with a lean open-source model, this guide gives you the clarity you need to budget smart and scale with confidence.

What drives agent cost

  1. Model tokens (input and output)
  2. Orchestration and tools (web search, code interpreter, browser automation)
  3. Memory and retrieval (storage and calls)
  4. Runtime compute for agent frameworks (serverless CPU and RAM billing)
  5. Observability and other add-ons

OpenAI pricing snapshot

Core model token prices and built-in tool fees that actually move the needle.

GPT-5 family

Model Input per 1M tokens Output per 1M tokens Notes
GPT-5 $1.25 $10.00 Cached input $0.125 per 1M
GPT-5 mini $0.25 $2.00 Cached input $0.025 per 1M
GPT-5 nano $0.05 $0.40 Cached input $0.005 per 1M

Built-in tools and extras

Tool or feature Price Notes
Web Search tool $10.00 per 1K calls Search content tokens billed at model rate
File Search tool $2.50 per 1K calls Storage $0.10 per GB per day (first GB free)
Code Interpreter $0.03 per run Sandboxed execution

Anthropic pricing snapshot

Latest Claude models with standard token billing. No separate fee for web search or computer use all charged via tokens.

Model Input per 1M tokens Output per 1M tokens Notes
Claude Opus 4.1 $15.00 $75.00 Frontier tier
Claude Opus 4 $15.00 $75.00
Claude Sonnet 4 $3.00 $15.00 Best price-performance
Claude Sonnet 3.7 $3.00 $15.00 Extended thinking

Tooling notes

  • Web search billed via token usage
  • Computer use billed via token usage
  • Sonnet 4 supports 1M context (beta)

Amazon Bedrock AgentCore pricing

New AWS agent runtime and tool suite. Prices are modular and consumption-based.

AgentCore runtime and tools

Component Unit price What it means
Runtime CPU $0.0895/vCPU-hour Active processing seconds only
Runtime Memory $0.00945/GB-hour 128MB minimum billing
Browser tool CPU $0.0895/vCPU-hour Headless browsing
Browser tool Memory $0.00945/GB-hour
Code Interpreter CPU $0.0895/vCPU-hour JS, TS, Python
Code Interpreter Memory $0.00945/GB-hour
Gateway InvokeTool $0.005/1K calls MCP tool invocations
Gateway Search API $0.025/1K calls Tool discovery
Tool indexing $0.02/100 tools/mo Searchable catalogs
Identity $0.010/1K requests Free via Runtime/Gateway
Memory short-term $0.25/1K events Session context
Memory long-term store $0.75/1K/mo Built-in strategy
Memory retrieval $0.50/1K retrievals

Related Bedrock services

  • Prompt Optimization: $0.030 per 1K tokens
  • Intelligent Prompt Routing: $1.00 per 1K requests

Note: Bedrock LLM inference rates vary by model and region.


Open-source stack pricing

If you deploy agents on open-weight models, cost = inference provider + orchestration layer.

Fireworks AI serverless

Model class Price per 1M tokens combined
>16B params (Llama 70B) $0.90
Meta Llama 3.1 405B $3.00
Split pricing models (Llama 4) $0.22 input / $0.88 output

Groq

Model Input per 1M Output per 1M
Llama 3.1 8B Instant 128K $0.05 $0.08

AI agent platforms pricing comparison

Platform Model example Token rates Extra fees
OpenAI GPT-5 $1.25 in / $10 out Web Search, File Search, Code Interpreter
Anthropic Claude Sonnet 4 $3 in / $15 out No extra per-call fees
Amazon Bedrock AgentCore runtime CPU $0.0895, RAM $0.00945 Gateway, Memory, plus model cost
Open-source (Fireworks) Llama 70B $0.90 combined Optional caching
Open-source (Groq) Llama 3.1 8B $0.05 in / $0.08 out None

TCO calculator example

Agent: 50,000 sessions/month, 3K input + 1K output tokens per session, 10% web search, 5% code interpreter use.

OpenAI GPT-5

  • Tokens: $687.50
  • Web search: $50.00
  • Code interpreter: $75.00
  • Total: $812.50

Anthropic Sonnet 4

  • Tokens: $1,200.00
  • Total: $1,200.00

Amazon Bedrock AgentCore overhead

  • Runtime: $14.40
  • Gateway: $1.15
  • Memory: $21.25
  • Overhead total: $36.80 (plus model cost)

Open-source model + AgentCore

  • Groq Llama 8B: Tokens $11.50 → ~$48.30 with overhead
  • Fireworks Llama 70B: Tokens $180.00 → ~$216.80 with overhead

ROI case

Example: SaaS inbound support, 3,000 tickets/month, 7 min each, $25/hr loaded rate, target 60% deflection.

  • Current human cost: $8,750/month
  • AI agent cost: $50–$1,200/month depending on stack
  • Breakeven deflection with GPT-5 cost: ~9.3%

Guidance by scenario

  • Heavy tool use → Bedrock AgentCore often more predictable than per-call fees
  • Long context → Sonnet 4 with 1M context reduces RAG calls
  • Enterprise controls → AgentCore’s Identity, Memory, Gateway add policy & audit
  • SaaS assistant vs bespoke → Compare Amazon Q licensing ($20/user/mo) vs custom


Bottom line

If your goal is lowest TCO with acceptable quality, an open-source 8B–70B model with Bedrock AgentCore is extremely cost-efficient.
If you need frontier reasoning with convenience, GPT-5 offers great price-to-capability ratios.
Claude Sonnet 4 sits in the middle with premium long-context and no extra per-call fees, simplifying spend models.