Complete Pricing Index for AI Agents

Choosing the right AI agent platform isn’t just about model quality; it’s about understanding the real cost of running it at scale. This complete pricing index breaks down up-to-date rates for OpenAI, Anthropic, Amazon Bedrock AgentCore, and open-source stacks, including total cost of ownership (TCO) and ROI scenarios. Whether you’re deploying a high-reasoning GPT-5 agent or optimizing with a lean open-source model, this guide gives you the clarity you need to budget smart and scale with confidence.
What drives agent cost
- Model tokens (input and output)
- Orchestration and tools (web search, code interpreter, browser automation)
- Memory and retrieval (storage and calls)
- Runtime compute for agent frameworks (serverless CPU and RAM billing)
- Observability and other add-ons
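The drivers above combine into a simple monthly cost model. A minimal sketch (the function name, parameters, and example rates are illustrative, not any vendor's API):

```python
# Sketch of a monthly agent cost model built from the drivers above.
# All rates and usage figures are placeholders; substitute your own.

def monthly_agent_cost(
    sessions: int,
    input_tokens: int,           # per session
    output_tokens: int,          # per session
    input_rate: float,           # $ per 1M input tokens
    output_rate: float,          # $ per 1M output tokens
    tool_fees: float = 0.0,      # $ for tool calls (search, code, browser)
    memory_fees: float = 0.0,    # $ for memory/retrieval calls and storage
    runtime_fees: float = 0.0,   # $ for serverless CPU/RAM billing
    addon_fees: float = 0.0,     # $ for observability and other add-ons
) -> float:
    token_cost = sessions * (
        input_tokens / 1e6 * input_rate + output_tokens / 1e6 * output_rate
    )
    return token_cost + tool_fees + memory_fees + runtime_fees + addon_fees

# Example: 50,000 sessions at 3K in / 1K out on a $1.25 / $10.00 model
print(round(monthly_agent_cost(50_000, 3_000, 1_000, 1.25, 10.00), 2))  # → 687.5
```

Token spend usually dominates, but the flat fees are what make two platforms with similar token rates diverge at scale.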
OpenAI pricing snapshot
Core model token prices and built-in tool fees that actually move the needle.
GPT-5 family
Model | Input per 1M tokens | Output per 1M tokens | Notes |
---|---|---|---|
GPT-5 | $1.25 | $10.00 | Cached input $0.125 per 1M |
GPT-5 mini | $0.25 | $2.00 | Cached input $0.025 per 1M |
GPT-5 nano | $0.05 | $0.40 | Cached input $0.005 per 1M |
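The cached-input rates above are a 10x discount on repeated prompt prefixes, which matters for agents that resend long system prompts. A quick sketch of the effect (rates from the table; the 80% cache-hit ratio is an assumed figure for illustration):

```python
# Effect of prompt caching on GPT-5 input cost (rates from the table above).
# The 80% cache-hit ratio below is an assumption, not a measured number.

INPUT_RATE = 1.25          # $ per 1M fresh input tokens
CACHED_RATE = 0.125        # $ per 1M cached input tokens

def input_cost(total_tokens: int, cache_hit_ratio: float) -> float:
    cached = total_tokens * cache_hit_ratio
    fresh = total_tokens - cached
    return fresh / 1e6 * INPUT_RATE + cached / 1e6 * CACHED_RATE

# 150M input tokens per month, without and with caching
print(input_cost(150_000_000, 0.0))   # → 187.5
print(input_cost(150_000_000, 0.8))   # → 52.5 (a 72% saving on input)
```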
Built-in tools and extras
Tool or feature | Price | Notes |
---|---|---|
Web Search tool | $10.00 per 1K calls | Search content tokens billed at model rate |
File Search tool | $2.50 per 1K calls | Storage $0.10 per GB per day (first GB free) |
Code Interpreter | $0.03 per run | Sandboxed execution |
Anthropic pricing snapshot
Latest Claude models with standard token billing. There is no separate fee for web search or computer use; both are charged via tokens.
Model | Input per 1M tokens | Output per 1M tokens | Notes |
---|---|---|---|
Claude Opus 4.1 | $15.00 | $75.00 | Frontier tier |
Claude Opus 4 | $15.00 | $75.00 | |
Claude Sonnet 4 | $3.00 | $15.00 | Best price-performance |
Claude Sonnet 3.7 | $3.00 | $15.00 | Extended thinking |
Tooling notes
- Web search billed via token usage
- Computer use billed via token usage
- Sonnet 4 supports 1M context (beta)
Amazon Bedrock AgentCore pricing
New AWS agent runtime and tool suite. Prices are modular and consumption-based.
AgentCore runtime and tools
Component | Unit price | What it means |
---|---|---|
Runtime CPU | $0.0895/vCPU-hour | Active processing seconds only |
Runtime Memory | $0.00945/GB-hour | 128MB minimum billing |
Browser tool CPU | $0.0895/vCPU-hour | Headless browsing |
Browser tool Memory | $0.00945/GB-hour | |
Code Interpreter CPU | $0.0895/vCPU-hour | JS, TS, Python |
Code Interpreter Memory | $0.00945/GB-hour | |
Gateway InvokeTool | $0.005/1K calls | MCP tool invocations |
Gateway Search API | $0.025/1K calls | Tool discovery |
Tool indexing | $0.02/100 tools/mo | Searchable catalogs |
Identity | $0.010/1K requests | Free when invoked via Runtime/Gateway |
Memory short-term | $0.25/1K events | Session context |
Memory long-term store | $0.75/1K records/mo | Built-in strategy |
Memory retrieval | $0.50/1K retrievals | |
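Because runtime is billed on active processing seconds only, per-session cost depends on how long the agent actually computes. A hedged sketch using the table's CPU and memory rates (the 10-second, 1 vCPU, 2 GB session profile is an assumption for illustration):

```python
# AgentCore runtime billing sketch: active vCPU-hours plus GB-hours of RAM.
# Rates come from the table above; the session profile is assumed.

CPU_RATE = 0.0895    # $ per vCPU-hour
MEM_RATE = 0.00945   # $ per GB-hour

def session_runtime_cost(active_seconds: float, vcpus: float, memory_gb: float) -> float:
    hours = active_seconds / 3600
    return vcpus * hours * CPU_RATE + memory_gb * hours * MEM_RATE

# Assumed profile: 10s of active processing, 1 vCPU, 2 GB RAM
per_session = session_runtime_cost(active_seconds=10, vcpus=1, memory_gb=2)
print(f"${per_session:.6f} per session, ${per_session * 50_000:.2f} for 50K sessions")
```

Even at 50K sessions, runtime overhead lands in the tens of dollars; the model tokens, not the runtime, drive the bill.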
Related Bedrock services
- Prompt Optimization: $0.030 per 1K tokens
- Intelligent Prompt Routing: $1.00 per 1K requests
Note: Bedrock LLM inference rates vary by model and region.
Open-source stack pricing
If you deploy agents on open-weight models, cost = inference provider + orchestration layer.
Fireworks AI serverless
Model class | Price per 1M tokens combined |
---|---|
>16B params (Llama 70B) | $0.90 |
Meta Llama 3.1 405B | $3.00 |
Split pricing models (Llama 4) | $0.22 input / $0.88 output |
Groq
Model | Input per 1M | Output per 1M |
---|---|---|
Llama 3.1 8B Instant 128K | $0.05 | $0.08 |
AI agent platforms pricing comparison
Platform | Model example | Token rates | Extra fees |
---|---|---|---|
OpenAI | GPT-5 | $1.25 in / $10 out | Web Search, File Search, Code Interpreter |
Anthropic | Claude Sonnet 4 | $3 in / $15 out | No extra per-call fees |
Amazon Bedrock | AgentCore runtime | CPU $0.0895, RAM $0.00945 | Gateway, Memory, plus model cost |
Open-source (Fireworks) | Llama 70B | $0.90 combined | Optional caching |
Open-source (Groq) | Llama 3.1 8B | $0.05 in / $0.08 out | None |
TCO calculator example
Agent: 50,000 sessions/month, 3K input + 1K output tokens per session, 10% web search, 5% code interpreter use.
OpenAI GPT-5
- Tokens: $687.50
- Web search: $50.00
- Code interpreter: $75.00
- Total: $812.50
Anthropic Sonnet 4
- Tokens: $1,200.00
- Total: $1,200.00
Amazon Bedrock AgentCore overhead
- Runtime: $14.40
- Gateway: $1.15
- Memory: $21.25
- Overhead total: $36.80 (plus model cost)
Open-source model + AgentCore
- Groq Llama 8B: Tokens $11.50 → ~$48.30 with overhead
- Fireworks Llama 70B: Tokens $180.00 → ~$216.80 with overhead
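The GPT-5 figures above follow directly from the stated workload and published rates. A sketch that reproduces them:

```python
# Reproduces the GPT-5 TCO figures above from the stated workload:
# 50,000 sessions/month, 3K input + 1K output tokens each,
# 10% of sessions call web search, 5% use the code interpreter.

SESSIONS = 50_000
IN_TOK, OUT_TOK = 3_000, 1_000

# GPT-5 rates and tool fees from the OpenAI tables
tokens = SESSIONS * (IN_TOK / 1e6 * 1.25 + OUT_TOK / 1e6 * 10.00)
web_search = SESSIONS * 0.10 / 1_000 * 10.00     # $10.00 per 1K calls
code_interp = SESSIONS * 0.05 * 0.03             # $0.03 per run

print(f"Tokens:           ${tokens:,.2f}")       # $687.50
print(f"Web search:       ${web_search:,.2f}")   # $50.00
print(f"Code interpreter: ${code_interp:,.2f}")  # $75.00
print(f"Total:            ${tokens + web_search + code_interp:,.2f}")  # $812.50
```

Swapping in the Anthropic or open-source token rates from the tables above gives the other totals the same way.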
ROI case
Example: SaaS inbound support, 3,000 tickets/month, 7 min each, $25/hr loaded rate, target 60% deflection.
- Current human cost: $8,750/month
- AI agent cost: $50–$1,200/month depending on stack
- Breakeven deflection with GPT-5 cost: ~9.3%
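The breakeven figure falls out of the same arithmetic. A sketch of the ROI case, using the GPT-5 TCO total from the example above:

```python
# ROI sketch for the support-deflection case above.
# 3,000 tickets/month, 7 minutes each, $25/hr loaded rate.

TICKETS, MINUTES, RATE = 3_000, 7, 25.0

human_cost = TICKETS * MINUTES / 60 * RATE   # $8,750/month
agent_cost = 812.50                          # GPT-5 TCO from the example above

breakeven_deflection = agent_cost / human_cost
print(f"Human cost: ${human_cost:,.2f}/month")
print(f"Breakeven:  {breakeven_deflection:.1%} deflection")  # ≈ 9.3%

# Net monthly savings at the 60% deflection target
savings = 0.60 * human_cost - agent_cost
print(f"Net savings at 60%: ${savings:,.2f}/month")
```

At the 60% target, even the most expensive stack in this index pays for itself several times over each month.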
Guidance by scenario
- Heavy tool use → Bedrock AgentCore often more predictable than per-call fees
- Long context → Sonnet 4 with 1M context reduces RAG calls
- Enterprise controls → AgentCore’s Identity, Memory, Gateway add policy & audit
- SaaS assistant vs bespoke → Compare Amazon Q licensing ($20/user/mo) vs custom
Bottom line
If your goal is lowest TCO with acceptable quality, an open-source 8B–70B model with Bedrock AgentCore is extremely cost-efficient.
If you need frontier reasoning with convenience, GPT-5 offers a strong price-to-capability ratio.
Claude Sonnet 4 sits in the middle with premium long-context and no extra per-call fees, simplifying spend models.