Which memory model is most cost-efficient in AgentCore (ephemeral, persistent, or shared)?

Ephemeral memory is cheapest but requires concise state design. Persistent per-user memory drives recurring read/write and storage costs and should be reserved for high-value journeys. Shared/team memory is a good middle ground if governed (TTL, size caps, delta updates).

How do VPC endpoints and AZ placement impact my AgentCore bill?

PrivateLink endpoints are billed per hour per AZ. Placing endpoints, vector DB, and storage in the same AZ reduces cross-AZ traffic. Start with a single-AZ footprint and scale out only when concurrency or HA requirements demand it.

How can I limit cross-AZ traffic and surprise egress charges?

Co-locate agents, retrieval, caches, and storage in one AZ, collapse multiple small retrieval calls into one ranked call, and avoid chatty micro-calls across services. Use VPC endpoints for S3/Bedrock to remove public egress.

What logging level is audit-safe without exploding costs?

Sample traces at 5–10% in production, store full payloads only for audit windows or incidents, and mask PII at log-time. Keep short retention on verbose logs and longer retention on policy/consent events.

How do I estimate the incremental cost of persistent memory per user?

Model read/write ops per session, storage growth (GB/month), and periodic embedding refresh. Apply delta-sync for updates and enforce chunk/size caps. Add 10–20% headroom for retries and safety rails.
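As a rough illustration of that estimate, the sketch below adds up reads, writes, storage, and embedding refresh for one user and then applies headroom. Every price parameter is a placeholder to fill in from your own Region's price list, and the function name and argument names are hypothetical, not part of any AWS API.

```python
def persistent_memory_cost_per_user(
    sessions: float,                 # sessions per user per month
    reads_per_session: float,
    writes_per_session: float,
    storage_gb: float,               # memory stored for this user (GB-month)
    refresh_tokens: float,           # tokens re-embedded per month
    price_per_1k_reads: float,       # placeholder unit prices below
    price_per_1k_writes: float,
    price_per_gb_month: float,
    price_per_1k_embed_tokens: float,
    headroom: float = 0.15,          # 10-20% for retries and safety rails
) -> float:
    """Estimated monthly cost of persistent memory for one user."""
    reads = sessions * reads_per_session / 1000 * price_per_1k_reads
    writes = sessions * writes_per_session / 1000 * price_per_1k_writes
    storage = storage_gb * price_per_gb_month
    refresh = refresh_tokens / 1000 * price_per_1k_embed_tokens
    return (reads + writes + storage + refresh) * (1 + headroom)
```

Multiply the result by the number of users who actually get persistent memory to see what reserving it for high-value journeys saves.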

How can Scalevise help with this?

Scalevise helps businesses turn complex digital challenges into scalable solutions. Whether you're facing compliance, automation, integration, or innovation hurdles—our team delivers custom strategies and implementations that work. Contact us at contact@scalevise.com or call +31-6-27178770.

AgentCore Memory & VPC Costs: The Missing Line Items

TL;DR
Most AgentCore bills are driven by memory footprint and VPC design rather than pure model tokens. Make memory explicit (ephemeral vs persistent), keep endpoints and data in the same AZ, sample logs, and cap tool/retrieval steps. If traffic is stable, a hybrid with self-hosting can drop cost per turn.

1) Pricing quick view (what actually shows up on the bill)

Cost category	What it includes	Why it spikes
Agent orchestration	Planning steps, tool invocations, guardrails	Unbounded tool loops, retries, multi-hop tasks
Model inference	Tokens for your chosen model(s)	Long prompts, verbose history, no function calling
Retrieval & embeddings	Vector search, embedding generation, storage	Over-retrieval, indexing everything, no delta sync
Memory & session state	Bedrock session state, vector store, Dynamo/S3	Persistent per-user memory with frequent reads/writes
VPC endpoints	PrivateLink endpoints, endpoint hours per AZ	Endpoints in multiple AZs or Regions
Cross-AZ / egress	Data moving between AZs or out of VPC	Chatty architectures crossing services/AZs
Observability & safety	Logs, traces, content filters	Verbose tracing in hot paths

Key insight: Memory and VPC choices multiply each other. A persistent memory model with multi-AZ endpoints can cost more than the model tokens.

2) Memory models and their cost footprint

a) Ephemeral per run

Memory lives only during a single agent turn or session.
Lowest storage/read cost. Requires good prompt/state design.
Use for: transactional Q&A, deterministic tasks.

b) Persistent per user

Memory follows a user across sessions.
Costs include storage, reads/writes, and embedding refresh.
Use for: account context, multi-step sales cycles, support history.

c) Shared/team memory

Topics, FAQs, product snippets shared across users.
Cheaper than per-user memory but needs governance to avoid bloat.

Cost controls for memory

Summarize conversation history aggressively.
Store facts, not full transcripts.
Batch embeddings nightly; use delta sync (only new/changed).
Cap chunk size and total tokens per document.
Add a TTL on low-value memory to auto-expire.
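Several of these controls (facts rather than transcripts, size caps, TTL) can be combined in one small store. The sketch below is an in-memory stand-in, assuming a hypothetical `CappedMemory` class; a real deployment would back this with whatever store you already use, and the default limits are illustrative only.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Fact:
    text: str
    expires_at: float


class CappedMemory:
    """Keeps short facts with a TTL, a per-fact size cap, and a total count cap."""

    def __init__(self, max_facts: int = 50, max_chars: int = 500,
                 ttl_seconds: float = 30 * 24 * 3600) -> None:
        self.max_facts = max_facts
        self.max_chars = max_chars
        self.ttl_seconds = ttl_seconds
        self._facts: Dict[str, Fact] = {}  # dicts keep insertion order

    def remember(self, key: str, text: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._facts.pop(key, None)  # re-insert so an updated fact counts as newest
        self._facts[key] = Fact(text[: self.max_chars], now + self.ttl_seconds)
        while len(self._facts) > self.max_facts:
            del self._facts[next(iter(self._facts))]  # evict the oldest fact

    def recall(self, now: Optional[float] = None) -> Dict[str, str]:
        now = time.time() if now is None else now
        # Drop expired facts before returning what is left.
        self._facts = {k: f for k, f in self._facts.items() if f.expires_at > now}
        return {k: f.text for k, f in self._facts.items()}
```

The `now` argument exists only to make expiry easy to test; in normal use it can be omitted.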

3) VPC design that avoids surprise charges

Place services together

Keep AgentCore endpoints, vector DB, and storage in the same AZ.
Minimize hops through NAT; prefer VPC endpoints for S3/Bedrock.

Right-size PrivateLink

Each endpoint has hourly cost per AZ.
Start with one AZ, scale out when concurrency demands it.
Avoid “just in case” endpoints in three AZs if your app runs in one.
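Because the hourly charge applies per endpoint per AZ, endpoint hours grow multiplicatively. The sketch below shows the arithmetic, assuming a 720-hour month (the figure used in the worked example) and a placeholder hourly price; any per-GB data processing charges are left out, and the function names are hypothetical.

```python
def endpoint_hours(endpoints: int, azs: int, hours_per_month: float = 720) -> float:
    """Billable endpoint hours: each endpoint is charged once per AZ it lives in."""
    return endpoints * azs * hours_per_month


def endpoint_cost(endpoints: int, azs: int, price_per_hour: float,
                  hours_per_month: float = 720) -> float:
    """Monthly hourly charge; price_per_hour is a placeholder for your Region's rate."""
    return endpoint_hours(endpoints, azs, hours_per_month) * price_per_hour
```

With two endpoints, moving from one AZ to three raises billable hours from 1,440 to 4,320 per month before any traffic flows.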

Limit cross-AZ chat

Co-locate agents, DB, caches, and app workers.
Consolidate retrieval into one call with top-k chunks rather than multiple micro-calls across services.

Observability discipline

Sample traces (for example 5–10%).
Keep full payload logs only for audit windows or incident analysis.
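One way to sample traces is to hash the trace ID and keep a fixed fraction, so every service makes the same keep/drop decision for a given trace. This is a minimal sketch under that assumption; `should_sample` and `payload_to_log` are hypothetical helpers, not part of any tracing library.

```python
import hashlib
from typing import Optional


def should_sample(trace_id: str, rate: float = 0.05) -> bool:
    """Deterministically keep roughly `rate` of all trace IDs."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # uniform in [0, 2**64)
    return bucket < int(rate * 2**64)


def payload_to_log(trace_id: str, payload: str, incident_mode: bool = False,
                   rate: float = 0.05) -> Optional[str]:
    """Return the full payload only for sampled traces or while an incident is open."""
    if incident_mode or should_sample(trace_id, rate):
        return payload
    return None
```

Switching `incident_mode` on gives 100% capture for the duration of an incident without changing the steady-state rate.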

4) Worked example: a transparent monthly bill (plug your own unit prices)

Scenario: 50k sessions/month, each with 2 model calls, about 1.2 retrievals on average, and persistent per-user memory.

Line item	Driver (example)	Volume	Unit price (fill in)	Subtotal
Orchestration steps	5 steps/session	250,000 steps	___ / 1k steps	=
Model inference	2 calls × N tokens	100,000 calls × N tokens	___ / 1k tokens	=
Retrieval searches	1.2 per session	60,000 searches	___ / 1k	=
Embedding generation	Nightly delta sync	3M tokens	___ / 1k tokens	=
Vector storage	1.5M vectors	—	___ / GB-month	=
VPC endpoints	1 endpoint × 720h	—	___ / hour	=
Cross-AZ traffic	8% of traffic	—	___ / GB	=
Logs & traces	10% sampled, gzip	—	___ / GB	=
Estimated monthly total				Σ

How to use this table

Fill unit prices for your Region.
Run three mixes: baseline, peak week, launch day.
Add 10–20% headroom for retries and safety.
Recompute after each optimization to prove savings.
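The steps above can be sketched as a small script: one record per line item, a traffic multiplier for the three mixes, and headroom on top. The `LineItem` class, the multipliers, and every unit price here are illustrative assumptions; only the volumes come from the table.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LineItem:
    name: str
    volume: float       # steps, searches, tokens, hours, or GB per month
    unit_size: float    # volume covered by one unit price (1000 for "/ 1k")
    unit_price: float   # placeholder; fill in for your Region
    scales_with_traffic: bool = True  # endpoint hours, for example, do not

    def subtotal(self, traffic_factor: float = 1.0) -> float:
        volume = self.volume * traffic_factor if self.scales_with_traffic else self.volume
        return volume / self.unit_size * self.unit_price


def monthly_total(items: List[LineItem], traffic_factor: float = 1.0,
                  headroom: float = 0.15) -> float:
    """Sum of all subtotals for one traffic mix, plus 10-20% headroom."""
    return sum(item.subtotal(traffic_factor) for item in items) * (1 + headroom)


if __name__ == "__main__":
    items = [
        LineItem("Orchestration steps", 250_000, 1000, 0.10),
        LineItem("Retrieval searches", 60_000, 1000, 0.20),
        LineItem("VPC endpoints", 720, 1, 0.01, scales_with_traffic=False),
    ]
    # Hypothetical multipliers for the three mixes.
    for mix, factor in [("baseline", 1.0), ("peak week", 1.5), ("launch day", 3.0)]:
        print(f"{mix}: {monthly_total(items, factor):.2f}")
```

Rerunning the same script after each optimization gives a before/after figure for the same traffic mix.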

5) A simple cost model you can copy

Let:

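S = sessions/month
k = model calls per session
r = retrieval calls per session
avg_tokens = average tokens per model call
m_read/m_write = memory read/write ops per session
storage_GB = memory and vector storage (GB-month)
index_tokens_month = tokens embedded per month
E_h = VPC endpoint hours
X_az = GB cross-AZ
log_GB = log volume before sampling (GB/month)
sample_rate = fraction of traces logged

Total cost ≈

Orchestration(S)
+ ModelTokens(S × k × avg_tokens)
+ Retrieval(S × r)
+ Embeddings(index_tokens_month)
+ Memory(S × m_read, S × m_write, storage_GB)
+ VPC(E_h)
+ CrossAZ(X_az)
+ Logs(log_GB × sample_rate)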

Sensitivity

Reduce k and r with function calling and better retrieval.
Cut avg_tokens via prompt/state compression.
Lower E_h and X_az by consolidating in one AZ.
Drop log_GB with sampling and short retention.
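The model and its sensitivity levers can be expressed as one function, so that changing a single input shows its effect on the total. This is a sketch with linear pricing for every term; the `Usage` and `Prices` classes, the `steps_per_session` field, and all price fields are assumptions introduced for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Usage:
    S: float                   # sessions/month
    k: float                   # model calls per session
    r: float                   # retrieval calls per session
    avg_tokens: float          # tokens per model call
    m_read: float              # memory reads per session
    m_write: float             # memory writes per session
    storage_gb: float
    index_tokens_month: float
    E_h: float                 # VPC endpoint hours
    X_az: float                # GB cross-AZ
    log_gb: float
    sample_rate: float
    steps_per_session: float   # assumption: orchestration is billed per step


@dataclass(frozen=True)
class Prices:                  # all placeholders; fill in for your Region
    per_1k_steps: float
    per_1k_tokens: float
    per_1k_searches: float
    per_1k_embed_tokens: float
    per_1k_mem_ops: float
    per_gb_month: float
    per_endpoint_hour: float
    per_gb_cross_az: float
    per_gb_logs: float


def total_cost(u: Usage, p: Prices) -> float:
    orchestration = u.S * u.steps_per_session / 1000 * p.per_1k_steps
    model = u.S * u.k * u.avg_tokens / 1000 * p.per_1k_tokens
    retrieval = u.S * u.r / 1000 * p.per_1k_searches
    embeddings = u.index_tokens_month / 1000 * p.per_1k_embed_tokens
    memory = (u.S * (u.m_read + u.m_write) / 1000 * p.per_1k_mem_ops
              + u.storage_gb * p.per_gb_month)
    vpc = u.E_h * p.per_endpoint_hour
    cross_az = u.X_az * p.per_gb_cross_az
    logs = u.log_gb * u.sample_rate * p.per_gb_logs
    return orchestration + model + retrieval + embeddings + memory + vpc + cross_az + logs
```

To test a lever, compare `total_cost(u, p)` with `total_cost(replace(u, k=1), p)` or any other single-field change.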

6) How to cut the bill without hurting outcomes

Routing tier before the agent: send low-stakes turns to a smaller/cheaper model, reserve the flagship for high-value paths.
Step budgets: cap planning/tools per session and fail fast with a graceful reply.
Prompt/state compression: summarize aggressively; switch to function calling for structured steps.
Retrieval hygiene: pre-rank, top-k, no duplicate chunks; delta sync for updates.
Response caching: cache static answers and policy snippets.
Same-AZ placement: pin endpoints, DB, and app workers to one AZ.
Observability sampling: 5–10% traces in prod, 100% only during incidents.
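A step budget, for instance, can be a thin wrapper around the agent loop. In the sketch below, `next_step` is a hypothetical callable standing in for one planning or tool step: it returns a final answer when the agent is done and `None` otherwise. The fallback wording is illustrative.

```python
from typing import Callable, Optional, Tuple

FALLBACK = "I could not complete this request. Let me hand you over to a colleague."


def run_with_step_budget(
    next_step: Callable[[int], Optional[str]],
    max_steps: int = 5,
    fallback: str = FALLBACK,
) -> Tuple[str, int]:
    """Run up to max_steps steps; return (reply, steps_used)."""
    for step in range(1, max_steps + 1):
        answer = next_step(step)
        if answer is not None:
            return answer, step
    # Budget exhausted: fail fast with a graceful reply instead of looping on.
    return fallback, max_steps
```

Logging `steps_used` per session also shows how often the budget is hit, which helps when tuning `max_steps`.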

7) When self-hosting beats AgentCore

Self-hosting can win when:

Traffic is stable and high.
You need custom policies or models outside Bedrock.
Data residency mandates private cloud/on-prem.
You want aggressive optimizations: quantization, speculative decoding, shared caches.

Break-even intuition
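If C_b = Bedrock cost per 1k turns and C_s = self-hosted cost per 1k turns, then self-hosting is attractive when the monthly saving exceeds the switching cost spread over the payback period you are willing to accept:

(Monthly_turns / 1,000) × (C_b − C_s) > switch_costs / payback_months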

Many teams reach payback in 3–6 months at sustained volumes.
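The break-even check is simple enough to script. Since C_b and C_s are expressed per 1k turns, the sketch below divides monthly turns by 1,000 before multiplying; the function names and the example figures in the usage note are illustrative, not measured values.

```python
import math


def monthly_saving(monthly_turns: float, c_b: float, c_s: float) -> float:
    """Saving per month; c_b and c_s are costs per 1k turns."""
    return monthly_turns / 1000 * (c_b - c_s)


def payback_months(switch_costs: float, monthly_turns: float,
                   c_b: float, c_s: float) -> float:
    """Months until switch_costs are recovered; infinite if there is no saving."""
    saving = monthly_saving(monthly_turns, c_b, c_s)
    return switch_costs / saving if saving > 0 else math.inf


def self_hosting_attractive(switch_costs: float, monthly_turns: float, c_b: float,
                            c_s: float, target_payback_months: float) -> bool:
    return monthly_saving(monthly_turns, c_b, c_s) > switch_costs / target_payback_months
```

For example, 2M turns per month at 5.00 versus 3.00 per 1k turns saves 4,000 per month, so a 20,000 switching cost pays back in 5 months.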

8) Implementation checklist

[ ] Define requests per minute (RPM) and max concurrency for baseline and peak.
[ ] Choose memory model: ephemeral, per user, or team.
[ ] Put a step budget on planning & tools.
[ ] Add a routing tier for cheap vs premium paths.
[ ] Place services in one AZ and use VPC endpoints.
[ ] Batch embeddings; enable delta sync.
[ ] Sample traces; short retention on non-audit logs.
[ ] Build a TCO sheet with your Region prices.
[ ] Run a one-week A/B: AgentCore vs self-hosted path.
[ ] Review results and lock in optimizations.

9) FAQs

What usually drives costs most?
Persistent memory with frequent reads/writes, and VPC design (endpoint hours, cross-AZ traffic). Tokens matter, but plumbing wins or loses the month.

How do I reduce retrieval cost?
Pre-rank, use smaller chunks, cap top-k, and remove duplicates. Batch indexing and update via delta sync.
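A minimal sketch of that filtering step, assuming the retriever already returns `(text, score)` pairs: sort by score, drop near-verbatim duplicates, cap chunk length, and stop at top-k. The `select_chunks` helper and its default limits are hypothetical.

```python
from typing import Iterable, List, Tuple


def select_chunks(scored_chunks: Iterable[Tuple[str, float]],
                  top_k: int = 5, max_chars: int = 800) -> List[str]:
    """Return at most top_k distinct chunks, best score first."""
    seen = set()
    selected: List[str] = []
    for text, _score in sorted(scored_chunks, key=lambda c: c[1], reverse=True):
        normalized = " ".join(text.lower().split())  # ignore case and spacing
        if normalized in seen:
            continue
        seen.add(normalized)
        selected.append(text[:max_chars])
        if len(selected) == top_k:
            break
    return selected
```

Fewer and shorter chunks reduce both the retrieval payload and the prompt tokens the model is billed for.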

How do I know if I need multi-AZ?
Only if you require high availability across AZ failures. Otherwise, single-AZ with fast recovery is cheaper and simpler.

10) Let’s make your bill predictable

If you want a second set of eyes on your architecture:

Bedrock Cost Architecture Review

We map your cost path, model a baseline/peak/launch bill, and deliver a concrete cut-plan for memory, VPC and retrieval.
Typical outcome: −20–30% cost per turn in 30 days without quality loss.
