Dev.to
ADR: Adopting Amazon Bedrock AgentCore in Production
After 16 years building financial platforms on AWS, I've learned that the most dangerous question in architecture isn't 'does this work?' — it's 'who operates this at 2 AM when it breaks?' Bedrock AgentCore is AWS's answer to the problem of operationalizing AI agents beyond the notebook: managed runtime, memory, tool-use, guardrails, and traceability in a single control plane. This ADR documents how I arrived at the decision to adopt it — or not — in a regulated financial environment, and the consequences you need to internalize before doing the same.
Context and Forces
The scenario that motivated this decision is a recurring one in financial institutions: a product team wants to expose an AI agent to internal analysts. The agent must be able to query market data via API, run risk calculations in Lambda, retrieve context from regulatory documents via RAG, and record every action in an immutable audit trail. The MVP was working within two sprints using LangChain + Claude via Bedrock. The problems surfaced the following week.
Five forces made the decision urgent:
(1) Cross-turn state management: financial agent sessions last minutes, not seconds, and reliably maintaining context across stateless Lambda invocations is brittle.
(2) Regulatory traceability: every tool call, model decision, and response must be auditable with timestamp, identity, and full payload, without relying on ad-hoc logging.
(3) Guardrails as a contract: in finance, the agent must not leak PII, must not recommend products without disclaimers, and must not execute irreversible actions without human confirmation. Implementing this by hand in every agent is guaranteed technical debt.
(4) Unpredictable token cost: without per-session budget control, a faulty agent loop can burn through tens of dollars in minutes.
(5) Managed runtime: the platform team doesn't want to maintain a custom agent scheduler; they want an SLA with AWS.
Options Considered
Option A: Self-hosted LangChain/LangGraph on EKS
Pros
Full control over execution graph and retry logic
Model portability — swap LLM without platform change
Mature ecosystem of community integrations and tools
Cons
Full operational responsibility: scaling, HA, patching, observability
Guardrails and audit trail must be built and maintained by the team
Session memory management requires custom DynamoDB or Redis
High engineering cost to reach parity with managed features
Verdict: Suitable for teams with a mature AI platform; high operational risk for smaller teams
Option B: Bedrock Agents (prior generation, without AgentCore)
Pros
AWS-managed, no runtime infrastructure to operate
Native integration with Knowledge Bases and Action Groups
Cons
Limited observability: partial traces, no native span-level detail
No native per-session budget control
Agent loop customization restricted to what AWS exposes
Verdict: Good for simple cases; observability limitations are blockers in finance
Option C: Amazon Bedrock AgentCore
Pros
Managed runtime with native persistent session memory (AgentCore Memory)
Configurable guardrails as declarative policy, not inline code
Native traceability via CloudTrail + X-Ray with tool-call spans
AgentCore Gateway for tool-use with OAuth2/OIDC and per-tool throttling
Configurable per-session token budget control
Cons
Platform lock-in to AWS for the agent runtime
Execution graph customization more restricted than LangGraph
New service: API surface still evolving, conservative quotas
AgentCore Memory and Gateway costs added on top of inference cost
Verdict: Recommended decision for regulated financial environments with a lean platform team
Option D: Step Functions + Lambda as agent orchestrator
Pros
Native audit via Step Functions execution history
Declarative and testable retry, timeout, and error handling
No new service to learn — team already knows the pattern
Cons
Not an agent runtime: each 'turn' requires a new execution or .waitForTaskToken
Session memory and model context must be managed externally
Cold-start and state transition latency can be noticeable in dialogues
Verdict: Excellent for deterministic workflows; inadequate as a conversational agent runtime
The Decision and the Reasoning Behind It
The decision was to adopt Bedrock AgentCore as the primary agent runtime, with Step Functions as the orchestrator for adjacent deterministic workflows (approvals, reconciliations, notifications). This is not an all-or-nothing decision: AgentCore solves the non-deterministic agent loop problem, while Step Functions remains the right choice for the deterministic business process that wraps the agent.
The decisive argument was the AgentCore Gateway with per-tool OAuth2/OIDC support. In a financial environment, every tool-call is an action with identity: who authorized it, what scope, with which token. Implementing this manually in LangChain would mean building and maintaining an authorization proxy — exactly the kind of infrastructure that generates no business value but generates security incidents when neglected. The Gateway delivers this as declarative configuration, with per-tool throttling (e.g., maximum 10 calls/session for the order execution API) and a native circuit breaker.
The second argument was session memory with configurable TTL. AgentCore Memory persists conversation context in a managed store, with per-session configurable TTL and KMS customer-managed key (CMK) encryption. For LGPD/GDPR compliance, this means I can configure a 24h TTL for analyst sessions and guarantee that no session data persists beyond what's necessary — without building a custom expiration pipeline.
The lock-in trade-off was consciously accepted: the tool-use layer (the Lambda functions that execute the actual actions) remains completely portable. If we need to migrate the runtime in the future, the tools keep working.
Financial Agent Architecture with Bedrock AgentCore
Execution flow of a financial analysis agent: from analyst to AgentCore runtime, through guardrails, tool-use via Gateway, session memory, and observability
🔐 AWS: Security & Ingress
API Gateway REST + Cognito JWT (edge)
Bedrock Guardrails PII filter + topic deny (security)
🤖 AWS: AgentCore Runtime
AgentCore Runtime Claude 3.5 Sonnet (ai)
AgentCore Memory TTL=24h, KMS CMK (storage)
AgentCore Gateway OAuth2/OIDC, throttle (security)
⚙️ AWS: Tools (Tool-use)
Lambda: Market Data Bloomberg API proxy (compute)
Lambda: Risk Calc VaR engine (compute)
Knowledge Base OpenSearch + S3 (data)
📊 AWS: Observability & Audit
X-Ray span per tool-call (external)
CloudTrail API audit log (storage)
CloudWatch SLO dashboards (external)
Flows
analyst -> apigw: HTTPS + JWT
apigw -> guardrails: input validation
guardrails -> agentcore: sanitized prompt
agentcore -> memory: read/write context
agentcore -> gateway: tool invocation
gateway -> lambda_market: OAuth2 token
gateway -> lambda_risk: OAuth2 token
gateway -> kb: RAG query
agentcore -> guardrails: output filter
agentcore -> xray: traces
apigw -> cloudtrail: API events
xray -> cw: SLO metrics
Concrete Configuration: What Actually Matters
Adopting AgentCore without properly configuring operational controls is worse than not adopting it — you gain a false sense of security without active guardrails. Here are the configurations that make a real difference:
Guardrails as first line: Configure contentPolicyConfig with HATE, INSULTS, SEXUAL, VIOLENCE all set to BLOCK, and sensitiveInformationPolicyConfig with PII filters for CREDIT_DEBIT_CARD_NUMBER, AWS_ACCESS_KEY, NAME, and EMAIL in ANONYMIZE mode. In a financial environment, add topicPolicyConfig with explicitly denied topics: "investment advice without disclaimer", "guaranteed returns". This isn't paranoia — it's the minimum to pass a compliance review.
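To make the baseline concrete, here is a minimal sketch of the request body you could pass to `boto3.client("bedrock").create_guardrail(**request)`. The field names follow the Bedrock `CreateGuardrail` API. The guardrail name, the blocked-content messages, and the topic definitions and examples are illustrative assumptions that should be agreed with your compliance team. In this sketch, "set to BLOCK" is expressed as HIGH filter strength on both input and output.

```python
# Sketch: kwargs for boto3.client("bedrock").create_guardrail(**request).
# Guardrail name, messages, topic definitions and examples are illustrative assumptions.

CONTENT_FILTERS = ["HATE", "INSULTS", "SEXUAL", "VIOLENCE"]
PII_TYPES = ["CREDIT_DEBIT_CARD_NUMBER", "AWS_ACCESS_KEY", "NAME", "EMAIL"]


def build_guardrail_request(name: str = "financial-agent-guardrail") -> dict:
    """Return create_guardrail kwargs implementing the baseline policy described above."""
    return {
        "name": name,
        "description": "Baseline guardrail for internal financial analysis agents",
        # HIGH strength on input and output is the most aggressive filtering level.
        "contentPolicyConfig": {
            "filtersConfig": [
                {"type": f, "inputStrength": "HIGH", "outputStrength": "HIGH"}
                for f in CONTENT_FILTERS
            ]
        },
        # ANONYMIZE masks the entity in the text instead of rejecting the whole turn.
        "sensitiveInformationPolicyConfig": {
            "piiEntitiesConfig": [{"type": p, "action": "ANONYMIZE"} for p in PII_TYPES]
        },
        # Denied topics: hypothetical definitions, to be refined with compliance.
        "topicPolicyConfig": {
            "topicsConfig": [
                {
                    "name": "Investment advice without disclaimer",
                    "definition": "Recommending specific securities or financial products "
                    "without a regulatory disclaimer.",
                    "examples": ["Which stock should I buy today?"],
                    "type": "DENY",
                },
                {
                    "name": "Guaranteed returns",
                    "definition": "Claims that an investment will produce a certain or risk-free return.",
                    "examples": ["Is this fund guaranteed to return 10% a year?"],
                    "type": "DENY",
                },
            ]
        },
        # Both messages are required by CreateGuardrail.
        "blockedInputMessaging": "This request is outside the scope this agent is allowed to handle.",
        "blockedOutputsMessaging": "The response was withheld by compliance policy.",
    }
```

Keeping the policy as data rather than inline agent code means it can be code-reviewed and versioned like any other compliance artifact.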
AgentCore Memory with correct partitioning: The memory partition key must be userId + sessionId, never just sessionId. In multi-tenant environments, sessions from different users with the same sessionId collided in testing — a silent bug that leaks context between users. Configure memoryConfiguration.enabledMemoryTypes with SESSION_SUMMARY for long sessions, reducing context token consumption by up to 40% in sessions exceeding 20 turns.
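The partitioning bug is easy to reproduce in miniature. The sketch below uses a toy in-memory store as a hypothetical stand-in for the managed memory store; it is not the AgentCore Memory API. It contrasts a key built from sessionId alone with a composite userId + sessionId key.

```python
from dataclasses import dataclass, field


@dataclass
class InMemorySessionStore:
    """Toy stand-in for a managed memory store, keyed by a partition key (hypothetical)."""
    items: dict = field(default_factory=dict)

    def append(self, key: str, turn: str) -> None:
        self.items.setdefault(key, []).append(turn)

    def history(self, key: str) -> list:
        return list(self.items.get(key, []))


def unsafe_key(user_id: str, session_id: str) -> str:
    # Anti-pattern: ignores the user, so equal sessionIds collide across users.
    return session_id


def safe_key(user_id: str, session_id: str) -> str:
    # Composite key: a sessionId is only meaningful within one user's scope.
    # Assumes IDs never contain "#"; otherwise pick a separator they cannot contain.
    return f"{user_id}#{session_id}"
```

With `unsafe_key`, a turn that alice writes to session `s-1` shows up in bob's history for his own `s-1`. With `safe_key`, bob sees an empty history. In the real system, the same principle applies to however your memory namespace or actor identifier is constructed.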
Gateway with per-tool throttling: Define a separate rateLimit for each Action Group. The order execution API should have maxRequestsPerSession: 5 and requireConfirmation: ENABLED, while the market data query API can have maxRequestsPerSession: 50. Without this granularity, a faulty agent loop can execute dozens of orders before anyone notices, a scenario I've seen happen in production with frameworks lacking tool-use controls.
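The sketch below illustrates the semantics of per-session, per-tool limits as a small in-process guard, for example as a defense-in-depth check inside your own tool dispatcher. It is not the AgentCore Gateway API. The tool names and the `ToolPolicy` and `SessionToolGuard` classes are hypothetical, and the limits mirror the values above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolPolicy:
    max_requests_per_session: int
    require_confirmation: bool = False


# Hypothetical tool names; limits mirror the values discussed above.
POLICIES = {
    "ExecuteOrder": ToolPolicy(max_requests_per_session=5, require_confirmation=True),
    "GetMarketData": ToolPolicy(max_requests_per_session=50),
}


class ToolLimitExceeded(Exception):
    pass


class SessionToolGuard:
    """Counts tool calls for ONE session and refuses calls beyond each tool's limit."""

    def __init__(self, policies: dict):
        self.policies = policies
        self.calls = Counter()

    def authorize(self, tool: str, confirmed: bool = False) -> None:
        policy = self.policies.get(tool)
        if policy is None:
            # Deny by default: unregistered tools are never callable.
            raise PermissionError(f"tool {tool!r} is not registered")
        if policy.require_confirmation and not confirmed:
            raise PermissionError(f"tool {tool!r} requires human confirmation")
        if self.calls[tool] >= policy.max_requests_per_session:
            raise ToolLimitExceeded(f"{tool} exceeded {policy.max_requests_per_session} calls")
        self.calls[tool] += 1
```

Create one guard per session. The sixth confirmed `ExecuteOrder` call in the same session is rejected, which bounds the damage a looping agent can do.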
Per-session token budget: Configure sessionConfiguration.maxTokens with a conservative initial value — I recommend 50,000 tokens for typical analysis sessions. Monitor the p95 token consumption per session in CloudWatch and adjust. An agent entering a reasoning loop can consume 200k+ tokens in a single session without this control.
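As a mental model of what a per-session cap does, here is a minimal client-side sketch that accumulates token usage per turn and trips once the session exceeds 50,000 tokens. It assumes you feed it the `usage` block (`inputTokens`, `outputTokens`) returned by the Bedrock Converse API. The `SessionTokenBudget` class is hypothetical and does not replace the runtime setting.

```python
class TokenBudgetExceeded(RuntimeError):
    pass


class SessionTokenBudget:
    """Tracks cumulative input+output tokens for one session against a hard cap."""

    def __init__(self, max_tokens: int = 50_000):
        self.max_tokens = max_tokens
        self.used = 0

    def record(self, input_tokens: int, output_tokens: int) -> int:
        """Add one turn's usage; raise once the cap is exceeded, else return tokens left."""
        self.used += input_tokens + output_tokens
        if self.used > self.max_tokens:
            raise TokenBudgetExceeded(
                f"session used {self.used} tokens (cap {self.max_tokens})"
            )
        return self.max_tokens - self.used
```

For example, an agent spending 6,000 input and 1,000 output tokens per turn has 1,000 tokens left after 7 turns and trips the budget on the 8th. Catching `TokenBudgetExceeded` is where you would end the session gracefully and emit a metric.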
Observability: What to Measure and How
AI agents have a different observability profile from traditional APIs. p99 latency is less useful than turns-per-session distribution and tool-call failure rate per tool. Here is the observability model I implemented:
Agent business metrics (via CloudWatch custom metrics with namespace FinancialAgent):
TurnsPerSession — histogram; alert if p95 > 15 turns (indicates loop or poorly calibrated prompt)
TokensPerSession — histogram; alert if p95 > 40k tokens
ToolCallFailureRate per ToolName — counter; SLO of < 1% failure for critical tools
GuardrailInterventionRate — counter; spike indicates jailbreak attempt or prompt injection
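One way to publish these metrics is sketched below. It builds `MetricData` entries for `boto3.client("cloudwatch").put_metric_data(Namespace="FinancialAgent", MetricData=...)` at the end of each session. The helper functions and the `ToolCalls` and `ToolCallFailures` metric names are hypothetical. Publishing raw counts and deriving ToolCallFailureRate with CloudWatch metric math (failures / calls) avoids averaging pre-computed rates, and p95 alarms can use `ExtendedStatistic="p95"`.

```python
from collections import Counter

NAMESPACE = "FinancialAgent"


def session_metric_data(turns: int, tokens: int) -> list:
    """Per-session datapoints; alarm on their p95 via ExtendedStatistic."""
    return [
        {"MetricName": "TurnsPerSession", "Value": float(turns), "Unit": "Count"},
        {"MetricName": "TokensPerSession", "Value": float(tokens), "Unit": "Count"},
    ]


def tool_call_metric_data(tool_outcomes) -> list:
    """tool_outcomes: iterable of (tool_name, succeeded) pairs for one session.

    Emits raw call and failure counts per ToolName; the failure rate is derived
    later with metric math (ToolCallFailures / ToolCalls).
    """
    calls, failures = Counter(), Counter()
    for tool, ok in tool_outcomes:
        calls[tool] += 1
        if not ok:
            failures[tool] += 1
    data = []
    for tool in sorted(calls):
        dims = [{"Name": "ToolName", "Value": tool}]
        data.append({"MetricName": "ToolCalls", "Dimensions": dims,
                     "Value": float(calls[tool]), "Unit": "Count"})
        data.append({"MetricName": "ToolCallFailures", "Dimensions": dims,
                     "Value": float(failures[tool]), "Unit": "Count"})
    return data
```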
Traces with X-Ray: AgentCore emits spans for each tool invocation with attributes bedrock.agent.toolName, bedrock.agent.sessionId, and bedrock.agent.turnCount. Configure a trace group with filter annotation.bedrock.agent.toolName = "ExecuteOrder" and alert on latency > 2s — order execution above that indicates a downstream API issue.
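A small helper for the trace-group filter is sketched below, for use with `boto3.client("xray").create_group(GroupName=..., FilterExpression=...)`. One caveat, which I state as my understanding of X-Ray rather than something verified against AgentCore: X-Ray filter expressions only match annotation keys made of letters, digits, and underscores. A dotted attribute such as bedrock.agent.toolName may therefore surface under a different, sanitized key. The key `toolName` below is an assumption; check which annotation keys actually appear on your traces.

```python
import re

# X-Ray annotation keys usable in filter expressions: alphanumerics and underscore.
_ANNOTATION_KEY = re.compile(r"^[A-Za-z0-9_]+$")


def tool_latency_filter(annotation_key: str, tool_name: str, max_seconds: float) -> str:
    """Build a filter matching traces for one tool whose response time exceeds max_seconds."""
    if not _ANNOTATION_KEY.match(annotation_key):
        raise ValueError(f"annotation key {annotation_key!r} is not filterable in X-Ray")
    if '"' in tool_name:
        raise ValueError("tool name must not contain double quotes")
    # responsetime is an X-Ray filter keyword, expressed in seconds.
    return f'annotation.{annotation_key} = "{tool_name}" AND responsetime > {max_seconds:g}'
```

`tool_latency_filter("toolName", "ExecuteOrder", 2)` yields `annotation.toolName = "ExecuteOrder" AND responsetime > 2`, which you can use as the group's filter expression before alerting on it.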
CloudTrail for regulatory audit: Each InvokeAgent API call is recorded with the caller ARN, sessionId, and inputText (truncated). For compliance, configure an S3 bucket with Object Lock in COMPLIANCE mode and 7-year retention for AgentCore CloudTrail logs. This is the minimum to meet Banco Central do Brasil and SEC audit requirements.
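The retention setting itself is a short configuration. Below is a minimal sketch of the kwargs for `boto3.client("s3").put_object_lock_configuration(**request)`. The bucket name is a hypothetical placeholder, and the bucket must already have Object Lock enabled (for example, `ObjectLockEnabledForBucket=True` on `create_bucket`). Keep in mind that COMPLIANCE mode cannot be shortened or removed by any user, including root, for the retention period.

```python
def compliance_lock_request(bucket: str, years: int = 7) -> dict:
    """kwargs for put_object_lock_configuration: default COMPLIANCE retention in years."""
    if years < 1:
        raise ValueError("retention must be at least one year")
    return {
        "Bucket": bucket,  # hypothetical bucket receiving the AgentCore CloudTrail logs
        "ObjectLockConfiguration": {
            "ObjectLockEnabled": "Enabled",
            # Applies to every new object version written to the bucket.
            "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Years": years}},
        },
    }
```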
Cost anomaly alarm: Configure an AWS Budget with an alert at 80% of the monthly Bedrock budget, with an SNS action. Add a second CloudWatch alarm on the AWS/Bedrock Invocations metric, scoped by the ModelId dimension to Claude 3.5 Sonnet, with a threshold of 1,000 invocations per hour; above that, something is wrong.
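Both alarms can be sketched as boto3 request bodies: one for `boto3.client("budgets").create_budget(**request)` and one for `boto3.client("cloudwatch").put_metric_alarm(**request)`. The account ID, SNS topic ARN, budget amount, and alarm/budget names are placeholders. The model ID assumes on-demand invocation of Claude 3.5 Sonnet (`anthropic.claude-3-5-sonnet-20240620-v1:0`); if you call through an inference profile, the ModelId dimension value will differ. The cost filter that scopes the budget to Bedrock only is left out, because its exact service value should be checked against your Cost Explorer data.

```python
def bedrock_budget_request(account_id: str, monthly_usd: float, sns_topic_arn: str) -> dict:
    """create_budget kwargs: notify SNS when actual spend passes 80% of the monthly limit."""
    return {
        "AccountId": account_id,
        "Budget": {
            "BudgetName": "bedrock-monthly",
            "BudgetLimit": {"Amount": f"{monthly_usd:.2f}", "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        "NotificationsWithSubscribers": [{
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": 80.0,
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [{"SubscriptionType": "SNS", "Address": sns_topic_arn}],
        }],
    }


def invocation_alarm_request(model_id: str, sns_topic_arn: str, max_per_hour: int = 1000) -> dict:
    """put_metric_alarm kwargs: alarm when hourly invocations of one model exceed the limit."""
    return {
        "AlarmName": "bedrock-claude-invocations-per-hour",
        "Namespace": "AWS/Bedrock",
        "MetricName": "Invocations",
        "Dimensions": [{"Name": "ModelId", "Value": model_id}],
        "Statistic": "Sum",
        "Period": 3600,  # one-hour window
        "EvaluationPeriods": 1,
        "Threshold": float(max_per_hour),
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # no traffic is not an anomaly
        "AlarmActions": [sns_topic_arn],
    }
```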
Consequences and Risks You Need to Accept
Runtime lock-in is real: If AWS deprecates or significantly changes the AgentCore API, migration requires rewriting the orchestration logic, not just the tools. Mitigate by keeping tools (Lambda) completely runtime-agnostic and documenting the interface contract in a separate ADR.
Conservative quotas on a new service: AgentCore has concurrent-session quotas that, at launch, were significantly lower than those of traditional Bedrock Agents. Request quota increases before go-live, not after. A peak event without adequate quota produces ThrottlingException errors, which the end client experiences as timeouts.
Guardrails have latency: Each Guardrails evaluation (input plus output) adds 100-300ms per turn. In an agent with 10 turns, that's up to 3 additional seconds of accumulated latency. For latency-critical use cases, consider disabling output guardrails on internal tools (those not exposed to the end user) and applying them only to the final output.
Memory is not free: AgentCore Memory charges for storage and for each read/write operation. With SESSION_SUMMARY active, those operations add up, and memory cost can exceed inference cost, particularly in short sessions where there is little inference spend to offset it. Monitor MemoryReadLatency and MemoryWriteLatency; values above 200ms indicate pressure on the managed store.
Human-in-the-loop is not automatic: requireConfirmation: ENABLED on the Gateway pauses execution and waits for confirmation via callback. If the client doesn't respond within confirmationTimeout (default: 300s), the session expires. Design the UX to make this clear to the user — timeout-expired sessions are the leading cause of complaints in financial agents.
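The UX point is easier to design for when the timeout has an explicit outcome. The sketch below is a hypothetical client-side model, not the AgentCore callback API. It waits for an analyst decision with `asyncio.wait_for` and turns a missing answer into an explicit `EXPIRED` state that the UI can show, instead of a silent dead session. The 300s default mirrors the confirmationTimeout cited above.

```python
import asyncio

CONFIRMATION_TIMEOUT_S = 300.0  # mirrors the default confirmationTimeout cited above


async def await_confirmation(decision: asyncio.Future,
                             timeout_s: float = CONFIRMATION_TIMEOUT_S) -> str:
    """Wait for an analyst's approve/reject; map no answer to an explicit EXPIRED state."""
    try:
        approved = await asyncio.wait_for(decision, timeout=timeout_s)
    except asyncio.TimeoutError:
        # Surface this to the user (e.g. "order not sent, confirmation expired").
        return "EXPIRED"
    return "APPROVED" if approved else "REJECTED"


async def demo() -> tuple:
    loop = asyncio.get_running_loop()
    answered = loop.create_future()
    loop.call_later(0.01, answered.set_result, True)  # analyst approves quickly
    ignored = loop.create_future()  # analyst never answers
    return (await await_confirmation(answered, 1.0),
            await await_confirmation(ignored, 0.05))
```

Running `asyncio.run(demo())` exercises both paths: an approval that arrives in time and a request that expires.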
Real Reference Numbers
~40%: Token reduction with SESSION_SUMMARY. In sessions with more than 20 turns, SESSION_SUMMARY reduces the context sent to the model by ~40% vs. full history
100-300ms: Latency added by Guardrails per turn. Each Guardrail evaluation (input + output) adds 100-300ms; across 10 turns, up to 3s accumulated
7 years: Minimum CloudTrail retention for financial compliance. S3 Object Lock in COMPLIANCE mode with 7-year retention meets Banco Central do Brasil and SEC requirements for agent audit trails
Well-Architected Assessment
security: Declarative guardrails with PII filter and denied topics; AgentCore Gateway with per-tool OAuth2/OIDC; KMS CMK for session memory; CloudTrail with S3 Object Lock for immutable audit. IAM policies with an ArnLike condition on the agent ARN restrict which agents can invoke which tools.
reliability: Automatic retry with jitter in the Bedrock SDK (max_attempts=3, mode=adaptive); native circuit breaker in AgentCore Gateway per tool; configurable session timeout prevents zombie sessions; concurrent session quotas must be requested before go-live.
performance: SESSION_SUMMARY reduces context tokens by ~40% for long sessions; disabling output guardrails on internal tools reduces accumulated latency; Knowledge Base with OpenSearch k-NN with HNSW and ef_search=512 for low-latency RAG.
cost: AWS Budget with alert at 80% of monthly limit; CloudWatch alarm on invocations/hour per model-id; SESSION_SUMMARY reduces inference cost in long sessions; monitor AgentCore Memory cost separately from inference cost.
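For the RAG side of the performance pillar, here is a sketch of an OpenSearch k-NN index body using HNSW with `ef_search=512`, suitable for `opensearch-py`'s `client.indices.create(index=..., body=...)`. Several values are assumptions: the dimension of 1024 assumes Titan Text Embeddings V2 at its default output size, the field names are hypothetical and must match the field mapping configured in your Knowledge Base, and `m`/`ef_construction` are illustrative. OpenSearch Serverless may restrict some of these index settings compared with a managed domain.

```python
def kb_vector_index_body(dimension: int = 1024, ef_search: int = 512) -> dict:
    """Index body for a k-NN vector index backing a Knowledge Base (field names hypothetical)."""
    return {
        "settings": {
            "index": {
                "knn": True,
                # Higher ef_search raises recall at the cost of query latency; tune against p95.
                "knn.algo_param.ef_search": ef_search,
            }
        },
        "mappings": {
            "properties": {
                "embedding": {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {
                        "name": "hnsw",
                        "engine": "faiss",
                        "space_type": "l2",
                        "parameters": {"ef_construction": 512, "m": 16},
                    },
                },
                "text_chunk": {"type": "text"},
                "metadata": {"type": "text", "index": False},
            }
        },
    }
```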
What the AWS Blog Doesn't Tell You
AWS service launch blogs are excellent at showing the happy path. What they rarely cover are the edge cases you only discover in production. Here are the three that cost me the most time:
Tool-call idempotency is not guaranteed by the runtime. If AgentCore attempts to invoke a tool and receives a timeout, it may retry — and your Lambda may be invoked twice for the same action. For idempotent tools (queries), this is harmless. For tools with side effects (order execution, email sending), you need to implement idempotency in the Lambda using an idempotencyToken derived from sessionId + turnId + toolName. Without this, order duplication is a matter of when, not if.
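The sketch below shows one way to derive the token from sessionId + turnId + toolName and use it to run a side effect at most once. The `IdempotentExecutor` is a hypothetical in-memory stand-in. In a real Lambda the check-and-store must be atomic, for example a DynamoDB conditional write with `attribute_not_exists`, or the idempotency utility in Powertools for AWS Lambda, because a plain dict is neither durable nor safe under concurrent retries. The scheme also assumes one invocation of a given tool per turn; two legitimate identical calls in the same turn would be collapsed.

```python
import hashlib
import json


def idempotency_token(session_id: str, turn_id: str, tool_name: str) -> str:
    # JSON-encode the tuple so ("a", "bc") and ("ab", "c") produce different tokens.
    raw = json.dumps([session_id, turn_id, tool_name])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class IdempotentExecutor:
    """Runs a side-effecting action at most once per token (in-memory, single-process sketch)."""

    def __init__(self):
        self._results = {}

    def run(self, token: str, action, *args):
        if token in self._results:
            # Duplicate invocation (e.g. a runtime retry): replay the stored result.
            return self._results[token]
        result = action(*args)
        self._results[token] = result
        return result
```

If the runtime retries `ExecuteOrder` for the same session and turn, the second call returns the first order's result instead of placing a new order.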
The model can ignore requireConfirmation in certain prompt formulations. I tested this: if the system prompt instructs the agent to "be proactive and execute actions without asking for unnecessary confirmation," the model may rationalize that a specific action doesn't need confirmation even with the flag active. The correct defense is dual: the flag on the Gateway and an explicit instruction in the system prompt about when confirmation is mandatory. Never rely on a single layer.
AgentCore doesn't have native multi-agent support yet. If your architecture requires a supervisor agent delegating to specialized agents (the multi-agent orchestration pattern), you'll need to implement the delegation logic manually. Typically this means an agent that calls other agents via tool-use, where each "tool" is actually an invocation of another AgentCore agent. It works, but cross-session traceability requires manually correlating sessionIds in X-Ray.
Anti-Patterns I've Seen in Architecture Reviews
Using AgentCore without configuring Guardrails because "it's an internal environment" — insiders are the primary source of compliance incidents in finance
Storing full session history in memory without SESSION_SUMMARY, so the context sent on every turn (and its token cost) grows linearly with the number of turns
Implementing critical business logic inside the agent system prompt instead of in testable tools — prompts don't have unit tests
Not requesting concurrent session quota increase before go-live — ThrottlingException during peak usage is predictable and preventable
Assuming AgentCore Gateway replaces a business authorization layer — the Gateway controls access to the tool, not the authorization logic inside the tool
Not implementing idempotency in tool Lambdas with side effects — runtime retries can duplicate irreversible actions
Curator's Note: In practice, what convinced me to adopt AgentCore was not any individual feature — it was the fact that the AgentCore Gateway with per-tool OAuth2/OIDC solves the tool-call identity problem I was about to build manually, which would have taken two sprints and generated permanent technical debt. The hard-won lesson behind this: in financial environments, the cost of building custom security controls is not the initial development cost — it's the cost of maintaining, auditing, and fixing those controls over years. When a managed service delivers the control as declarative configuration, the decision to adopt it is rarely about feature parity; it's about where you want to allocate your team's engineering attention. My recommendation: adopt AgentCore for new production agents, keep tools portable, and invest the saved time in observability and adversarial prompting tests.
Verdict: Adopt with Explicit Controls
Bedrock AgentCore is the right choice for financial teams that need to put AI agents into production without building and maintaining a custom orchestration runtime. The decision is not binary — it's about recognizing that AgentCore's value lies in the operational controls (Gateway, Guardrails, Memory with CMK), not just the execution runtime. The condition for adoption is clear: configure Guardrails before any testing with real data, implement idempotency in all tools with side effects, request concurrent session quota increases before go-live, and monitor TurnsPerSession and TokensPerSession as first-class SLO metrics. Lock-in is real but manageable if tools are kept portable. For teams that lack the capacity to build and operate a custom agent runtime — which is most teams — AgentCore is the correct architectural decision in 2025.
References
Amazon Bedrock AgentCore — Developer Guide
Amazon Bedrock Guardrails — Configuration Reference
Amazon Bedrock AgentCore Memory — Developer Guide
Building AI agents with Amazon Bedrock AgentCore — AWS ML Blog
AWS Well-Architected Framework — Machine Learning Lens
Idempotency for AWS Lambda — Powertools for AWS Lambda
Architecture Decision Records — Michael Nygard
Amazon OpenSearch Service — k-NN Search with HNSW
Originally published at fernando.moretes.com. By Fernando F. Azevedo — Senior Solutions Architect.