AWS Bedrock AgentCore and multi-model governance for regulated production AI

AI-Generated Summary

1 sources

2 hours ago

Key Points

Bedrock is used to centralize governance via IAM, CloudTrail logging, and Bedrock Guardrails across different frontier models and agent components.
Bedrock AgentCore is presented as a managed agent runtime, providing persistent session memory, declarative guardrails, and traceability (including tool-call spans) to support regulated environments.
Both articles emphasize observability that links model/agent activity to business context, using trace IDs (OpenTelemetry) and Bedrock Model Invocation Logging for auditability and debugging.
Operational safeguards recommended include per-session token budgets, per-tool throttling, and requesting service quotas/capacity in advance to avoid throttling during peak use.
For tools with side effects in agent workflows, the first article calls for application-level idempotency to handle runtime retries and prevent duplicate actions.

Across the two articles, the core theme is that regulated organizations need production controls—security, traceability, and cost governance—rather than focusing only on model performance. In the agent setting, the first article describes adopting Amazon Bedrock AgentCore in a production financial environment to manage the “agent loop” safely. It emphasizes managed runtime features such as persistent session memory, declarative guardrails for PII/topic handling, and traceability integrated with AWS tooling (CloudTrail and X-Ray tool-call spans). It also notes required operational practices: configuring guardrails before testing, implementing idempotency for tool side effects, setting per-session token budgets, requesting sufficient concurrency quotas, and using per-tool throttling and human confirmation where appropriate. Step Functions is positioned as an orchestrator for deterministic workflows around the agent, not as a conversational agent runtime.

In the second article, the focus shifts to governing multiple frontier models (GPT-5.5 via Bedrock, Claude 3.7, and Amazon Nova Pro) under a unified control plane. It highlights how Bedrock allows IAM, CloudTrail, and Guardrails to apply consistently, while operational gaps remain around latency, quota planning (TPM limits), structured output validation, and correlating model calls to business context using OpenTelemetry trace IDs and Bedrock Model Invocation Logging.
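The structured output validation gap mentioned above can be closed with a check in application code before anything is persisted. The following is a minimal sketch, assuming a hypothetical triage schema; the field names `decision`, `confidence`, and `reasons` and the allowed values are illustrative and do not come from either article.

```python
import json

# Hypothetical schema for a triage decision; field names are illustrative only.
REQUIRED_FIELDS = {"decision": str, "confidence": float, "reasons": list}
ALLOWED_DECISIONS = {"approve", "review", "reject"}


def validate_model_output(raw_text: str) -> dict:
    """Parse and validate a model response; raise ValueError on any violation."""
    try:
        payload = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}") from exc
    if not isinstance(payload, dict):
        raise ValueError("response must be a JSON object")
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in payload:
            raise ValueError(f"missing field: {field}")
        value = payload[field]
        # JSON has one number type, so accept ints where floats are expected.
        if expected_type is float and isinstance(value, int) and not isinstance(value, bool):
            value = float(value)
        if not isinstance(value, expected_type):
            raise ValueError(f"field {field} has the wrong type")
        payload[field] = value
    if payload["decision"] not in ALLOWED_DECISIONS:
        raise ValueError("unknown decision value")
    if not 0.0 <= payload["confidence"] <= 1.0:
        raise ValueError("confidence out of range")
    return payload
```

A caller would wrap `validate_model_output` in a try/except and route failures to a retry or a dead-letter path instead of writing them to the datastore.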

How Outlets Covered This Story

DEV

Dev.to

GPT-5 vs Claude vs Nova on Bedrock: A Production Governance Bake-off

The arrival of GPT-5.5, GPT-5.4, and Codex on Amazon Bedrock is not just a product event; it is a signal that Bedrock is consolidating as the unified control plane for frontier models in enterprise environments. For teams operating in regulated sectors, the question has shifted from 'which model to use?' to 'how do we govern multiple frontier models with the same security, traceability, and cost controls we already apply to the rest of our AWS infrastructure?' This analysis does exactly that bake-off: GPT-5.5 vs Claude 3.7 Sonnet vs Amazon Nova Pro, focused on production, not benchmarks.

What changed when GPT-5 landed on Bedrock

Before OpenAI models arrived on Bedrock, choosing GPT-4 or GPT-4o meant leaving the AWS perimeter: direct calls to the OpenAI API, secrets managed outside Secrets Manager, logs that bypassed CloudTrail, and data potentially leaving your residency region. For teams requiring LGPD, PCI-DSS, or SOC 2 compliance, that was a real governance cost, not a theoretical one.

With GPT-5.5 and Codex available via bedrock:InvokeModel and bedrock:InvokeModelWithResponseStream, the model becomes just another resource identified by an ARN. That means the IAM policies you already have, including conditions like aws:RequestedRegion, bedrock:modelId, and aws:PrincipalTag, apply directly. CloudTrail records every invocation. Amazon Bedrock Guardrails, with its content filters, PII detection, and grounding checks, covers GPT-5.5 the same way it covers Claude or Nova.

What this does not solve: network latency to regions where the model is still served via cross-region endpoints, and the fact that GPT-5.5 weights do not reside in your account. You are consuming a hosted model, not a deployed one. For use cases requiring full inference isolation, such as document analysis with classified customer data, this remains a threat model item that needs explicit documentation.
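To make "the model is just another resource" concrete, here is a minimal sketch of an IAM policy document built in Python. It is an illustration under assumptions, not the author's policy: the model IDs and region passed in are placeholders, and this sketch restricts models through the `Resource` element (foundation-model ARNs) plus the global `aws:RequestedRegion` condition key rather than through a model-specific condition key.

```python
import json


def build_invoke_policy(model_ids, allowed_regions):
    """Return an IAM policy document (as a dict) that allows invoking only
    the listed foundation models, and only in the listed regions."""
    resources = [
        f"arn:aws:bedrock:{region}::foundation-model/{model_id}"
        for region in allowed_regions
        for model_id in model_ids
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "InvokeApprovedModelsOnly",
                "Effect": "Allow",
                "Action": [
                    "bedrock:InvokeModel",
                    "bedrock:InvokeModelWithResponseStream",
                ],
                "Resource": resources,
                # Global condition key: denies by omission any other region.
                "Condition": {
                    "StringEquals": {"aws:RequestedRegion": list(allowed_regions)}
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder model IDs; substitute the IDs approved in your account.
    policy = build_invoke_policy(
        model_ids=["example.model-a-v1", "example.model-b-v1"],
        allowed_regions=["sa-east-1"],
    )
    print(json.dumps(policy, indent=2))
```

Generating the document from a list keeps the approved-model set reviewable in one place, which is useful when the same policy has to be attached to several roles.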
The dimension benchmarks miss: operational behavior

Academic benchmarks measure capability under controlled conditions. In financial production, what matters is behavior under load, p99 latency consistency, and the real cost of a response: not the average cost, but the cost of an 8k-token prompt with 2k output at peak hours.

Claude 3.7 Sonnet has a characteristic that matters greatly for agentic workflows: extended thinking mode produces chained reasoning that is auditable. In compliance contexts, being able to show the intermediate reasoning of a credit decision or fraud triage has direct regulatory value. GPT-5.5 also supports chain-of-thought, but the level of control over reasoning verbosity and the separation between scratchpad and final output is still less granular via the Bedrock API than what Anthropic exposes natively.

Amazon Nova Pro, on the other hand, is the only one of the three where you have full visibility into the model lifecycle within AWS. It supports fine-tuning via Bedrock Custom Model Jobs, meaning you can adapt the model to domain-specific vocabulary (derivatives terminology, for example) without relying on prompt engineering. Nova Pro's cost per token is significantly lower, which matters when you are processing millions of documents in batch with Bedrock Batch Inference.

The most common failure mode I see in production is not the model being wrong; it is the system lacking sufficient observability to know when the model was wrong. That leads directly to the instrumentation question.
Unified Control Plane: Model Governance on Bedrock

[Figure: inference request flow through Bedrock governance layers, showing how GPT-5.5, Claude, and Nova share the same security and observability controls. Layers: AWS Security and Identity (IAM policy with bedrock:modelId condition; KMS encryption at rest and in transit); Amazon Bedrock Control Plane (Bedrock Guardrails for PII, content, and grounding; Bedrock API Gateway for InvokeModel / Stream; Model Invocation Logging to S3 + CloudTrail); Frontier Models (GPT-5.5 / Codex, OpenAI via Bedrock; Claude 3.7 Sonnet with extended thinking; Amazon Nova Pro with fine-tuning and batch); Observability (CloudWatch p99 latency and tokens; OpenTelemetry trace ID per invocation). Flow: the client authenticates through IAM; IAM passes the request to Guardrails (policy enforced); Guardrails passes the filtered prompt to the gateway; the gateway calls InvokeModel on GPT-5.5, Claude, or Nova, writes an asynchronous log to Model Invocation Logging (encrypted with KMS, feeding CloudWatch metrics), and emits a trace span to OpenTelemetry.]

Instrumentation: where most teams get it wrong

Bedrock emits native metrics to CloudWatch: InvocationLatency, InputTokenCount, OutputTokenCount, InvocationClientErrors, InvocationThrottles. But these metrics alone are insufficient to operate an AI system in financial production. What is missing is correlation between the model invocation and business context: which user, which product, which decision was influenced by that response.

The approach that works is instrumenting with OpenTelemetry at the application level, propagating a trace ID that crosses the Bedrock call and is included in the Model Invocation Logging payload. When you enable Model Invocation Logging with S3 + CloudWatch Logs as destination, each record includes the Bedrock requestId.
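A minimal sketch of that correlation step, under stated assumptions: the response is a boto3-style dict whose `ResponseMetadata.RequestId` carries the request ID, and the span is any object with an OpenTelemetry-style `set_attribute(key, value)` method. The attribute names `bedrock.request_id` and `app.*` are hypothetical, and `RecordingSpan` is only a stand-in so the example runs without the OpenTelemetry SDK.

```python
class RecordingSpan:
    """Stand-in for an OpenTelemetry span; records attributes in a dict."""

    def __init__(self):
        self.attributes = {}

    def set_attribute(self, key, value):
        self.attributes[key] = value


def annotate_span_with_bedrock_call(span, response, business_context):
    """Attach the Bedrock request ID and business context to the active span.

    Returns the request ID so the caller can also log it next to the decision.
    """
    request_id = response.get("ResponseMetadata", {}).get("RequestId")
    if request_id:
        span.set_attribute("bedrock.request_id", request_id)  # assumed name
    for key, value in business_context.items():
        # Stringify values so every attribute has a span-safe type.
        span.set_attribute(f"app.{key}", str(value))
    return request_id


if __name__ == "__main__":
    span = RecordingSpan()
    fake_response = {"ResponseMetadata": {"RequestId": "req-123"}}
    annotate_span_with_bedrock_call(
        span, fake_response, {"product": "credit-triage", "decision_id": 42}
    )
    print(span.attributes)
```

With a real tracer, the same function would receive the current span from the application's tracing context instead of `RecordingSpan`.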
If you inject that requestId as an attribute in your OTel span, you can correlate a customer complaint with the exact prompt and response that generated that decision. That is real auditability.

For GPT-5.5 specifically, one watch point: the model supports response_format: json_object and structured outputs, but schema validation happens on the model side, not in Guardrails. If you need to guarantee that the response respects a specific schema before persisting to DynamoDB, add a validation step in the Lambda that processes the response. Do not assume the model will always return valid JSON under load or with adversarial prompts.

Claude 3.7 with extended thinking exposes the reasoning block as a separate field in the response. Store that field in S3 with a 7-year retention policy if you are in a regulated environment; it is decision-making evidence, not just a technical log.

Real cost: beyond price per token

Frontier model cost comparisons frequently stop at input/output token price. That is the smallest component of total cost in production systems. The components that dominate cost are: (1) tokens wasted by poorly structured prompts, (2) retries due to throttling, and (3) the cost of operating the system around the model.

GPT-5.5 has a higher price per token than Claude 3.7 Sonnet and significantly higher than Nova Pro. For a document analysis workload processing 10 million pages per month with an average context of 4k tokens per page, the cost difference between GPT-5.5 and Nova Pro can be on the order of 5-8x. This is not an argument against using GPT-5.5; it is an argument for using it selectively, in cases where its differentiated reasoning capability justifies the cost.

Bedrock Batch Inference changes the calculation for async workloads. With batch, you get up to 50% discount on token price for Claude and Nova.
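The workload arithmetic above can be worked through with a short sketch. It is an illustration only: it counts input tokens alone and uses the approximate input prices this article lists in its comparison table (~$3/MTok for GPT-5.5, ~$0.8/MTok for Nova Pro) as assumptions. Output tokens, retries, and actual pricing are not modeled.

```python
def monthly_input_cost(pages, tokens_per_page, usd_per_mtok, batch_discount=0.0):
    """Input-token spend in USD for one month of a page-processing workload.

    batch_discount is a fraction (0.5 means 50% off the token price).
    """
    total_tokens = pages * tokens_per_page
    return total_tokens / 1_000_000 * usd_per_mtok * (1.0 - batch_discount)


if __name__ == "__main__":
    pages, tokens = 10_000_000, 4_000  # workload described in the article
    gpt = monthly_input_cost(pages, tokens, 3.0)
    nova = monthly_input_cost(pages, tokens, 0.8)
    nova_batch = monthly_input_cost(pages, tokens, 0.8, batch_discount=0.5)
    print(f"GPT-5.5 on-demand: ${gpt:,.0f}")
    print(f"Nova Pro on-demand: ${nova:,.0f} (ratio {gpt / nova:.2f}x)")
    print(f"Nova Pro batch:     ${nova_batch:,.0f} (ratio {gpt / nova_batch:.2f}x)")
```

Under these assumptions the workload is 40 billion input tokens per month: roughly $120,000 on GPT-5.5 against $32,000 on Nova Pro on-demand (3.75x), or $16,000 with the full batch discount (7.5x).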
GPT-5.5 on Bedrock does not yet support batch inference at the time of this analysis, meaning that for large-scale processing you need to manage your own queue (SQS + Lambda with reserved concurrency) and handle the account-level TPM (tokens per minute) limits.

Bedrock TPM limits for third-party models like GPT-5.5 are managed via service quota, and increases require an AWS Support request. In multi-tenant environments where multiple products share the same AWS account, this can become a bottleneck. The solution is to use AWS Organizations with separate accounts per product and independent quotas. Do not share TPM limits between critical and experimental workloads.

GPT-5.5 vs Claude 3.7 Sonnet vs Amazon Nova Pro: Technical Comparison

Criterion | GPT-5.5 (OpenAI via Bedrock) | Claude 3.7 Sonnet (Anthropic) | Amazon Nova Pro
Relative cost per token (input) | High (baseline ~$3/MTok) | Medium (~$3/MTok) | Low (~$0.8/MTok)
Batch Inference support (Bedrock) | No (at time of analysis) | Yes, up to 50% discount | Yes, up to 50% discount
Fine-tuning via Bedrock | Not available | Not available | Yes, Custom Model Jobs
Auditable reasoning (structured CoT) | Partial, via structured outputs | Yes, separate extended thinking block | Partial, via prompt engineering
Bedrock Guardrails coverage | Yes, same controls | Yes, same controls | Yes, same controls
P50 latency (2k token prompt) | ~1.8s (estimate; varies by region) | ~1.5s (without extended thinking) | ~1.2s
Model weight residency in AWS account | No, hosted by OpenAI | No, hosted by Anthropic | Yes, Amazon-native
Codex / specialized code generation | Yes, Codex available on Bedrock | Strong, Claude 3.7 is top-tier for code | Competent, best with fine-tuning

Decision Matrix: Which Model for Which Workload?
GPT-5.5 via Bedrock

Pros
Top-tier reasoning capability for complex, ambiguous tasks
Codex for code generation in AI-assisted DevOps pipelines
Unified governance via IAM, CloudTrail, and Guardrails, with no AWS perimeter exit
Structured outputs with JSON schema for direct downstream system integration

Cons
Higher cost per token; no batch inference support on Bedrock
Weights do not reside in AWS account, with implications for sensitive data threat models
TPM limits managed via service quota; increases require AWS Support
No fine-tuning available via Bedrock

Verdict: Best for high-complexity reasoning tasks (legal analysis, due diligence), code generation in CI/CD pipelines, and cases where response quality justifies the premium cost.

Claude 3.7 Sonnet

Pros
Extended thinking with separate, auditable reasoning block, which has direct regulatory value
Batch inference support with up to 50% discount for async workloads
Excellent at code and technical analysis; consistent in long contexts
Competitive pricing with GPT-5.5 for comparable quality on many tasks

Cons
Extended thinking significantly increases latency, so it is not suitable for real-time inference
No fine-tuning via Bedrock; adaptation relies on prompt engineering and RAG
Weights also do not reside in AWS account

Verdict: Best for agentic workflows requiring auditable reasoning, regulatory document analysis, fraud triage with explainability, and high-quality batch processing.
Amazon Nova Pro

Pros
Lowest cost per token: 5-8x cheaper than GPT-5.5 for high-volume workloads
Fine-tuning via Bedrock Custom Model Jobs, allowing domain adaptation without prompt engineering
Amazon-native weights; best posture for ultra-sensitive data threat models
Batch inference support; lowest P50 latency among the three

Cons
Reasoning capability below GPT-5.5 and Claude 3.7 on high-complexity tasks
Fine-tuning requires quality dataset and MLOps pipeline, which adds operational overhead
Smaller third-party tool and integration ecosystem

Verdict: Best for large-scale processing (millions of documents), classification and extraction tasks where fine-tuning pays off, and environments with stricter data sovereignty requirements.

The routing pattern that resolves the dilemma: The answer to 'which model to use?' in mature production systems is not a single choice; it is a router. Implement an AI Gateway in Lambda or ECS that classifies each request by complexity, data sensitivity, and latency requirement, and routes to the appropriate model. Low-complexity, high-volume requests go to Nova Pro. Analyses requiring auditable reasoning go to Claude 3.7 with extended thinking. Code generation in CI/CD pipelines goes to Codex. Same Guardrails, same CloudTrail, same trace ID: unified governance with workload-optimized cost. This pattern reduces total cost by 40-60% compared to using GPT-5.5 for everything, without sacrificing quality where it matters.

Numbers that guide the routing decision

5-8x: Cost per token difference, GPT-5.5 vs Nova Pro. For high-volume workloads, intelligent routing is the largest cost lever available on Bedrock.
50%: Maximum discount with Batch Inference (Claude and Nova). Async workloads such as document analysis and data enrichment should use batch by default.
7 years: Recommended retention for reasoning logs in regulated environments.
Claude 3.7's extended thinking block is decision-process evidence for regulatory audits.

Anti-patterns I encounter in production

Using GPT-5.5 for all workloads because it is 'the most capable': this ignores that 70% of tasks do not need frontier reasoning and pays 5-8x more for it.
Not enabling Model Invocation Logging: without prompt/response logs, regulatory audit is impossible and quality regression debugging is blind.
Assuming Bedrock Guardrails replaces schema validation in code: Guardrails filters content, not data structure; invalid JSON still passes through.
Sharing TPM limits between critical and experimental workloads in the same AWS account: a token-burst experiment can throttle a production feature.
Not propagating trace IDs in Bedrock calls: this loses the correlation between business decision and model invocation, making incident investigations much slower.

My curation note: In practice, what I would do: start with Claude 3.7 Sonnet as the default model for any new workload in a financial environment. The auditable extended thinking is worth more than the cost difference versus Nova Pro when you are in a regulated sector. Introduce GPT-5.5 and Codex specifically for the AI-assisted DevOps pipeline, where code generation quality justifies the premium cost. Nova Pro would enter as a routing destination for large-scale classification and extraction, with fine-tuning trained on domain vocabulary. The lesson I learned the hard way: the biggest risk is not choosing the wrong model; it is not having sufficient observability to know when any model is wrong. Invest in trace IDs, Model Invocation Logging, and quality drift alerts before optimizing which model to use.

Recommendation: do not choose a model, build a router

The arrival of GPT-5.5 and Codex on Bedrock does not make Claude or Nova obsolete; it completes the portfolio. The recommendation is clear: implement an AI Gateway that routes by workload, not by model preference.
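A minimal sketch of such a workload router follows. It encodes the routing rules stated in this article in one possible priority order; the `Request` fields, the `Route` labels, and the rule ordering are assumptions for illustration, and the labels are not real Bedrock model IDs.

```python
from dataclasses import dataclass
from enum import Enum


class Route(str, Enum):
    # Labels only; map them to real model IDs in your own configuration.
    NOVA_PRO = "nova-pro"
    CLAUDE_EXTENDED_THINKING = "claude-3.7-sonnet-extended-thinking"
    CLAUDE_DEFAULT = "claude-3.7-sonnet"
    CODEX = "codex"
    GPT_5_5 = "gpt-5.5"


@dataclass(frozen=True)
class Request:
    task: str                      # e.g. "classification", "code_generation"
    complexity: str = "medium"     # "low", "medium", or "high"
    needs_auditable_reasoning: bool = False


def route(request: Request) -> Route:
    """Pick a model route for a request; first matching rule wins."""
    if request.task == "code_generation":
        return Route.CODEX
    if request.needs_auditable_reasoning:
        return Route.CLAUDE_EXTENDED_THINKING
    if request.complexity == "low":
        return Route.NOVA_PRO
    if request.complexity == "high":
        return Route.GPT_5_5
    # Default suggested in the curation note for regulated environments.
    return Route.CLAUDE_DEFAULT


if __name__ == "__main__":
    print(route(Request(task="classification", complexity="low")).value)
    print(route(Request(task="fraud_triage", needs_auditable_reasoning=True)).value)
```

Keeping the rules in a pure function makes the routing policy unit-testable and auditable independently of the Lambda or ECS service that hosts it.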
Use Claude 3.7 Sonnet as the default for tasks requiring auditable reasoning in regulated environments. Use GPT-5.5 and Codex for code generation and high-complexity reasoning tasks where the premium cost is justified by value. Use Nova Pro for large-scale processing and cases where domain fine-tuning is viable. In all cases: enable Model Invocation Logging on day one, propagate trace IDs, and treat TPM limits as an infrastructure quota requiring capacity planning, not a configuration detail. Unified governance on Bedrock is the real asset here; the models are commodities that will evolve. Build your platform around the controls, not around a specific model.

Originally published at fernando.moretes.com. By Fernando F. Azevedo, Senior Solutions Architect.

18 hours ago

DEV

Dev.to

ADR: Adopting Amazon Bedrock AgentCore in Production

After 16 years building financial platforms on AWS, I've learned that the most dangerous question in architecture isn't 'does this work?' but 'who operates this at 2 AM when it breaks?' Bedrock AgentCore is AWS's answer to the problem of operationalizing AI agents beyond the notebook: managed runtime, memory, tool-use, guardrails, and traceability in a single control plane. This ADR documents how I arrived at the decision to adopt it (or not) in a regulated financial environment, and the consequences you need to internalize before doing the same.

Context and Forces

The scenario that motivated this decision is recurring in financial institutions: a product team wants to expose an AI agent to internal analysts, capable of querying market data via API, running risk calculations in Lambda, retrieving context from regulatory documents via RAG, and recording every action in an immutable audit trail. The MVP worked in two sprints with LangChain + Claude via Bedrock. The problem surfaced the following week.

Five forces made the decision urgent:

(1) Cross-turn state management: financial agent sessions last minutes, not seconds; reliably maintaining context in stateless Lambda is brittle.
(2) Regulatory traceability: every tool call, every model decision, every response must be auditable with timestamp, identity, and full payload, without relying on ad-hoc logging.
(3) Guardrails as contract: in finance, the agent cannot leak PII, cannot recommend products without disclaimers, cannot execute irreversible actions without human confirmation. Implementing this manually in every agent is guaranteed technical debt.
(4) Unpredictable token cost: without per-session budget control, a faulty agent loop can consume tens of dollars in minutes.
(5) Runtime portability: the platform team doesn't want to maintain a custom agent scheduler; they want an SLA contract with AWS.
Options Considered

Option A: Self-hosted LangChain/LangGraph on EKS

Pros
Full control over execution graph and retry logic
Model portability: swap LLM without platform change
Mature ecosystem of community integrations and tools

Cons
Full operational responsibility: scaling, HA, patching, observability
Guardrails and audit trail must be built and maintained by the team
Session memory management requires custom DynamoDB or Redis
High engineering cost to reach parity with managed features

Verdict: Suitable for teams with mature AI platform; high operational risk for smaller teams

Option B: Bedrock Agents (prior generation, without AgentCore)

Pros
AWS-managed, no runtime infrastructure to operate
Native integration with Knowledge Bases and Action Groups

Cons
Limited observability: partial traces, no native span-level detail
No native per-session budget control
Agent loop customization restricted to what AWS exposes

Verdict: Good for simple cases; observability limitations are blockers in finance

Option C: Amazon Bedrock AgentCore

Pros
Managed runtime with native persistent session memory (AgentCore Memory)
Configurable guardrails as declarative policy, not inline code
Native traceability via CloudTrail + X-Ray with tool-call spans
AgentCore Gateway for tool-use with OAuth2/OIDC and per-tool throttling
Configurable per-session token budget control

Cons
Platform lock-in to AWS for the agent runtime
Execution graph customization more restricted than LangGraph
New service: API surface still evolving, conservative quotas
AgentCore Memory and Gateway costs added on top of inference cost

Verdict: Recommended decision for regulated financial environments with a lean platform team

Option D: Step Functions + Lambda as agent orchestrator

Pros
Native audit via Step Functions execution history
Declarative and testable retry, timeout, and error handling
No new service to learn; the team already knows the pattern

Cons
Not an agent runtime: each 'turn' requires a new execution or .waitForTaskToken
Session memory and model context must be managed externally
Cold-start and state transition latency can be noticeable in dialogues

Verdict: Excellent for deterministic workflows; inadequate as a conversational agent runtime

The Decision and the Reasoning Behind It

The decision was to adopt Bedrock AgentCore as the primary agent runtime, with Step Functions as the orchestrator for adjacent deterministic workflows (approvals, reconciliations, notifications). This is not an all-or-nothing decision: AgentCore solves the non-deterministic agent loop problem, while Step Functions remains the right choice for the deterministic business process that wraps the agent.

The decisive argument was the AgentCore Gateway with per-tool OAuth2/OIDC support. In a financial environment, every tool-call is an action with identity: who authorized it, what scope, with which token. Implementing this manually in LangChain would mean building and maintaining an authorization proxy, which is exactly the kind of infrastructure that generates no business value but generates security incidents when neglected. The Gateway delivers this as declarative configuration, with per-tool throttling (e.g., maximum 10 calls/session for the order execution API) and a native circuit breaker.

The second argument was session memory with configurable TTL. AgentCore Memory persists conversation context in a managed store, with per-session configurable TTL and KMS customer-managed key (CMK) encryption. For LGPD/GDPR compliance, this means I can configure a 24h TTL for analyst sessions and guarantee that no session data persists beyond what's necessary, without building a custom expiration pipeline.

The lock-in trade-off was consciously accepted: the tool-use layer (the Lambda functions that execute the actual actions) remains completely portable. If we need to migrate the runtime in the future, the tools keep working.
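To make the per-tool, per-session limit described above concrete, here is an application-level illustration of the semantics. It is not the Gateway's API or configuration format: the class and exception names are hypothetical, and the state is held in process memory purely so the example runs.

```python
from collections import defaultdict


class ToolCallLimitExceeded(Exception):
    """Raised when a session exceeds its allowed calls for a tool."""


class SessionToolLimiter:
    """Counts tool calls per (session, tool) and enforces a per-tool cap."""

    def __init__(self, limits, default_limit=None):
        self._limits = dict(limits)      # tool name -> max calls per session
        self._default = default_limit    # None means "no cap" for other tools
        self._counts = defaultdict(int)

    def check_and_count(self, session_id, tool_name):
        """Register one call; raise if the cap for this tool is already used."""
        limit = self._limits.get(tool_name, self._default)
        key = (session_id, tool_name)
        if limit is not None and self._counts[key] >= limit:
            raise ToolCallLimitExceeded(
                f"{tool_name}: limit of {limit} calls reached for this session"
            )
        self._counts[key] += 1
        return self._counts[key]


if __name__ == "__main__":
    limiter = SessionToolLimiter({"ExecuteOrder": 10})
    for _ in range(10):
        limiter.check_and_count("session-a", "ExecuteOrder")
    try:
        limiter.check_and_count("session-a", "ExecuteOrder")
    except ToolCallLimitExceeded as exc:
        print(f"blocked: {exc}")
```

The point of the illustration is the keying: limits are tracked per session and per tool, so a runaway loop in one session cannot exhaust another session's allowance.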
Financial Agent Architecture with Bedrock AgentCore

[Figure: execution flow of a financial analysis agent, from analyst to AgentCore runtime, through guardrails, tool-use via Gateway, session memory, and observability. Layers: AWS Security and Entry (API Gateway, REST + Cognito JWT; Bedrock Guardrails, PII filter + topic deny); AWS AgentCore Runtime (AgentCore Runtime with Claude 3.5 Sonnet; AgentCore Memory, TTL=24h, KMS CMK; AgentCore Gateway, OAuth2/OIDC, throttle); AWS Tools (Lambda: Market Data, Bloomberg API proxy; Lambda: Risk Calc, VaR engine; Knowledge Base, OpenSearch + S3); AWS Observability and Audit (X-Ray, span per tool-call; CloudTrail, API audit log; CloudWatch, SLO dashboards). Flow: the analyst calls API Gateway over HTTPS with a JWT; API Gateway sends the input to Guardrails for validation and emits API events to CloudTrail; Guardrails passes the sanitized prompt to AgentCore; AgentCore reads and writes context in Memory, invokes tools through the Gateway, sends output back through Guardrails for filtering, and emits traces to X-Ray; the Gateway calls the Market Data and Risk Calc Lambdas with OAuth2 tokens and issues RAG queries to the Knowledge Base; X-Ray feeds SLO metrics to CloudWatch.]

Concrete Configuration: What Actually Matters

Adopting AgentCore without properly configuring operational controls is worse than not adopting it: you gain a false sense of security without active guardrails. Here are the configurations that make a real difference:

Guardrails as first line: Configure contentPolicyConfig with HATE, INSULTS, SEXUAL, VIOLENCE all set to BLOCK, and sensitiveInformationPolicyConfig with PII filters for CREDIT_DEBIT_CARD_NUMBER, AWS_ACCESS_KEY, NAME, and EMAIL in ANONYMIZE mode. In a financial environment, add topicPolicyConfig with explicitly denied topics: "investment advice without disclaimer", "guaranteed returns". This isn't paranoia; it's the minimum to pass a compliance review.
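The guardrail configuration above can be sketched as a request payload for the Bedrock `create_guardrail` operation. This is a sketch under assumptions rather than the author's configuration: the guardrail name, the blocked messages, and the topic definitions are placeholder wording, and content filters are expressed here with `HIGH` input and output strength as one way of getting blocking behavior.

```python
CONTENT_FILTER_TYPES = ["HATE", "INSULTS", "SEXUAL", "VIOLENCE"]
PII_TYPES = ["CREDIT_DEBIT_CARD_NUMBER", "AWS_ACCESS_KEY", "NAME", "EMAIL"]

# Placeholder topic definitions; a compliance team would own the real wording.
DENIED_TOPICS = {
    "Investment advice without disclaimer":
        "Recommending financial products or trades without the required disclaimer.",
    "Guaranteed returns":
        "Promising or implying a guaranteed profit or risk-free return.",
}


def build_guardrail_request(name):
    """Build keyword arguments for the Bedrock create_guardrail call."""
    return {
        "name": name,
        "blockedInputMessaging": "This request cannot be processed.",
        "blockedOutputsMessaging": "This response was blocked by policy.",
        "contentPolicyConfig": {
            "filtersConfig": [
                {"type": t, "inputStrength": "HIGH", "outputStrength": "HIGH"}
                for t in CONTENT_FILTER_TYPES
            ]
        },
        "sensitiveInformationPolicyConfig": {
            "piiEntitiesConfig": [
                {"type": t, "action": "ANONYMIZE"} for t in PII_TYPES
            ]
        },
        "topicPolicyConfig": {
            "topicsConfig": [
                {"name": topic, "definition": definition, "type": "DENY"}
                for topic, definition in DENIED_TOPICS.items()
            ]
        },
    }


if __name__ == "__main__":
    request = build_guardrail_request("financial-agent-guardrail")
    # With credentials configured, this would be sent as:
    #   boto3.client("bedrock").create_guardrail(**request)
    print(sorted(request))
```

Building the payload as data, separate from the API call, lets the configuration be reviewed and diffed in version control before it reaches any environment.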
AgentCore Memory with correct partitioning: The memory partition key must be userId + sessionId, never just sessionId. In multi-tenant environments, sessions from different users with the same sessionId collided in testing, a silent bug that leaks context between users. Configure memoryConfiguration.enabledMemoryTypes with SESSION_SUMMARY for long sessions, reducing context token consumption by up to 40% in sessions exceeding 20 turns.

Gateway with per-tool throttling: Define separate rateLimit for each Action Group. The order execution API should have maxRequestsPerSession: 5 and requireConfirmation: ENABLED. The market data query API can have maxRequestsPerSession: 50. Without this granularity, a faulty agent loop can execute dozens of orders before being detected, a scenario I've seen happen in production with frameworks lacking tool-use controls.

Per-session token budget: Configure sessionConfiguration.maxTokens with a conservative initial value; I recommend 50,000 tokens for typical analysis sessions. Monitor the p95 token consumption per session in CloudWatch and adjust. An agent entering a reasoning loop can consume 200k+ tokens in a single session without this control.

Observability: What to Measure and How

AI agents have a different observability profile from traditional APIs. p99 latency is less useful than turns-per-session distribution and tool-call failure rate per tool.
Here is the observability model I implemented:

Agent business metrics (via CloudWatch custom metrics with namespace FinancialAgent):
TurnsPerSession: histogram; alert if p95 > 15 turns (indicates loop or poorly calibrated prompt)
TokensPerSession: histogram; alert if p95 > 40k tokens
ToolCallFailureRate per ToolName: counter; SLO of < 1% failure for critical tools
GuardrailInterventionRate: counter; spike indicates jailbreak attempt or prompt injection

Traces with X-Ray: AgentCore emits spans for each tool invocation with attributes bedrock.agent.toolName, bedrock.agent.sessionId, and bedrock.agent.turnCount. Configure a trace group with filter annotation.bedrock.agent.toolName = "ExecuteOrder" and alert on latency > 2s; order execution above that indicates a downstream API issue.

CloudTrail for regulatory audit: Each InvokeAgent API call is recorded with the caller ARN, sessionId, and inputText (truncated). For compliance, configure an S3 bucket with Object Lock in COMPLIANCE mode and 7-year retention for AgentCore CloudTrail logs. This is the minimum to meet Banco Central do Brasil and SEC audit requirements.

Cost anomaly alarm: Configure an AWS Budget with an alert at 80% of the monthly Bedrock budget, with an SNS action. Add a second CloudWatch alarm on bedrock:InvokeModel with model-id=anthropic.claude-3-5-sonnet and a threshold of 1,000 invocations/hour; above that, something is wrong.

Consequences and Risks You Need to Accept

Runtime lock-in is real: If AWS deprecates or significantly changes the AgentCore API, migration requires rewriting the orchestration logic, not just the tools. Mitigate by keeping tools (Lambda) completely runtime-agnostic and documenting the interface contract in a separate ADR.

Conservative quotas on a new service: AgentCore has concurrent agent sessions quotas that, at launch, were significantly lower than traditional Bedrock Agents quotas. Request quota increases before go-live, not after.
A peak event without adequate quota results in ThrottlingException that the end client sees as a timeout.

Guardrails have latency: Each pass through Guardrails adds 100-300ms of latency. In an agent with 10 turns, that's up to 3 additional seconds of accumulated latency. For use cases where latency is critical, consider disabling output guardrails on internal tools (not exposed to the end user) and applying them only on the final output.

Memory is not free: AgentCore Memory charges for storage and per read/write operation. In long sessions with SESSION_SUMMARY active, memory cost can exceed inference cost for short sessions. Monitor MemoryReadLatency and MemoryWriteLatency; above 200ms indicates pressure on the managed store.

Human-in-the-loop is not automatic: requireConfirmation: ENABLED on the Gateway pauses execution and waits for confirmation via callback. If the client doesn't respond within confirmationTimeout (default: 300s), the session expires. Design the UX to make this clear to the user; timeout-expired sessions are the leading cause of complaints in financial agents.

Real Reference Numbers

~40%: Token reduction with SESSION_SUMMARY. In sessions with more than 20 turns, SESSION_SUMMARY reduces context sent to the model by ~40% vs. full history.
100-300ms: Latency added by Guardrails per turn. Each Guardrail evaluation (input + output) adds 100-300ms; across 10 turns, up to 3s accumulated.
7 years: Minimum CloudTrail retention for financial compliance. S3 Object Lock in COMPLIANCE mode with 7 years meets Banco Central do Brasil and SEC requirements for agent audit trails.

Well-Architected Assessment

security: Declarative guardrails with PII filter and denied topics; AgentCore Gateway with per-tool OAuth2/OIDC; KMS CMK for session memory; CloudTrail with S3 Object Lock for immutable audit. IAM with bedrock:AgentArnLike condition to restrict which agents can invoke which tools.
reliability: Automatic retry with jitter in the Bedrock SDK (max_attempts=3, mode=adaptive); native circuit breaker in AgentCore Gateway per tool; configurable session timeout prevents zombie sessions; concurrent session quotas must be requested before go-live.

performance: SESSION_SUMMARY reduces context tokens by ~40% for long sessions; disabling output guardrails on internal tools reduces accumulated latency; Knowledge Base with OpenSearch k-NN with HNSW and ef_search=512 for low-latency RAG.

cost: AWS Budget with alert at 80% of monthly limit; CloudWatch alarm on invocations/hour per model-id; SESSION_SUMMARY reduces inference cost in long sessions; monitor AgentCore Memory cost separately from inference cost.

What the AWS Blog Doesn't Tell You

AWS service launch blogs are excellent at showing the happy path. What they rarely cover are the edge cases you only discover in production. Here are the three that cost me the most time:

Tool-call idempotency is not guaranteed by the runtime. If AgentCore attempts to invoke a tool and receives a timeout, it may retry, and your Lambda may be invoked twice for the same action. For idempotent tools (queries), this is harmless. For tools with side effects (order execution, email sending), you need to implement idempotency in the Lambda using an idempotencyToken derived from sessionId + turnId + toolName. Without this, order duplication is a matter of when, not if.

The model can ignore requireConfirmation in certain prompt formulations. I tested this: if the system prompt instructs the agent to "be proactive and execute actions without asking for unnecessary confirmation," the model may rationalize that a specific action doesn't need confirmation even with the flag active. The correct defense is dual: the flag on the Gateway and an explicit instruction in the system prompt about when confirmation is mandatory. Never rely on a single layer.
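The idempotency token described above (derived from sessionId + turnId + toolName) can be sketched as follows. This is a minimal illustration, not the author's implementation: the in-memory store stands in for a durable one (for example a DynamoDB conditional write, or the Powertools for AWS Lambda idempotency utility), and the function and class names are hypothetical.

```python
import hashlib


def idempotency_token(session_id, turn_id, tool_name):
    """Derive a stable token from the identifiers of one tool call."""
    # A separator that cannot appear in normal IDs avoids ambiguous joins
    # such as ("ab", "c") versus ("a", "bc").
    raw = "\x1f".join([session_id, turn_id, tool_name])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class InMemoryIdempotencyStore:
    """Illustrative store; not durable and not safe across concurrent Lambdas."""

    def __init__(self):
        self._results = {}

    def run_once(self, token, action):
        """Run action() the first time a token is seen; replay the stored
        result on every retry with the same token."""
        if token in self._results:
            return self._results[token]
        result = action()
        self._results[token] = result
        return result


if __name__ == "__main__":
    store = InMemoryIdempotencyStore()
    executed = []

    def place_order():
        executed.append("order")  # the side effect we must not duplicate
        return {"status": "accepted"}

    token = idempotency_token("session-1", "turn-7", "ExecuteOrder")
    store.run_once(token, place_order)
    store.run_once(token, place_order)  # simulated runtime retry
    print(len(executed))
```

In a real Lambda the check-and-record step has to be atomic in the backing store, since two retries can arrive on different execution environments at the same time.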
AgentCore doesn't have native multi-agent support yet. If your architecture requires a supervisor agent delegating to specialized agents (multi-agent orchestration pattern), you'll need to implement the delegation logic manually, typically with an agent that calls other agents via tool-use, where each "tool" is actually an invocation of another AgentCore. It works, but cross-session traceability requires manual sessionId correlation via X-Ray.

Anti-Patterns I've Seen in Architecture Reviews

Using AgentCore without configuring Guardrails because "it's an internal environment": insiders are the primary source of compliance incidents in finance.
Storing full session history in memory without SESSION_SUMMARY: token cost grows linearly with number of turns.
Implementing critical business logic inside the agent system prompt instead of in testable tools: prompts don't have unit tests.
Not requesting concurrent session quota increase before go-live: ThrottlingException during peak usage is predictable and preventable.
Assuming AgentCore Gateway replaces a business authorization layer: the Gateway controls access to the tool, not the authorization logic inside the tool.
Not implementing idempotency in tool Lambdas with side effects: runtime retries can duplicate irreversible actions.

Curator's Note: In practice, what convinced me to adopt AgentCore was not any individual feature. It was the fact that the AgentCore Gateway with per-tool OAuth2/OIDC solves the tool-call identity problem I was about to build manually, which would have taken two sprints and generated permanent technical debt. The hard-won lesson behind this: in financial environments, the cost of building custom security controls is not the initial development cost; it's the cost of maintaining, auditing, and fixing those controls over years.
When a managed service delivers the control as declarative configuration, the decision to adopt it is rarely about feature parity; it's about where you want to allocate your team's engineering attention. My recommendation: adopt AgentCore for new production agents, keep tools portable, and invest the saved time in observability and adversarial prompting tests.

Verdict: Adopt with Explicit Controls

Bedrock AgentCore is the right choice for financial teams that need to put AI agents into production without building and maintaining a custom orchestration runtime. The decision is not binary; it's about recognizing that AgentCore's value lies in the operational controls (Gateway, Guardrails, Memory with CMK), not just the execution runtime. The condition for adoption is clear: configure Guardrails before any testing with real data, implement idempotency in all tools with side effects, request concurrent session quota increases before go-live, and monitor TurnsPerSession and TokensPerSession as first-class SLO metrics. Lock-in is real but manageable if tools are kept portable. For teams that lack the capacity to build and operate a custom agent runtime (which is most teams), AgentCore is the correct architectural decision in 2025.

References

Amazon Bedrock AgentCore, Developer Guide
Amazon Bedrock Guardrails, Configuration Reference
Amazon Bedrock AgentCore Memory, Developer Guide
Building AI agents with Amazon Bedrock AgentCore, AWS ML Blog
AWS Well-Architected Framework, Machine Learning Lens
Idempotency for AWS Lambda, Powertools for AWS Lambda
Architecture Decision Records, Michael Nygard
Amazon OpenSearch Service, k-NN Search with HNSW

Originally published at fernando.moretes.com. By Fernando F. Azevedo, Senior Solutions Architect.

18 hours ago
