
AWS AgentCore Quota Increases for AI Scaling

Amazon Web Services has substantially increased Amazon Bedrock AgentCore runtime quotas, enabling enterprises to scale AI agents and user interactions efficiently.

Jul 2, 2026 | 3 min read | 633 words

Amazon Web Services has raised the default Amazon Bedrock AgentCore runtime quotas, lifting concurrent-session limits fivefold. Enterprises can now run more AI agents and serve more user interactions at once. The update also removes a bottleneck: the quota-increase request process, which often held up production deployments. Quota increases carry no charge of their own, but the added capacity will likely drive higher compute and runtime consumption as AI deployments expand, easing the move from experimentation to large-scale production use.

AWS AgentCore Quota Increases for AI Scaling. Image generated with AI (Stable Diffusion XL)

Amazon Web Services has raised the default runtime quotas for Amazon Bedrock AgentCore, with concurrent-session limits up fivefold. Enterprises can run more AI agents and handle more user interactions at the same time, which shortens the path to production. The quota increase itself costs nothing, but the extra capacity will likely lead to higher compute and runtime consumption as AI initiatives grow.

The new defaults allow up to 5,000 active concurrent sessions in US East (N. Virginia) and US West (Oregon), up from 1,000. Other supported regions now allow 2,500 sessions, up from 500. AWS also raised the rate at which each AI agent can be invoked from 25 to 200 transactions per second (TPS) in all regions, so organizations can handle more simultaneous user requests. Finally, to help AI applications scale faster during demand peaks, the rate for creating new agent sessions in container deployments has quadrupled, from 100 to 400 transactions per minute (TPM).
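The figures above can be captured in a small lookup that also shows what the faster session-creation rate means in practice, for example how long it takes to open a region's full session allowance when the creation rate is the only constraint. This is an illustrative Python sketch, not an AWS API: the dictionary keys and helper names are hypothetical labels, and the values are the old and new defaults as reported in this article, not figures read from the AWS Service Quotas console.

```python
import math

# Old and new AgentCore runtime defaults as reported in this article.
# Keys are hypothetical labels, not official Service Quotas names.
DEFAULT_QUOTAS = {
    "active_sessions_us_east_1_us_west_2": (1_000, 5_000),
    "active_sessions_other_regions": (500, 2_500),
    "per_agent_rate_per_second": (25, 200),
    "new_sessions_per_minute_container": (100, 400),
}

# US East (N. Virginia) and US West (Oregon) get the higher session ceiling.
HIGH_TIER_REGIONS = {"us-east-1", "us-west-2"}


def session_limit(region: str, new: bool = True) -> int:
    """Return the default concurrent-session ceiling for a region (new or old)."""
    key = ("active_sessions_us_east_1_us_west_2" if region in HIGH_TIER_REGIONS
           else "active_sessions_other_regions")
    old, updated = DEFAULT_QUOTAS[key]
    return updated if new else old


def ramp_up_minutes(target_sessions: int, sessions_per_minute: int) -> int:
    """Whole minutes needed to open target_sessions at a fixed creation rate."""
    return math.ceil(target_sessions / sessions_per_minute)


def growth_factors() -> dict:
    """Ratio of new to old default for each quota."""
    return {name: new / old for name, (old, new) in DEFAULT_QUOTAS.items()}


if __name__ == "__main__":
    limit = session_limit("us-east-1")
    old_rate, new_rate = DEFAULT_QUOTAS["new_sessions_per_minute_container"]
    print(limit)                             # 5000
    print(ramp_up_minutes(limit, old_rate))  # 50
    print(ramp_up_minutes(limit, new_rate))  # 13
    print(growth_factors())
```

Under that simplification, filling 5,000 sessions takes about 13 minutes at the new 400-per-minute rate versus 50 minutes at the old 100-per-minute rate. The growth factors also show that session ceilings rose fivefold while the per-agent rate rose eightfold.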

Enterprise Impact of Elevated Quotas

The higher AgentCore runtime quotas respond to a growing trend: enterprises moving AI agent experiments into full-scale production. Charlie Dai, a principal analyst at Forrester, says the significant shift is from single-task copilots to multiple production-grade agents serving larger user populations. In his view, this suggests AWS is seeing more concurrency, longer-running agents, and more intricate orchestration patterns than its previous defaults assumed.

For enterprises making this transition, higher default quotas reduce the operational friction of moving AI agents from pilot projects to live production. Large-scale AI deployments, particularly multi-agent systems, frequently outgrow default runtime quotas and require increase requests. In an enterprise setting, explains Amit Chandak, chief analytics officer at IT consulting firm Kanerika, such requests involve support tickets, business justifications, and review cycles. That administrative overhead can delay deployment by days or even weeks.
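A rough way to tell whether a workload will outgrow the defaults is Little's law, which says that average concurrency equals the arrival rate multiplied by the average time each item spends in the system. The sketch below is a back-of-the-envelope estimator, not an AWS tool: the function names, the 80% headroom factor, and the traffic figures are illustrative assumptions. The session ceilings and container session-creation rates it is checked against are the defaults reported earlier in this article.

```python
import math

# Back-of-the-envelope capacity check; not an AWS API. Quota values are passed
# in by the caller (for example, the defaults reported in this article).


def required_concurrent_sessions(new_sessions_per_second: float,
                                 avg_session_seconds: float) -> int:
    """Little's law: average concurrency = arrival rate x average session duration."""
    return math.ceil(new_sessions_per_second * avg_session_seconds)


def fits_defaults(new_sessions_per_second: float,
                  avg_session_seconds: float,
                  session_quota: int,
                  creation_quota_per_minute: int,
                  headroom: float = 0.8) -> bool:
    """True if both concurrency and session-creation rate stay under headroom x quota.

    The 0.8 headroom is an assumed safety margin: Little's law gives an
    average, and real traffic peaks above it.
    """
    concurrency = required_concurrent_sessions(new_sessions_per_second,
                                               avg_session_seconds)
    creation_per_minute = new_sessions_per_second * 60
    return (concurrency <= session_quota * headroom
            and creation_per_minute <= creation_quota_per_minute * headroom)


if __name__ == "__main__":
    # Hypothetical workload: 5 new conversations per second, 12 minutes each.
    rate, duration = 5, 12 * 60
    print(required_concurrent_sessions(rate, duration))  # 3600
    print(fits_defaults(rate, duration, 1_000, 100))     # False: old US East/West defaults
    print(fits_defaults(rate, duration, 5_000, 400))     # True: new US East/West defaults
    print(fits_defaults(rate, duration, 2_500, 400))     # False: new defaults elsewhere
```

Under these assumptions, a workload that would have required an increase request under the old defaults now fits in US East and US West, but would still need one in other regions.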

Beyond the process cost, the new quotas affect architecture. Teams tend to design systems around the default ceilings they have, so higher defaults change what they are willing to attempt without triggering an exceptions process.

The benefits also go beyond administrative simplicity, because exhausting runtime quotas in production can severely disrupt customer-facing applications and multi-agent workflows. Agent sessions are stateful: a session throttled mid-task can lose its intermediate context, and reconstructing that state is much harder than retrying a stateless API call. In a multi-agent pipeline, a single rejected session can halt the entire workflow, leaving orphaned sessions, incomplete tool calls, and monitoring gaps that are difficult to diagnose after the fact.
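Because a throttled session can lose intermediate context, a common defensive pattern is to retry rejected calls with exponential backoff and jitter while checkpointing each completed step outside the session, so a failed run can resume instead of starting over. The sketch below illustrates that general pattern in plain Python and does not use any AgentCore SDK. `ThrottledError`, `Checkpoint`, and the `invoke` callable are hypothetical stand-ins for whatever throttling signal, state store, and agent call a real system uses, and the sketch assumes each step is safe to retry.

```python
import random
import time
from dataclasses import dataclass, field
from typing import Callable, List


class ThrottledError(Exception):
    """Hypothetical stand-in for a quota or throttling rejection from the runtime."""


@dataclass
class Checkpoint:
    """Hypothetical record of progress kept outside the agent session.

    In production this would live in durable storage; here it is in memory.
    """
    step: int = 0                                    # index of the next step to run
    results: List[str] = field(default_factory=list)


def call_with_backoff(fn: Callable[[], str], max_attempts: int = 5,
                      base_delay: float = 0.5, max_delay: float = 8.0,
                      sleep: Callable[[float], None] = time.sleep) -> str:
    """Retry fn on ThrottledError with capped exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the error; the checkpoint keeps progress
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
    raise RuntimeError("max_attempts must be at least 1")


def run_workflow(steps: List[str], checkpoint: Checkpoint,
                 invoke: Callable[[str, List[str]], str],
                 sleep: Callable[[float], None] = time.sleep) -> Checkpoint:
    """Run steps in order from checkpoint.step, recording each result as it completes."""
    for i in range(checkpoint.step, len(steps)):
        # Pass prior results so a resumed run can rebuild the agent's context.
        result = call_with_backoff(lambda: invoke(steps[i], checkpoint.results),
                                   sleep=sleep)
        checkpoint.results.append(result)
        checkpoint.step = i + 1
    return checkpoint


if __name__ == "__main__":
    remaining_rejections = [2]

    def flaky_invoke(step: str, context: List[str]) -> str:
        # Simulated agent call that is throttled twice before succeeding.
        if remaining_rejections[0] > 0:
            remaining_rejections[0] -= 1
            raise ThrottledError(step)
        return f"{step} (saw {len(context)} earlier results)"

    done = run_workflow(["plan", "search", "summarize"], Checkpoint(), flaky_invoke,
                        sleep=lambda _: None)  # skip real waits in the demo
    print(done.step, done.results)
```

If retries run out, the error propagates but `checkpoint.step` still points at the first unfinished step, so calling `run_workflow` again with the same checkpoint resumes from there rather than repeating completed work.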

These advantages will not be uniform across enterprises. Gaurav Dewan, research director at Avasant, says organizations with high-concurrency, transaction-intensive AI workloads stand to gain the most from the higher default quotas. That includes customer service and contact centers, where AI agents often run simultaneously at scale. Other likely beneficiaries include software engineering and DevOps automation, IT operations, financial services process automation, healthcare administration, supply chain coordination, and security operations.

Diverse Approaches to Production AI Scaling

AWS is not alone in adapting its infrastructure to help enterprises scale AI agents in production. Rival hyperscalers, including Microsoft and Google, are tackling the same challenge in different ways. Microsoft's Azure Foundry Agent Service, for example, fixes many of its agent runtime limits by design, Chandak notes, and they cannot be raised even on request.

Microsoft instead places its scaling flexibility at the model deployment layer, where quotas are adjustable. AWS, by contrast, is raising the floor for concurrent sessions at the runtime level with AgentCore, a deliberate architectural divergence between the two hyperscalers. The updated Bedrock AgentCore limits apply automatically to all enterprise accounts, so customers get the higher defaults without filing a quota request.