
Every cloud provider has service quotas and limits. Learn the differences between AWS Service Quotas, Azure subscription limits, and GCP quotas—including how to monitor, request increases, and automate quota management across all three platforms.

You’ve architected a scalable system, tested it in staging, and you’re ready to launch. Then the deployment fails: “vCPU limit exceeded.” Or worse, it succeeds in one region but fails in another because quota limits vary by region.
Every cloud provider imposes limits on resources you can provision. Understanding these limits—and how to manage them proactively—is essential for production workloads.
This guide covers how quota management works on AWS, Azure, and GCP: viewing your current quotas, requesting increases, monitoring usage, and automating the whole process.
Before diving in, let’s clear up the vocabulary:
| Provider | Terms Used | Meaning |
|---|---|---|
| AWS | Quota = Limit (interchangeable) | Both refer to the same concept; some are adjustable, some are fixed |
| Azure | Limits | Can refer to adjustable quotas or hard caps; some are tier-dependent |
| GCP | Quota (adjustable) vs Limit (fixed) | Quotas can be increased; limits cannot |
This inconsistency causes confusion when working across clouds. Throughout this guide, we’ll use “quota” for adjustable values and “limit” for hard caps.
AWS has the most mature quota management system, with a dedicated service called Service Quotas that provides a unified view across all AWS services.
| Concept | Description |
|---|---|
| Account-level quotas | Apply to your entire AWS account |
| Resource-level quotas | Apply to specific resources (newer feature) |
| Regional quotas | Many quotas are per-region, not global |
| Default quotas | What you start with; usually lower than max |
| Applied quotas | Your current limit after any increases |
AWS Console: open the Service Quotas console, pick a service, and browse or search its quotas.
AWS CLI:
```bash
# List quotas for a specific service
aws service-quotas list-service-quotas \
    --service-code ec2 \
    --query 'Quotas[*].[QuotaName,Value,Adjustable]' \
    --output table

# Get a specific quota
aws service-quotas get-service-quota \
    --service-code ec2 \
    --quota-code L-1216C47A  # Running On-Demand Standard instances
```
| Service | Quota | Default | Notes |
|---|---|---|---|
| EC2 | Running On-Demand instances (vCPUs) | 5-64 vCPUs (varies by instance type) | Per region, per instance family |
| Lambda | Concurrent executions | 1,000 | Per region; soft limit |
| VPC | VPCs per region | 5 | Usually easy to increase |
| EBS | Snapshots per region | 100,000 | Higher than you’d expect |
| S3 | Buckets per account | 100 | Global, not regional |
| RDS | DB instances per region | 40 | Includes all engine types |
| IAM | Roles per account | 1,000 | Global quota |
| CloudFormation | Stacks per region | 2,000 | Can be limiting in complex setups |
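Several of the defaults above are soft limits. One way to see programmatically which quotas for a service are adjustable is to filter the JSON from `list-service-quotas`. A minimal sketch; the response fields (`QuotaName`, `QuotaCode`, `Value`, `Adjustable`) follow the real response shape, but the sample values below are illustrative:

```python
def adjustable_quotas(quotas):
    """Return (name, value) pairs for quotas marked adjustable, smallest first."""
    return sorted(
        ((q["QuotaName"], q["Value"]) for q in quotas if q.get("Adjustable")),
        key=lambda pair: pair[1],
    )

# Feed it the "Quotas" array from:
#   aws service-quotas list-service-quotas --service-code ec2 --output json
sample = [
    {"QuotaName": "Running On-Demand Standard (A, C, D, H, I, M, R, T, Z) instances",
     "QuotaCode": "L-1216C47A", "Value": 64.0, "Adjustable": True},
    # Hypothetical fixed quota, just to show filtering:
    {"QuotaName": "Example fixed quota", "Value": 5.0, "Adjustable": False},
]
print(adjustable_quotas(sample))
# → [('Running On-Demand Standard (A, C, D, H, I, M, R, T, Z) instances', 64.0)]
```

Sorting smallest-first surfaces the quotas most likely to bite during a scale-up.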
Via Console: locate the quota in the Service Quotas console and submit an increase request from its detail page.
Via CLI:
```bash
aws service-quotas request-service-quota-increase \
    --service-code ec2 \
    --quota-code L-1216C47A \
    --desired-value 256
```
Processing Time: small increases are often approved automatically within minutes; larger requests are reviewed by AWS Support and can take hours to days.
AWS also supports proactive quota handling: Automatic Quota Management (launched in 2025) can adjust quotas based on your usage, and organization quota request templates pre-apply increases to new accounts:
```bash
# Add a Lambda concurrency increase to an organization quota request template
aws service-quotas put-service-quota-increase-request-into-template \
    --service-code lambda \
    --quota-code L-B99A9384 \
    --desired-value 5000 \
    --aws-region us-east-1
```
Two modes:
Set up alarms before you hit limits:
```bash
# Create an alarm for Lambda concurrent executions
aws cloudwatch put-metric-alarm \
    --alarm-name "LambdaConcurrentExecutionsHigh" \
    --metric-name ConcurrentExecutions \
    --namespace AWS/Lambda \
    --statistic Maximum \
    --period 60 \
    --threshold 800 \
    --comparison-operator GreaterThanThreshold \
    --evaluation-periods 3 \
    --alarm-actions arn:aws:sns:us-east-1:123456789012:alerts
```
For enterprise setups, AWS provides the Quota Monitor for AWS solution:
Deploy from the AWS Solutions Library.
Azure takes a different approach: limits are tied to subscriptions and often vary by service tier and region.
| Concept | Description |
|---|---|
| Subscription limits | Most limits apply at subscription scope |
| Regional limits | Many limits are per-region within a subscription |
| Tier-dependent limits | Some limits increase with higher service tiers |
| vCPU quotas | Compute limits are per VM family, per region |
Azure Portal: go to Subscriptions → Usage + quotas, or search for “Quotas” in the portal.
Azure CLI:
```bash
# List compute quotas for a region
az vm list-usage --location eastus --output table

# Get a specific quota
az quota show \
    --scope "/subscriptions/{sub-id}/providers/Microsoft.Compute/locations/eastus" \
    --resource-name "standardDSv3Family"
```
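For scripting, the same data is easier to consume as JSON (`az vm list-usage --location eastus --output json`). A sketch that flags VM families close to their cap; the `currentValue`/`limit`/`name` fields mirror the Compute usage API shape, and the sample values are made up:

```python
def near_limit(usages, threshold=0.8):
    """Flag usage entries at or above `threshold` of their limit."""
    flagged = []
    for u in usages:
        limit = u["limit"]
        if limit > 0 and u["currentValue"] / limit >= threshold:
            flagged.append((u["name"]["localizedValue"], u["currentValue"], limit))
    return flagged

# Illustrative entries mirroring `az vm list-usage --output json`:
sample = [
    {"name": {"value": "standardDSv3Family",
              "localizedValue": "Standard DSv3 Family vCPUs"},
     "currentValue": 18, "limit": 20},
    {"name": {"value": "cores", "localizedValue": "Total Regional vCPUs"},
     "currentValue": 30, "limit": 100},
]
print(near_limit(sample))
# → [('Standard DSv3 Family vCPUs', 18, 20)]
```

Running this per region as a scheduled job gives you an early-warning list before a deployment trips the limit.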
| Service | Limit | Default | Notes |
|---|---|---|---|
| Compute | Total Regional vCPUs | 20-100 | Per subscription, per region |
| Compute | VM family vCPUs | 10-20 | Per family (DSv3, FSv2, etc.) |
| Storage | Storage accounts per region | 250 | Per subscription |
| Networking | VNets per subscription | 1,000 | Global across regions |
| App Service | App Service plans | 100 | Per resource group |
| Azure Functions | Max instances (Consumption) | 200 | Per function app |
| AKS | Clusters per subscription | 5,000 | High default |
| Resource Manager | API reads per hour | 12,000 | Can throttle automation |
Via Portal: select the quota in Usage + quotas and request a new limit.
Via Azure CLI:
```bash
# Request a quota increase
az quota create \
    --scope "/subscriptions/{sub-id}/providers/Microsoft.Compute/locations/eastus" \
    --resource-name "standardDSv3Family" \
    --limit-object value=100 limit-object-type=LimitValue
```
Via Support Request: For large increases or non-adjustable limits:
Free and Trial subscriptions cannot request quota increases—you must upgrade to Pay-as-you-go or higher.
Regional differences: If you need 30 vCPUs in West Europe, you must specifically request 30 vCPUs in West Europe. Increases don’t apply globally.
No cost for quotas: Requesting a quota increase is free—you only pay for resources you actually provision.
Azure Monitor Alerts:
```bash
# Create an alert rule for quota usage
az monitor metrics alert create \
    --name "HighCPUQuotaUsage" \
    --resource-group myResourceGroup \
    --scopes "/subscriptions/{sub-id}" \
    --condition "total UsagePercentage > 80" \
    --action-group myActionGroup \
    --description "Alert when CPU quota usage exceeds 80%"
```
Azure Policy for Quota Governance:
```json
{
  "if": {
    "allOf": [
      {
        "field": "type",
        "equals": "Microsoft.Compute/virtualMachines"
      },
      {
        "field": "Microsoft.Compute/virtualMachines/sku.name",
        "like": "Standard_D*"
      }
    ]
  },
  "then": {
    "effect": "audit"
  }
}
```
GCP distinguishes clearly between quotas (adjustable) and limits (fixed), and applies them at the project level rather than account level.
| Concept | Description |
|---|---|
| Project-level quotas | Most quotas apply per GCP project |
| Regional quotas | Many quotas are per-region within a project |
| Quotas vs Limits | Quotas can be increased; limits are hard caps |
| Quota metrics | Quotas are tracked as metrics you can monitor |
GCP Console: go to IAM & Admin → Quotas and filter by service or metric.
gcloud CLI:
```bash
# List quotas for Compute Engine
gcloud compute project-info describe \
    --project=my-project \
    --format="table(quotas.metric,quotas.limit,quotas.usage)"

# List quotas for a specific region
gcloud compute regions describe us-central1 \
    --format="table(quotas.metric,quotas.limit,quotas.usage)"
```
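With `--format=json` instead of the table format, the per-region quotas come back as a list of `{metric, limit, usage}` objects, which makes headroom checks straightforward. A sketch (the metric names are real GCP quota metrics; the values are illustrative):

```python
def quota_headroom(region_info, metric):
    """Return (usage, limit, remaining) for one quota metric from
    `gcloud compute regions describe REGION --format=json` output."""
    for q in region_info["quotas"]:
        if q["metric"] == metric:
            return q["usage"], q["limit"], q["limit"] - q["usage"]
    raise KeyError(f"no quota metric {metric!r} in region payload")

sample = {"quotas": [
    {"metric": "CPUS", "limit": 24.0, "usage": 20.0},
    {"metric": "DISKS_TOTAL_GB", "limit": 20480.0, "usage": 512.0},
]}
print(quota_headroom(sample, "CPUS"))
# → (20.0, 24.0, 4.0)
```

A remaining value near zero is the cue to file an increase before the next deployment, not after it fails.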
| Service | Quota | Default | Notes |
|---|---|---|---|
| Compute Engine | CPUs per region | 24 | Per project, per region |
| Compute Engine | GPUs per region | 0 | Must request to get any |
| Compute Engine | Persistent disk (TB) | 20 TB | Per region |
| GKE | Nodes per cluster | 15,000 | High default |
| Cloud Functions | Max instances | 3,000 | Per function, per region |
| Cloud SQL | Instances per project | 100 | Across all regions |
| BigQuery | Concurrent queries | 100 | Per project |
| Pub/Sub | Topics per project | 10,000 | High default |
| Cloud Storage | Buckets per project | Unlimited | No quota, but other limits apply |
Via Console: select the quota on the IAM & Admin → Quotas page and click Edit Quotas to submit a request.
Via gcloud:
```bash
# Request a quota increase (newer Cloud Quotas method)
gcloud alpha quotas quota-preferences create \
    --service=compute.googleapis.com \
    --quota-id=CPUS-per-project-region \
    --preferred-value=100 \
    --project=my-project \
    --dimensions=region=us-central1
```
Processing: small requests are often approved within minutes; larger ones can take hours to days.
GCP uniquely offers Terraform resources for quota management:
```hcl
# Request a quota increase via Terraform
resource "google_cloud_quotas_quota_preference" "compute_cpus" {
  parent        = "projects/my-project"
  service       = "compute.googleapis.com"
  quota_id      = "CPUS-per-project-region"
  contact_email = "[email protected]"

  quota_config {
    preferred_value = 100
  }

  dimensions = {
    region = "us-central1"
  }
}

# Data source to check quota info
data "google_cloud_quotas_quota_info" "compute_cpus" {
  parent   = "projects/my-project"
  service  = "compute.googleapis.com"
  quota_id = "CPUS-per-project-region"
}

output "quota_increase_eligible" {
  value = data.google_cloud_quotas_quota_info.compute_cpus.quota_increase_eligibility
}
```
GCP offers an automatic quota adjustment feature:
Two modes:
Cloud Monitoring Alerts:
```yaml
# Alert policy for quota usage
displayName: "High CPU Quota Usage"
combiner: OR
conditions:
  - displayName: "CPU quota > 80%"
    conditionThreshold:
      filter: 'metric.type="compute.googleapis.com/quota/cpus_per_project/usage" resource.type="compute.googleapis.com/Project"'
      comparison: COMPARISON_GT
      thresholdValue: 0.8
      duration: "60s"
      aggregations:
        - alignmentPeriod: "60s"
          perSeriesAligner: ALIGN_MEAN
```
| Aspect | AWS | Azure | GCP |
|---|---|---|---|
| Primary Scope | Account + Region | Subscription + Region | Project + Region |
| Global Quotas | Some (S3 buckets, IAM) | Rare | Rare |
| Quota Console | Service Quotas | Usage + quotas blade | IAM & Admin → Quotas |
| API/CLI Support | Comprehensive | Good | Good |
| Terraform Support | Via AWS provider | Via AzureRM provider | Native quota resources |
| Automatic Management | Yes (new in 2025) | Limited | Yes (Quota Adjuster) |
| Aspect | AWS | Azure | GCP |
|---|---|---|---|
| Self-service Increases | Most quotas | Most quotas | Most quotas |
| Approval Time (Small) | Minutes | Minutes to hours | Minutes |
| Approval Time (Large) | Hours to days | Hours to days | Hours to days |
| Support Required | For very large increases | For large/fixed limits | For large increases |
| Cost to Request | Free | Free | Free |
| Feature | AWS | Azure | GCP |
|---|---|---|---|
| Native Monitoring | CloudWatch + Service Quotas | Azure Monitor | Cloud Monitoring |
| Pre-built Solution | Quota Monitor for AWS | Azure Advisor | Quota Adjuster |
| Multi-account View | With AWS Organizations | With Management Groups | With Resource Manager |
| Alerting | SNS, EventBridge | Action Groups | Notification Channels |
For organizations running across multiple clouds, consider these strategies:
Maintain a single source of truth for quota requirements:
```yaml
# quotas.yml
production:
  aws:
    us-east-1:
      ec2_vcpus: 256
      lambda_concurrent: 5000
      rds_instances: 20
    eu-west-1:
      ec2_vcpus: 128
      lambda_concurrent: 3000
  azure:
    eastus:
      compute_vcpus: 200
      storage_accounts: 50
    westeurope:
      compute_vcpus: 100
  gcp:
    us-central1:
      compute_cpus: 100
      cloud_sql_instances: 20
```
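A registry like this is only useful if it can be diffed against live values. One approach is to flatten it into rows keyed by provider and region; a sketch using an in-memory dict that mirrors a slice of `quotas.yml` (in practice you would load the real file with PyYAML):

```python
def flatten(registry):
    """Flatten {env: {provider: {region: {quota: value}}}} into rows."""
    return [
        (env, provider, region, quota, value)
        for env, providers in registry.items()
        for provider, regions in providers.items()
        for region, quotas in regions.items()
        for quota, value in quotas.items()
    ]

# Mirrors part of quotas.yml above:
registry = {
    "production": {
        "aws": {"us-east-1": {"ec2_vcpus": 256, "lambda_concurrent": 5000}},
        "gcp": {"us-central1": {"compute_cpus": 100}},
    }
}
for row in flatten(registry):
    print(row)
# e.g. ('production', 'aws', 'us-east-1', 'ec2_vcpus', 256)
```

Each row can then be compared against the matching provider API response, with any shortfall feeding the increase-request automation shown later in this guide.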
Before deploying to a new region, verify quotas are sufficient:
```bash
#!/bin/bash
# check-quotas.sh
REGION=$1
REQUIRED_VCPUS=${2:-50}

# Check AWS
aws_vcpus=$(aws service-quotas get-service-quota \
    --service-code ec2 \
    --quota-code L-1216C47A \
    --region "$REGION" \
    --query 'Quota.Value' --output text)

if (( $(echo "$aws_vcpus < $REQUIRED_VCPUS" | bc -l) )); then
    echo "AWS: Insufficient vCPUs ($aws_vcpus < $REQUIRED_VCPUS)"
    exit 1
fi

# (Add equivalent checks for Azure and GCP regions here.)
echo "All quota checks passed"
```
Deploy monitoring across all clouds:
AWS (CloudWatch):
```bash
aws cloudwatch put-metric-alarm \
    --alarm-name "EC2-vCPU-Quota-Warning" \
    --metric-name ResourceCount \
    --namespace AWS/Usage \
    --dimensions Name=Service,Value=EC2 Name=Resource,Value=vCPU \
    --statistic Maximum \
    --period 3600 \
    --threshold 80 \
    --comparison-operator GreaterThanOrEqualToThreshold \
    --evaluation-periods 1 \
    --alarm-actions arn:aws:sns:us-east-1:123456789012:quota-alerts
```
GCP (Terraform):
```hcl
resource "google_monitoring_alert_policy" "quota_alert" {
  display_name = "CPU Quota Warning"
  combiner     = "OR"

  conditions {
    display_name = "CPU usage > 80%"
    condition_threshold {
      filter          = "metric.type=\"compute.googleapis.com/quota/cpus_per_project/usage\" resource.type=\"compute.googleapis.com/Project\""
      comparison      = "COMPARISON_GT"
      threshold_value = 0.8
      duration        = "60s"
    }
  }

  notification_channels = [google_monitoring_notification_channel.email.name]
}
```
For frequently needed increases, automate the request process:
```python
# quota_manager.py
import json
import subprocess

import boto3


def request_aws_quota_increase(service_code, quota_code, region, desired_value):
    client = boto3.client('service-quotas', region_name=region)
    try:
        response = client.request_service_quota_increase(
            ServiceCode=service_code,
            QuotaCode=quota_code,
            DesiredValue=desired_value
        )
        return response['RequestedQuota']['Status']
    except Exception as e:
        return f"Error: {e}"


def request_gcp_quota_increase(project, quota_id, region, desired_value):
    cmd = [
        'gcloud', 'alpha', 'quotas', 'quota-preferences', 'create',
        '--service=compute.googleapis.com',
        f'--quota-id={quota_id}',
        f'--preferred-value={desired_value}',
        f'--project={project}',
        f'--dimensions=region={region}',
        '--format=json'
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return json.loads(result.stdout) if result.returncode == 0 else result.stderr
```
Cloud quotas are a reality of operating in any public cloud. The providers differ in terminology and tools, but the core workflow is the same: learn your defaults, monitor usage against limits, request increases before you need them, and automate wherever possible.
The most mature teams treat quota management as part of their infrastructure-as-code practice—tracking requirements in version control, automating checks in CI/CD, and requesting increases as part of their deployment process.