Fix Single-AZ VPC Subnets in AWS | Lensix

TL;DR

This check flags VPCs where all subnets live in a single availability zone, leaving every workload exposed to a single-AZ outage. To fix it, spread your subnets across at least two AZs (ideally three) and run traffic through multi-AZ resources.

An availability zone going down is not a hypothetical. AWS AZs fail, and when they do, anything pinned to that one zone goes with it. The VPC Has Single-AZ Subnets check looks at the layout of your VPC and warns you when every subnet sits inside the same availability zone. It is one of those quiet misconfigurations that causes no problems at all until the day it causes a total outage.

In this post we will walk through what the check detects, why a single-AZ VPC is a real reliability risk, how to redistribute your subnets across multiple zones, and how to keep new VPCs from drifting back into the same trap.

What this check detects

Every subnet in AWS belongs to exactly one availability zone, and that binding is permanent. You choose the AZ when you create the subnet, and you cannot move it later. A VPC, on the other hand, spans an entire region and can hold subnets in any AZ that region offers.

The vpc_singlesubnet check inspects each VPC and collects the distinct availability zones used by its subnets. If that set contains only one zone, the VPC is flagged. In practice this usually happens in one of a few ways:

A VPC created quickly for testing that quietly grew into something production-facing
Terraform or CloudFormation that hardcodes a single AZ instead of iterating over a list
A developer accepting the default subnet suggestions in the console and never adding more
Subnets deleted over time until only one zone remains in use

Note: The check looks at where your subnets are, not where your instances are. A VPC can have subnets in three AZs but still run everything in one. That is a separate problem. This check is the foundation: you cannot run multi-AZ workloads if your network only reaches one zone.
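The detection logic described above can be approximated from the command line. This is a minimal sketch under stated assumptions, not the check's actual implementation: `single_az_vpcs` is a hypothetical helper name, and it expects tab-separated VPC ID and AZ pairs on stdin, which is the shape `describe-subnets` produces with the query shown in the usage comment.

```shell
# Hypothetical helper: reads "VpcId<TAB>AvailabilityZone" lines on stdin and
# prints every VPC whose subnets all sit in one AZ.
# sort -u leaves one line per distinct (VPC, AZ) pair, so the awk counter
# ends up holding the number of distinct AZs per VPC.
single_az_vpcs() {
  sort -u |
    awk 'NF == 2 { zones[$1]++ }
         END { for (vpc in zones) if (zones[vpc] == 1) print vpc }' |
    sort
}

# Usage (needs AWS credentials):
#   aws ec2 describe-subnets --region us-east-1 \
#     --query "Subnets[].[VpcId,AvailabilityZone]" --output text | single_az_vpcs
```

A VPC with no subnets at all never appears in the input, so this sketch does not report it.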

Why it matters

Availability zones are AWS's primary unit of fault isolation. Each AZ is one or more discrete data centers with independent power, cooling, and networking. The entire point of having multiple AZs in a region is so that a failure in one does not take down the others.

When all your subnets live in a single AZ, you throw that isolation away. A single zone outage means:

Every EC2 instance, RDS node, and ENI in that VPC goes offline at the same time, with nowhere to fail over to.
Auto Scaling cannot recover because there is no second subnet for it to launch replacement instances into.
Load balancers lose all targets since every target lives in the failed zone.
RDS Multi-AZ is impossible because the standby replica needs a subnet in a different AZ to live in.

AWS has had real AZ-level disruptions. Workloads confined to the affected zone took the full impact, while workloads spread across zones had healthy capacity to fail over to. There is also a quieter business cost: many compliance frameworks and customer SLAs assume your architecture can survive a zone failure. A single-AZ VPC silently breaks that assumption.

Warning: Cross-AZ data transfer is billed at roughly $0.01 per GB in each direction. Spreading workloads across zones is the right call for reliability, but watch chatty services that move large volumes between AZs, since the cost can add up.
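To put a number on that warning, here is a rough back-of-the-envelope helper. It assumes the $0.01 per GB per direction rate quoted above, so each GB that crosses a zone boundary costs about $0.02 in total; `cross_az_cost_usd` is a hypothetical name, and actual pricing may differ by region and service.

```shell
# Rough monthly cost of traffic crossing AZ boundaries, assuming
# $0.01/GB billed in each direction (about $0.02 per GB moved).
cross_az_cost_usd() {
  awk -v gb="$1" 'BEGIN { printf "%.2f\n", gb * 0.01 * 2 }'
}

cross_az_cost_usd 5000   # 5,000 GB/month between zones -> prints 100.00
```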

How to fix it

The fix is to add subnets in additional availability zones and then move your workloads to use them. You cannot change the AZ of an existing subnet, so this is about creating new ones, not editing old ones.

Step 1: Check which AZs the region offers

aws ec2 describe-availability-zones \
  --region us-east-1 \
  --query "AvailabilityZones[?State=='available'].ZoneName" \
  --output text

Step 2: Inspect the current subnet layout of the VPC

aws ec2 describe-subnets \
  --filters "Name=vpc-id,Values=vpc-0abc123456789def0" \
  --query "Subnets[].{Subnet:SubnetId,AZ:AvailabilityZone,CIDR:CidrBlock}" \
  --output table

If the output shows every row landing in the same AZ, you have confirmed the finding.

Step 3: Create subnets in additional AZs

Pick CIDR ranges that do not overlap with your existing subnets, then create one subnet per additional zone.

# Second AZ
aws ec2 create-subnet \
  --vpc-id vpc-0abc123456789def0 \
  --cidr-block 10.0.2.0/24 \
  --availability-zone us-east-1b

# Third AZ
aws ec2 create-subnet \
  --vpc-id vpc-0abc123456789def0 \
  --cidr-block 10.0.3.0/24 \
  --availability-zone us-east-1c
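Before running the create commands, it can help to confirm that a candidate range really is free. The following is a minimal sketch for IPv4 only; `cidr_range` and `cidrs_overlap` are hypothetical helpers, not AWS CLI features, and the example CIDRs assume a 10.0.0.0/16 VPC.

```shell
# Hypothetical helpers for IPv4 CIDRs; not part of the AWS CLI.

# Prints the first and last address of a CIDR block as integers.
cidr_range() {
  ip=${1%/*}
  bits=${1#*/}
  old_ifs=$IFS; IFS=.; set -- $ip; IFS=$old_ifs   # split into octets
  addr=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
  size=$(( 1 << (32 - bits) ))
  start=$(( addr / size * size ))                 # clear the host bits
  echo "$start $(( start + size - 1 ))"
}

# Succeeds (exit 0) when the two CIDR blocks share any address.
cidrs_overlap() {
  set -- $(cidr_range "$1") $(cidr_range "$2")
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}

cidrs_overlap 10.0.1.0/24 10.0.2.0/24 || echo "safe to create"
```

Run the candidate against every CIDR reported in Step 2. If a range does overlap, `create-subnet` rejects it anyway, so this is only a way to find a free range before you try.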

Step 4: Associate the new subnets with a route table

New subnets are associated with the VPC's main route table by default. If you use custom route tables (for example, separate public and private routing), associate explicitly.

aws ec2 associate-route-table \
  --route-table-id rtb-0123456789abcdef0 \
  --subnet-id subnet-0newsubnet1111
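If you are unsure which route table the original subnet uses, you can look it up and reuse the same one for the new subnets. This sketch assumes the new subnets should mirror the old subnet's routing; `route_table_for_subnet` is a hypothetical helper name, and the IDs in the usage comment are placeholders.

```shell
# Hypothetical helper: prints the route table that governs a subnet.
# Arguments: subnet ID, VPC ID.
route_table_for_subnet() {
  subnet_id=$1
  vpc_id=$2
  rtb=$(aws ec2 describe-route-tables \
    --filters "Name=association.subnet-id,Values=$subnet_id" \
    --query "RouteTables[0].RouteTableId" --output text)
  if [ -z "$rtb" ] || [ "$rtb" = "None" ]; then
    # No explicit association: the subnet falls back to the VPC's main table.
    rtb=$(aws ec2 describe-route-tables \
      --filters "Name=vpc-id,Values=$vpc_id" "Name=association.main,Values=true" \
      --query "RouteTables[0].RouteTableId" --output text)
  fi
  echo "$rtb"
}

# Usage (needs AWS credentials):
#   route_table_for_subnet subnet-existing0000 vpc-0abc123456789def0
```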

Step 5: Point your workloads at the new subnets

Creating subnets alone does not make you resilient. Update the resources that actually run traffic:

Auto Scaling groups: add the new subnet IDs to the group so instances launch across zones.
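Load balancers: enable the new subnets on the load balancer itself (subnets are a property of the load balancer, not of its listeners) so it can route to targets in every zone.
RDS: add the new subnets to the DB subnet group the instance already uses (or create a group spanning multiple AZs for new databases), then enable Multi-AZ.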

aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name web-asg \
  --vpc-zone-identifier "subnet-existing0000,subnet-0newsubnet1111,subnet-0newsubnet2222"
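The load balancer and RDS changes from the list above can be sketched the same way. The functions below are hypothetical wrappers around real AWS CLI calls; every ARN, name, and subnet ID you pass is your own, and the sketch assumes an Application or Network Load Balancer and an existing DB subnet group.

```shell
# Hypothetical wrappers; all identifiers passed in are placeholders.

# Arguments: load balancer ARN, then every subnet ID to enable.
# set-subnets replaces the enabled subnet list, so include the existing
# subnets as well as the new ones.
enable_lb_subnets() {
  lb_arn=$1
  shift
  aws elbv2 set-subnets --load-balancer-arn "$lb_arn" --subnets "$@"
}

# Arguments: DB subnet group name, DB instance identifier, then every
# subnet ID the group should contain (old and new).
enable_rds_multi_az() {
  group=$1
  instance=$2
  shift 2
  aws rds modify-db-subnet-group \
    --db-subnet-group-name "$group" --subnet-ids "$@" || return 1
  aws rds modify-db-instance \
    --db-instance-identifier "$instance" --multi-az --apply-immediately
}

# Usage (needs AWS credentials):
#   enable_lb_subnets "$LB_ARN" subnet-existing0000 subnet-0newsubnet1111
#   enable_rds_multi_az my-db-subnets my-db subnet-existing0000 subnet-0newsubnet1111
```

Dropping `--apply-immediately` defers the Multi-AZ conversion to the instance's next maintenance window, which may be preferable for a busy production database.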

Danger: Do not delete the original single-AZ subnet to "clean up" before you have migrated everything off it. Deleting a subnet with active ENIs, instances, or RDS nodes will fail, and forcing the issue by terminating those resources first will cause an outage. Migrate first, then decommission.

The Terraform way

If you manage infrastructure as code, the cleaner fix is to make subnet creation iterate over a list of AZs so the problem cannot recur.

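# Look up the AZs currently available in the provider's region.
data "aws_availability_zones" "available" {
  state = "available"
}

locals {
  # Use up to three AZs. A plain slice(..., 0, 3) errors out when fewer
  # than three zones are available, so cap the end index.
  az_names = data.aws_availability_zones.available.names
  azs      = slice(local.az_names, 0, min(3, length(local.az_names)))
}

resource "aws_subnet" "private" {
  # One subnet per AZ, keyed by AZ name. Assumes var.vpc_cidr is a /16,
  # so adding 8 bits yields one /24 per index.
  for_each = {
    for idx, az in local.azs :
    az => cidrsubnet(var.vpc_cidr, 8, idx)
  }

  vpc_id            = aws_vpc.main.id
  availability_zone = each.key
  cidr_block        = each.value

  tags = {
    Name = "private-${each.key}"
  }
}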
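To see which ranges the `cidrsubnet` expression above hands out, here is a shell approximation of that function for IPv4. It is an illustration only: the real function is Terraform's, this `cidrsubnet` shell helper is hypothetical, and the 10.0.0.0/16 value for `var.vpc_cidr` is an assumption.

```shell
# Hypothetical shell approximation of Terraform's cidrsubnet() for IPv4.
# Arguments: base CIDR, number of bits to add, subnet index.
cidrsubnet() {
  ip=${1%/*}; bits=${1#*/}; newbits=$2; netnum=$3
  old_ifs=$IFS; IFS=.; set -- $ip; IFS=$old_ifs   # split into octets
  addr=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
  newlen=$(( bits + newbits ))
  base=$(( (addr >> (32 - bits)) << (32 - bits) ))   # network address
  sub=$(( base + (netnum << (32 - newlen)) ))        # offset by the index
  echo "$(( (sub >> 24) & 255 )).$(( (sub >> 16) & 255 )).$(( (sub >> 8) & 255 )).$(( sub & 255 ))/$newlen"
}

for idx in 0 1 2; do
  cidrsubnet 10.0.0.0/16 8 "$idx"
done
# prints 10.0.0.0/24, 10.0.1.0/24, 10.0.2.0/24 (one per line)
```

Note that index 2 lands on 10.0.2.0/24, the same range used in the hand-created subnet in Step 3. If you adopt the Terraform approach in a VPC that already has manually created subnets, choose an offset or import the existing subnets so the ranges do not collide.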

Tip: The AWS VPC Terraform module (terraform-aws-modules/vpc/aws) handles multi-AZ layout for you. Pass three AZs into azs and matching lists of CIDRs into private_subnets and public_subnets, and the module places one subnet of each kind in every zone.

How to prevent it from happening again

Fixing one VPC is easy. Making sure no new single-AZ VPC ever ships is the part that actually saves you. A few layers work well together.

Policy as code in CI

Catch the misconfiguration before it merges. With Checkov or OPA/Conftest you can write a rule that fails any plan where subnets do not span at least two AZs. Here is the idea expressed as a Conftest policy against a Terraform plan:

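package main

# Fails the plan when every subnet it manages lands in the same AZ.
# This pools subnets from all VPCs in the plan; group by vpc_id if one
# plan manages several VPCs.
deny[msg] {
  # In Terraform's plan JSON, planned attributes live under change.after.
  # Deleted subnets have a null "after" and are skipped automatically.
  azs := { az |
    s := input.resource_changes[_]
    s.type == "aws_subnet"
    az := s.change.after.availability_zone
  }
  # Exactly one AZ: plans that touch no subnets at all are left alone.
  count(azs) == 1
  msg := sprintf("VPC subnets span only %d AZ(s); require at least 2", [count(azs)])
}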

Wire that into your pipeline so a single-AZ plan blocks the pull request rather than landing in production.
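One way to wire it in is a small CI step that renders the plan as JSON and hands it to Conftest. This is a sketch under assumptions: `check_plan` is a hypothetical function name, the policy is assumed to live in a `policy/` directory, and `tfplan` and `plan.json` are arbitrary file names.

```shell
# Hypothetical CI step. Argument: directory holding the Terraform
# configuration (defaults to the current directory).
check_plan() {
  plan_dir=${1:-.}
  # Write a binary plan, convert it to JSON, then evaluate the policy.
  # The function returns non-zero if any step fails, which fails the job.
  terraform -chdir="$plan_dir" plan -input=false -out=tfplan >/dev/null &&
    terraform -chdir="$plan_dir" show -json tfplan > "$plan_dir/plan.json" &&
    conftest test --policy policy "$plan_dir/plan.json"
}

# Usage in a pipeline job:
#   check_plan infra/network
```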

Guardrails at the org level

Use AWS Config with a custom rule to continuously evaluate live VPCs and flag any that use only one AZ. A Service Control Policy cannot inspect subnet layout on its own, but it can complement Config by restricting who may create VPCs and subnets outside your pipeline. Together they catch drift and manually created resources that never went through CI.

Tip: Lensix runs the vpc_singlesubnet check continuously across every connected account, so a single-AZ VPC created by hand in some forgotten dev account still surfaces in your findings instead of waiting to be discovered during an outage.

Standardize on a vetted network module

The most durable prevention is to stop letting teams build VPCs from scratch. Publish one internal Terraform module that always provisions three AZs, and require its use. When the correct pattern is the default and the easy path, single-AZ VPCs stop appearing.

Best practices

Use three AZs where the region supports it. Two AZs survive one failure, but three give you headroom for quorum-based systems like etcd, ZooKeeper, and many databases that need an odd number of nodes.
Match public and private subnets per AZ. For each zone, create both a public and a private subnet so NAT gateways, load balancers, and workloads all have a home in every zone.
Run a NAT gateway per AZ for production. A single NAT gateway is itself a single-AZ dependency. If its zone fails, private subnets in other zones lose outbound internet.
Leave CIDR room for growth. Plan subnet sizing so you can add zones or expand without renumbering. For example, a /16 VPC carved into /24 or larger subnets leaves plenty of unused ranges for additional zones.
Confirm workloads actually use the spread. Multi-AZ subnets are necessary but not sufficient. Verify that Auto Scaling groups, load balancers, and databases are configured to use every zone, not just sitting on top of subnets they ignore.
Test failure, do not assume it. Periodically simulate a zone going dark and watch whether traffic actually shifts. Resilience you have never exercised is resilience you do not have.

Single-AZ subnets are cheap to fix today and expensive to discover during an incident. Spread your network across zones, gate the pattern in CI, and standardize on a module that gets it right by default, and this check stays green for good.
