Fix an Empty AWS Auto Scaling Group | Lensix

TL;DR

An Auto Scaling group with zero running instances means no compute is serving traffic behind it, which usually points to a stuck scale-down, a broken launch template, or a forgotten resource. Check why it dropped to zero, fix the desired capacity or launch config, and add alarms so an empty ASG never goes unnoticed.

An Auto Scaling group (ASG) is supposed to keep a defined number of EC2 instances running. When Lensix flags asg_empty, it means one of your groups currently has no instances at all. That is not always a problem, but it is almost always worth a look. An empty ASG sitting behind a load balancer is an outage that your customers are likely to notice before your monitoring does.

This post walks through what the check looks at, why an empty group is risky, how to bring it back to a healthy state, and how to stop it from quietly happening again.

What this check detects

The asg_empty check inspects each Auto Scaling group in your AWS account and reports any group where the number of running instances is zero. For a service that should be serving requests, that means nothing is available to handle them.

There are a few distinct situations that produce an empty group:

Desired capacity is set to 0. Someone scaled the group down manually or a scheduled action did it.
Instances keep failing to launch. The launch template or launch configuration references a deleted AMI or an invalid instance type, the subnet has no free IP addresses, or the Availability Zone has no capacity for that instance type.
Instances launch and immediately get terminated. Health checks fail, so the ASG kills and replaces instances in a loop, occasionally landing at zero.
A leftover group. The workload was decommissioned, but the ASG was never deleted.

Note: An ASG with a desired capacity, minimum, and maximum all set to 0 is sometimes intentional, for example a group that is only scaled up on a schedule. Lensix flags it so you can confirm the intent rather than assuming the worst.

Why it matters

The impact depends entirely on what the group is supposed to do. The common cases break down like this.

Availability risk

If the ASG sits behind an Application or Network Load Balancer and is meant to serve production traffic, an empty group means the load balancer has no healthy targets. An Application Load Balancer answers with 503s, connections through a Network Load Balancer fail, upstream health checks fail, and depending on your routing, an entire service can go dark. This is the worst version of the problem because, without an alarm on the group itself, you find out only when something downstream complains.

Hidden launch failures

A group that cannot launch instances is failing silently. AWS keeps trying, logs the failures in the activity history, and otherwise stays quiet. Without an alarm, you only find out when the ASG should have scaled up under load and could not.

Warning: A misconfigured launch template that points to a deleted AMI will keep the group empty indefinitely. During a traffic spike, this means your "auto scaling" provides zero protection exactly when you need it most.

Wasted resources and clutter

Decommissioned ASGs that were never removed add noise to your account. They show up in audits, complicate Infrastructure as Code drift detection, and make it harder to reason about what is actually running. They also leave behind launch templates, security groups, and target groups that nobody owns.

How to investigate and fix it

Start by figuring out why the group is empty before you change anything. The fix for a forgotten group is the opposite of the fix for a broken launch template.

Step 1: Look at the group's current configuration

aws autoscaling describe-auto-scaling-groups \
  --auto-scaling-group-names my-app-asg \
  --query 'AutoScalingGroups[0].{Min:MinSize,Max:MaxSize,Desired:DesiredCapacity,Instances:length(Instances)}'

If Desired and Min are both 0, someone or a scheduled action scaled it down. If they are above 0 but Instances is 0, the group is failing to launch instances or to keep them alive.
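
When there are many groups to triage, the same decision can be scripted. The following is a minimal Python sketch, assuming you have saved the JSON output of describe-auto-scaling-groups and loaded one entry of AutoScalingGroups as a dict. The function name classify_group and the labels it returns are illustrative, not AWS terminology.

```python
def classify_group(group):
    """Classify one AutoScalingGroups entry by why it might be empty.

    The returned labels are names made up for this sketch.
    """
    desired = group.get("DesiredCapacity", 0)
    minimum = group.get("MinSize", 0)
    running = len(group.get("Instances", []))

    if running > 0:
        return "has_instances"
    if desired == 0 and minimum == 0:
        # Nothing is being requested: a manual or scheduled scale-down.
        return "scaled_to_zero"
    # Capacity is requested but nothing is running: launches are failing,
    # or instances are terminated as fast as they start.
    return "not_reaching_capacity"


if __name__ == "__main__":
    sample = {"MinSize": 2, "MaxSize": 6, "DesiredCapacity": 2, "Instances": []}
    print(classify_group(sample))
```

A group classified as scaled_to_zero leads to Step 3a or 3d; anything else with no instances needs the activity history from Step 2.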

Step 2: Read the activity history

The scaling activity log tells you exactly what AWS tried and why it failed.

aws autoscaling describe-scaling-activities \
  --auto-scaling-group-name my-app-asg \
  --max-records 10 \
  --query 'Activities[].{Status:StatusCode,Cause:Cause,Description:Description,Message:StatusMessage}'

Look at the Message field of failed activities for errors like The image id 'ami-xxxx' does not exist or Instance failed to complete user's Lifecycle Action. These point straight at the root cause.
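
If you save that output as JSON, filtering it down to the failures is a few lines of Python. This is a sketch under the assumption that you pass in the raw Activities list from describe-scaling-activities (without the --query projection); failed_activities is a hypothetical helper name.

```python
def failed_activities(activities):
    """Return the failure reason for each failed scaling activity, in input order."""
    reasons = []
    for activity in activities:
        if activity.get("StatusCode") != "Failed":
            continue
        # StatusMessage carries the detailed error when present;
        # otherwise fall back to the shorter Description.
        reasons.append(activity.get("StatusMessage") or activity.get("Description", ""))
    return reasons


if __name__ == "__main__":
    # Illustrative records only; real activities contain more fields.
    sample = [
        {"StatusCode": "Failed", "Description": "Launching a new EC2 instance"},
        {"StatusCode": "Successful", "Description": "Terminating EC2 instance"},
    ]
    for reason in failed_activities(sample):
        print(reason)
```

The same message repeated across every recent activity usually means a configuration problem rather than a transient capacity issue.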

Step 3a: If the desired capacity was set to 0

If the group should be running and was simply scaled down, restore the capacity.

Danger: Changing desired capacity on a production ASG immediately launches instances and starts billing. Confirm the launch template is valid and that you are operating on the right group before running this.

aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name my-app-asg \
  --min-size 2 \
  --desired-capacity 2 \
  --max-size 6

Step 3b: If instances fail to launch

Fix the underlying launch template. The most common cause is a stale AMI ID. Create a new version of the launch template pointing at a valid AMI, then update the group to use it.

# Create a new launch template version with a valid AMI
aws ec2 create-launch-template-version \
  --launch-template-name my-app-lt \
  --source-version 1 \
  --launch-template-data '{"ImageId":"ami-0abc123valid"}'

# Point the ASG at the latest version
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name my-app-asg \
  --launch-template "LaunchTemplateName=my-app-lt,Version=\$Latest"

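The group uses the new version the next time it tries to launch, so an empty group needs no further action. If instances built from the old version are still running, start an instance refresh to replace them.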

aws autoscaling start-instance-refresh \
  --auto-scaling-group-name my-app-asg

Step 3c: If instances launch then immediately die

This is a health check problem. Check whether the ASG is using ELB health checks and whether your target group's health check path is correct.

aws elbv2 describe-target-health \
  --target-group-arn arn:aws:elasticloadbalancing:...:targetgroup/my-app/abc123

Common culprits are a health check path that returns a non-200 status, a security group that blocks the load balancer from reaching the instance port, or a health check grace period that is too short for your app to boot.
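
The Reason field in the describe-target-health output narrows this down quickly. The sketch below assumes you have the command's JSON response loaded as a dict. The reason codes are ones the Elastic Load Balancing API reports, while the hint text and the summarize_unhealthy name are this example's own.

```python
from collections import Counter

# Reason codes come from the ELB API; the hints are this sketch's wording.
HINTS = {
    "Target.ResponseCodeMismatch": "health check path returned an unexpected status code",
    "Target.Timeout": "no response in time; check security groups and the instance port",
    "Target.FailedHealthChecks": "connection failed or response was malformed; check the app is listening",
}


def summarize_unhealthy(response):
    """Count unhealthy targets per reason code and attach a troubleshooting hint."""
    counts = Counter()
    for description in response.get("TargetHealthDescriptions", []):
        health = description.get("TargetHealth", {})
        if health.get("State") == "unhealthy":
            counts[health.get("Reason", "unknown")] += 1
    return {
        reason: {"count": count, "hint": HINTS.get(reason, "see TargetHealth.Description")}
        for reason, count in counts.items()
    }
```

If the group is already empty there may be no targets left to inspect, so run the command while a replacement instance is still registering.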

Tip: If your application takes a while to start, raise the health check grace period so instances are not killed before they finish booting: --health-check-grace-period 300 on update-auto-scaling-group.

Step 3d: If the group is genuinely unused

Confirm nothing depends on it, then delete it. Detach it from any load balancer target groups first if needed.

Danger: Deleting an Auto Scaling group is permanent, and --force-delete also terminates any instances still in the group. Double-check the group name and confirm no service routes traffic to it before running this command.

aws autoscaling delete-auto-scaling-group \
  --auto-scaling-group-name my-old-asg \
  --force-delete

How to prevent it from happening again

Empty groups almost always trace back to either an unmonitored failure or untracked manual changes. Both have systemic fixes.

Alarm on group capacity

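Set a CloudWatch alarm on the GroupInServiceInstances metric so you find out within minutes when a production group drops below its expected count. Auto Scaling only publishes group metrics once metrics collection is enabled on the group (aws autoscaling enable-metrics-collection with --granularity 1Minute), so turn that on first or the alarm will sit in INSUFFICIENT_DATA. The example below fires when the group reaches zero; raise --threshold to match your minimum size if you want an earlier warning.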

aws cloudwatch put-metric-alarm \
  --alarm-name my-app-asg-empty \
  --namespace AWS/AutoScaling \
  --metric-name GroupInServiceInstances \
  --dimensions Name=AutoScalingGroupName,Value=my-app-asg \
  --statistic Minimum \
  --period 300 \
  --evaluation-periods 1 \
  --threshold 1 \
  --comparison-operator LessThanThreshold \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:ops-alerts
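
To apply the same alarm to many groups, you can generate the parameters instead of hand-editing the command. This is a sketch only: it builds a dict whose keys mirror the parameters of boto3's CloudWatch put_metric_alarm call, and the function name, alarm naming scheme, and min_in_service argument are assumptions made for illustration.

```python
def empty_group_alarm(group_name, min_in_service, sns_topic_arn):
    """Build alarm parameters that fire when in-service instances drop below a floor."""
    return {
        # Naming scheme is an assumption of this sketch.
        "AlarmName": f"{group_name}-below-{min_in_service}",
        "Namespace": "AWS/AutoScaling",
        "MetricName": "GroupInServiceInstances",
        "Dimensions": [{"Name": "AutoScalingGroupName", "Value": group_name}],
        "Statistic": "Minimum",
        "Period": 300,
        "EvaluationPeriods": 1,
        # With LessThanThreshold, a floor of 1 alarms only at zero instances.
        "Threshold": float(min_in_service),
        "ComparisonOperator": "LessThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }


if __name__ == "__main__":
    params = empty_group_alarm(
        "my-app-asg", 1, "arn:aws:sns:us-east-1:123456789012:ops-alerts"
    )
    print(params["AlarmName"])
```

With boto3 installed and credentials configured, the dict could be passed as cloudwatch.put_metric_alarm(**params), where cloudwatch is a boto3 CloudWatch client.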

Manage ASGs as code

Define every group in Terraform or CloudFormation so capacity changes go through review and drift is detectable. A minimal Terraform example:

resource "aws_autoscaling_group" "app" {
  name                      = "my-app-asg"
  min_size                  = 2
  max_size                  = 6
  desired_capacity          = 2
  vpc_zone_identifier       = var.private_subnets
  health_check_type         = "ELB"
  health_check_grace_period = 300

  launch_template {
    id      = aws_launch_template.app.id
    version = "$Latest"
  }

  target_group_arns = [aws_lb_target_group.app.arn]
}

When the launch template references an AMI through a data source that resolves the latest valid image, you remove the most common cause of stale AMI failures entirely.

Gate it in CI/CD with policy as code

Catch a min_size = 0 on a production group before it merges. With Open Policy Agent and Conftest you can write a rule against the Terraform plan:

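package main

import rego.v1

# Deny production ASGs whose planned min_size is 0.
# aws_autoscaling_group exposes tags as repeated tag blocks
# (key, value, propagate_at_launch), not as a name-to-value map.
deny contains msg if {
  some rc in input.resource_changes
  rc.type == "aws_autoscaling_group"
  some tag in rc.change.after.tag
  tag.key == "Environment"
  tag.value == "production"
  rc.change.after.min_size == 0
  msg := sprintf("Production ASG %q must have min_size > 0", [rc.address])
}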

Tip: Pair the policy check with Lensix scheduled scans. Policy as code stops bad config at merge time, and Lensix catches anything that slips in through the console or an out-of-band change.

Best practices

Set a non-zero minimum for anything serving traffic. A minimum of at least 2 across two Availability Zones keeps you online through an instance failure.
Use ELB health checks, not just EC2 checks. EC2 health checks only confirm the instance is running, not that your application responds.
Resolve AMIs dynamically. Hardcoded AMI IDs go stale and silently break launches when the image is deprecated or deleted.
Tag groups with an owner and environment. This makes empty-group triage fast and tells you instantly whether a zero-capacity group is intentional.
Document scheduled scale-to-zero patterns. If a dev environment is meant to drop to zero overnight, tag it so the check and your on-call engineers know it is expected.
Clean up decommissioned groups. Delete the ASG, launch template, and orphaned target groups together so nothing lingers.
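
The tagging practices above pay off when triage is automated. As a rough sketch, the helper below reads the Tags list from a describe-auto-scaling-groups entry and decides how to treat an empty group. The tag keys Owner and Environment follow the list above, while the ScaleToZero tag, its expected value, and the returned labels are hypothetical conventions you would replace with your own.

```python
def triage_empty_group(group):
    """Decide how to treat an empty ASG based on its tags.

    Assumes a hypothetical ScaleToZero=expected tag marks intentional
    zero-capacity groups; adapt the keys to your own tagging scheme.
    """
    tags = {tag["Key"]: tag["Value"] for tag in group.get("Tags", [])}
    owner = tags.get("Owner", "unknown owner")

    if tags.get("ScaleToZero") == "expected":
        return "expected: documented scale-to-zero"
    if tags.get("Environment") == "production":
        return f"urgent: notify {owner}"
    return f"review: ask {owner}"


if __name__ == "__main__":
    sample = {
        "AutoScalingGroupName": "dev-batch-asg",
        "Tags": [
            {"Key": "Environment", "Value": "dev"},
            {"Key": "ScaleToZero", "Value": "expected"},
        ],
    }
    print(triage_empty_group(sample))
```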

An empty Auto Scaling group is rarely the actual problem. It is a symptom of a broken launch path, a missed scale-down, or a forgotten resource. Treat the asg_empty finding as a prompt to confirm intent, and you turn a quiet failure mode into a quick, routine check.
