Fix ASG Inactive Target Group References on AWS

TL;DR

This check flags Auto Scaling groups that point at a target group which no longer exists. New instances launch but never register behind your load balancer, so they receive zero traffic. Remove the stale target group ARN from the ASG or recreate the target group and reattach it.

An Auto Scaling group is only useful if the instances it launches actually end up serving traffic. When an ASG references a target group that has been deleted, the wiring between your scaling layer and your load balancer is broken. The instances boot, pass their EC2 health checks, and sit there doing nothing, because there is no longer a valid place to register them.

This is one of those misconfigurations that hides quietly until the moment you need capacity the most. The check asg_inactivelb catches it before a scaling event turns into an outage.

What this check detects

The ASG References Inactive Load Balancer check inspects each Auto Scaling group and compares the target group ARNs it is configured to use against the target groups that currently exist in your account. If an ASG lists a target group ARN that no longer resolves to a real resource, the check fails.

In practice this usually happens in one of three ways:

Someone deleted a target group directly in the console or via the CLI without updating the ASGs attached to it.
A Terraform or CloudFormation change destroyed and recreated a target group, but the ASG kept referencing the old ARN.
A load balancer was torn down during an environment migration and the dependent ASG was left behind.

Note: Modern Auto Scaling groups attach to Application and Network Load Balancers through target groups, not directly to the load balancer. The older LoadBalancerNames attribute applies only to Classic Load Balancers. This check covers the target group path, which is what nearly all current deployments use.

Why it matters

The risk here is silent capacity loss. Everything looks healthy on the surface, but the safety net you think you have is not connected to anything.

New instances never serve traffic

When the ASG scales out, it tries to register new instances with a target group that does not exist. The registration silently fails. Your existing instances keep absorbing all the load while fresh, idle instances sit alongside them. You pay for compute that does nothing.

Scaling events can become outages

Consider a Black Friday traffic spike. CPU climbs, the scaling policy fires, and the ASG dutifully launches five new instances. None of them register behind the load balancer. The original instances stay pinned at high utilization, latency spikes, and requests start timing out. The system designed to absorb the surge made no difference at all.

Instance refresh and rollouts break

An instance refresh replaces old instances with new ones. If the new instances cannot register with a valid target group, you can end up draining healthy capacity and replacing it with capacity that serves nothing. A routine deploy turns into a production incident.

Warning: Because the failure is silent, monitoring built only on instance health will not catch it. The instances are running and healthy from EC2's perspective. You need to watch target group registration counts and request distribution to see the gap.

How to fix it

There are two valid outcomes: either the target group should exist and needs to be recreated and reattached, or it was deleted on purpose and the stale reference should be removed. Start by figuring out which one applies.

Step 1: Identify the broken reference

List the target group ARNs attached to the ASG:

aws autoscaling describe-auto-scaling-groups \
  --auto-scaling-group-names my-app-asg \
  --query 'AutoScalingGroups[0].TargetGroupARNs'

Then check whether each ARN still resolves to a live target group:

aws elbv2 describe-target-groups \
  --target-group-arns arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-app-tg/abc123

If the call fails with a TargetGroupNotFound error, the reference is stale.
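The two commands above can be combined into one loop when an ASG has several target groups attached. The following is a minimal sketch, not a definitive implementation: find_stale_target_groups is a hypothetical helper name, and it assumes the AWS CLI is configured for the right account and region. It prints only the ARNs that fail with TargetGroupNotFound and reports any other error separately, so a permissions problem is not mistaken for a stale reference.

```shell
# Sketch: print every target group ARN on an ASG that no longer resolves.
# find_stale_target_groups is a hypothetical helper, not an AWS command.
find_stale_target_groups() {
  asg_name="$1"

  # Text output prints the attached ARNs separated by whitespace.
  arns=$(aws autoscaling describe-auto-scaling-groups \
    --auto-scaling-group-names "$asg_name" \
    --query 'AutoScalingGroups[0].TargetGroupARNs' \
    --output text) || return 1

  for arn in $arns; do
    # "None" is what text output prints when the ASG itself was not found.
    [ "$arn" = "None" ] && continue

    # Capture stderr only; a successful lookup means the target group is live.
    if err=$(aws elbv2 describe-target-groups \
        --target-group-arns "$arn" 2>&1 >/dev/null); then
      continue
    fi

    case "$err" in
      *TargetGroupNotFound*) echo "$arn" ;;
      *) echo "could not check $arn: $err" >&2 ;;
    esac
  done
}
```

Calling find_stale_target_groups my-app-asg prints one stale ARN per line and prints nothing when every reference is live.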

Step 2a: Detach the stale target group

If the target group was intentionally removed, detach the dead ARN from the ASG:

Danger: Detaching a target group from a production ASG changes how traffic reaches your instances. Confirm you are removing the inactive ARN and not a working one. Run this against staging first if you can.

aws autoscaling detach-load-balancer-target-groups \
  --auto-scaling-group-name my-app-asg \
  --target-group-arns arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-app-tg/abc123

Step 2b: Recreate and reattach the target group

If the instances are supposed to be load balanced, recreate the target group and attach it. Create the target group first:

aws elbv2 create-target-group \
  --name my-app-tg \
  --protocol HTTP \
  --port 80 \
  --vpc-id vpc-0abc123 \
  --health-check-path /healthz \
  --target-type instance

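Add a listener rule on your load balancer to forward to the new target group, then attach it to the ASG using the ARN that create-target-group returned (the new target group gets a new ARN, even if it reuses the old name):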

aws autoscaling attach-load-balancer-target-groups \
  --auto-scaling-group-name my-app-asg \
  --target-group-arns arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-app-tg/def456

Existing in-service instances will register automatically. Confirm they land in the target group:

aws elbv2 describe-target-health \
  --target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-app-tg/def456
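To turn that confirmation into a pass/fail check, you can compare the number of InService instances in the ASG with the number of healthy targets in the target group. This is a rough sketch under a few assumptions: check_registration is a hypothetical helper name, the target group is used only by this ASG, and the instances have had time to pass their target group health checks.

```shell
# Sketch: compare the ASG's InService count with the target group's healthy count.
# check_registration is a hypothetical helper, not an AWS command.
check_registration() {
  asg_name="$1"
  tg_arn="$2"

  # Count instances the ASG considers in service.
  in_service=$(aws autoscaling describe-auto-scaling-groups \
    --auto-scaling-group-names "$asg_name" \
    --query "length(AutoScalingGroups[0].Instances[?LifecycleState=='InService'])" \
    --output text) || return 2

  # Count targets the load balancer considers healthy.
  healthy=$(aws elbv2 describe-target-health \
    --target-group-arn "$tg_arn" \
    --query "length(TargetHealthDescriptions[?TargetHealth.State=='healthy'])" \
    --output text) || return 2

  echo "in_service=$in_service healthy=$healthy"

  # Exit status 0 only when the two counts match.
  [ "$in_service" -eq "$healthy" ]
}
```

A mismatch right after attaching is expected while health checks warm up. A mismatch that persists is the gap this check is designed to expose.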

Step 3: Fix it in your IaC, not just the console

If you manage infrastructure with Terraform, a console or CLI fix will get reverted on the next apply. Wire the target group and ASG together so the dependency is explicit:

resource "aws_lb_target_group" "app" {
  name        = "my-app-tg"
  port        = 80
  protocol    = "HTTP"
  vpc_id      = aws_vpc.main.id
  target_type = "instance"

  health_check {
    path = "/healthz"
  }
}

resource "aws_autoscaling_group" "app" {
  name                = "my-app-asg"
  min_size            = 2
  max_size            = 10
  desired_capacity    = 2
  vpc_zone_identifier = aws_subnet.private[*].id

  target_group_arns = [aws_lb_target_group.app.arn]

  launch_template {
    id      = aws_launch_template.app.id
    version = "$Latest"
  }
}

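Because target_group_arns references the resource directly, Terraform understands the dependency: it creates the target group before the ASG, and removing the target group from your configuration while the ASG still references it fails at plan time instead of leaving a dangling ARN. This does not protect against deletions made outside Terraform.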

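Tip: When you do need to replace a target group, set create_before_destroy = true in a lifecycle block on the target group. Terraform then creates the replacement and points the ASG at it before tearing down the old one, so you never leave the ASG pointing at nothing. Target group names must be unique, so use name_prefix instead of a fixed name; otherwise the replacement collides with the old target group and the create fails.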

How to prevent it from happening again

This misconfiguration is almost always a side effect of deleting a resource without checking what depends on it. The fix is to make those dependencies impossible to ignore.

Manage the relationship in code

The single most effective prevention is letting your IaC tool own both the target group and the ASG, with an explicit reference between them. As shown above, that reference forces the dependency into the plan, so a destroy of the target group surfaces during review instead of in production.

Add a policy-as-code gate in CI

Catch stale or missing references before they merge. With Terraform, run a plan in CI and fail the build if a target group is being destroyed while still referenced. A simple OPA Conftest policy can flag the dangerous case:

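package terraform.asg

# True when the change also creates a resource, i.e. a replacement rather than a pure delete.
# Kept as a helper so the negation below does not contain a wildcard.
is_replacement(actions) {
  actions[_] == "create"
}

# Flag target groups that the plan deletes without recreating.
deny[msg] {
  resource := input.resource_changes[_]
  resource.type == "aws_lb_target_group"
  resource.change.actions[_] == "delete"
  not is_replacement(resource.change.actions)
  msg := sprintf("Target group %s is being deleted; confirm no ASG references it", [resource.address])
}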
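One way to wire this policy into a pipeline is sketched below. The function name run_plan_policy_check, the policy directory, and the tfplan file names are assumptions for illustration; adjust them to your repository layout. The one detail that is easy to miss is the namespace: Conftest evaluates the main package by default, so a policy in package terraform.asg has to be selected explicitly.

```shell
# Sketch of a CI step. The directory and file names are assumptions.
run_plan_policy_check() {
  policy_dir="${1:-policy}"

  # Write the plan to a file so it can be inspected.
  terraform plan -out=tfplan -input=false || return 1

  # Conftest evaluates JSON, so convert the binary plan first.
  terraform show -json tfplan > tfplan.json || return 1

  # Select the terraform.asg package; Conftest defaults to "main".
  conftest test --policy "$policy_dir" --namespace terraform.asg tfplan.json
}
```

The function returns a non-zero status when any deny rule matches, which is enough to fail most CI jobs.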

Run the Lensix check on a schedule

IaC drift, manual console changes, and emergency hotfixes all slip past CI. Running asg_inactivelb continuously against your live environment closes that gap. It compares what your ASGs reference against what actually exists, so a target group deleted by hand at 2am shows up in the next scan rather than during your next traffic spike.

Note: Pair this check with a CloudWatch alarm on the target group's HealthyHostCount metric. If a target group drops to zero healthy hosts unexpectedly, you get paged. The Lensix check tells you the wiring is wrong; the alarm tells you it is currently hurting.
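A sketch of such an alarm for an Application Load Balancer follows. The helper name create_healthy_host_alarm, the alarm name, the dimension values, and the SNS topic are placeholders, and the period and evaluation settings are only a starting point. Missing data is treated as breaching on the assumption that a target group which has been deleted stops reporting the metric; without that setting the alarm would likely sit in an insufficient-data state instead of firing.

```shell
# Sketch: alarm when an ALB target group reports no healthy hosts.
# The alarm name, dimension values and SNS topic ARN are placeholders.
create_healthy_host_alarm() {
  tg_dimension="$1"    # e.g. targetgroup/my-app-tg/def456 (the ARN suffix)
  lb_dimension="$2"    # e.g. app/my-alb/0123456789abcdef (hypothetical ALB)
  sns_topic_arn="$3"   # where the notification is sent

  aws cloudwatch put-metric-alarm \
    --alarm-name "my-app-tg-no-healthy-hosts" \
    --namespace AWS/ApplicationELB \
    --metric-name HealthyHostCount \
    --dimensions "Name=TargetGroup,Value=$tg_dimension" "Name=LoadBalancer,Value=$lb_dimension" \
    --statistic Minimum \
    --period 60 \
    --evaluation-periods 3 \
    --threshold 1 \
    --comparison-operator LessThanThreshold \
    --treat-missing-data breaching \
    --alarm-actions "$sns_topic_arn"
}
```

For a Network Load Balancer the namespace would be AWS/NetworkELB rather than AWS/ApplicationELB.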

Best practices

Never delete a target group without checking references. Run describe-auto-scaling-groups and confirm no ASG lists the ARN in its TargetGroupARNs before removal.
Treat target groups as managed resources. Avoid creating them by hand if the ASG that uses them lives in code. Mixed ownership is where stale references are born.
Use lifecycle ordering for replacements. create_before_destroy on target groups prevents the brief window where an ASG points at nothing.
Alarm on registration, not just instance health. EC2 health says the box is up. Target health says it is actually serving requests.
Test scaling, not just deploys. Periodically trigger a scale-out in a non-production environment and confirm new instances register and receive traffic. A broken reference shows itself immediately under a real scaling event.
Audit after migrations. Load balancer and VPC migrations are the most common source of orphaned ASG references. Scan your scaling groups once the dust settles.
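The first practice, checking references before a delete, can be sketched as a reverse lookup: given a target group ARN, list the ASGs that still point at it. The helper name asgs_referencing_target_group is hypothetical, and the lookup only covers the account and region the CLI is currently configured for.

```shell
# Sketch: list every ASG in the current region that references a target group ARN.
# asgs_referencing_target_group is a hypothetical helper, not an AWS command.
asgs_referencing_target_group() {
  tg_arn="$1"

  # contains() matches the ARN against each group's TargetGroupARNs list.
  aws autoscaling describe-auto-scaling-groups \
    --query "AutoScalingGroups[?contains(TargetGroupARNs, '$tg_arn')].AutoScalingGroupName" \
    --output text
}
```

If the function prints nothing, no ASG in that region references the target group. Any name it does print is a group to detach or update before deleting.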

A target group reference is a small line in an ASG config, but it is the link between scaling and serving traffic. Keep that link honest and your Auto Scaling group does exactly what you expect when the load arrives.
