Fix Idle EC2 Instances on AWS | Lensix Check

TL;DR

This check flags EC2 instances with near-zero CPU and network activity over the past 7 days, which usually means you are paying for compute nobody uses. Confirm the instance is actually abandoned, then stop or terminate it (or right-size it) to stop the bleed.

Idle EC2 instances are one of the most common and least glamorous sources of cloud waste. They rarely break anything, they do not trip alarms, and they quietly accrue charges month after month. Worse, an instance that nobody is watching is also an instance that nobody is patching, which turns a cost problem into a security problem.

The Instance Appears Unused check (ec2_unused) looks at your running EC2 instances and surfaces the ones that have shown almost no signs of life for a full week.

What this check detects

Lensix pulls CloudWatch metrics for each running EC2 instance and evaluates activity over a rolling 7-day window. An instance is flagged when both of these hold true:

CPU utilization has stayed near zero (typically under a few percent average and never spiking).
Network throughput (NetworkIn and NetworkOut) has been negligible.

The combination matters. A box can sit at low CPU while still serving traffic, and a box can be CPU-busy with no network at all. When both are flat for seven straight days, the instance is almost certainly doing no useful work.
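The two-signal rule can be sketched as a small shell predicate. This is an illustration only: the function name and all three thresholds are assumptions chosen for the example, not the values Lensix actually uses.

```shell
# Assumed thresholds for illustration; tune them to your own fleet.
CPU_AVG_MAX=2          # percent, ceiling for the 7-day average
CPU_PEAK_MAX=5         # percent, ceiling for the highest spike
NET_BYTES_MAX=5000000  # bytes, NetworkIn + NetworkOut over the window

# is_idle AVG_CPU PEAK_CPU NET_BYTES
# Exits 0 (true) only when CPU and network are both flat.
is_idle() {
  awk -v avg="$1" -v peak="$2" -v net="$3" \
      -v a="$CPU_AVG_MAX" -v p="$CPU_PEAK_MAX" -v n="$NET_BYTES_MAX" \
      'BEGIN { exit !(avg < a && peak < p && net < n) }'
}

is_idle 0.4 1.8 120000 && echo "idle"         # flat on both signals
is_idle 0.6 1.2 900000000 || echo "in use"    # low CPU, but serving traffic
```

Requiring both conditions is what keeps a quiet but busy file server, or a CPU-bound batch box with no network traffic, off the list.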

Note: EC2 publishes CloudWatch metrics at 5-minute resolution by default and at 1-minute resolution when detailed monitoring is enabled. Over a 7-day window, basic monitoring is sufficient to spot a genuinely idle instance, but the averaged data can hide a workload that runs for two minutes once a week. Keep that in mind when reviewing results.

Why it matters

Direct cost

A running instance bills whether or not it does anything. A single m5.xlarge left on around the clock costs roughly $140 per month on demand, plus its EBS volumes, plus any data transfer. Multiply that across a fleet of forgotten dev boxes, abandoned proof-of-concept stacks, and "I'll clean it up later" instances, and idle compute becomes a meaningful line item.
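The monthly figure is simple arithmetic, sketched here as a helper. The hourly rate is an assumption (m5.xlarge, Linux, on demand, us-east-1 at $0.192 per hour); check current pricing for your region before relying on it.

```shell
# monthly_cost HOURLY_RATE [HOURS_PER_MONTH]
# 730 is the average number of hours in a month (8760 / 12).
monthly_cost() {
  awk -v rate="$1" -v hours="${2:-730}" 'BEGIN { printf "%.2f\n", rate * hours }'
}

monthly_cost 0.192   # one instance at the assumed rate: 140.16
monthly_cost 1.92    # ten such instances: 1401.60
```

EBS volumes and data transfer come on top of this, so treat the result as a floor.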

Security exposure

The cost angle gets the attention, but the security angle is the one that bites. Idle instances are usually orphaned, and orphaned instances tend to:

Fall behind on OS and package patches because nobody owns them.
Run old AMIs with known CVEs.
Hold IAM instance profiles with permissions that no longer match any active purpose.
Keep security group rules open that were opened for a since-forgotten reason.

An abandoned EC2 instance is a standing foothold. If an attacker compromises it, no operator is watching the logs, and the instance role may still grant access to S3, Secrets Manager, or other services.

Operational noise

Every unused instance is something your team has to mentally account for during audits, incident response, and capacity planning. Reducing the fleet to only what is actually used makes everything else easier to reason about.

How to fix it

Do not start by terminating. Idle does not always mean abandoned, and a 7-day flat line can hide a legitimate but infrequent job. Work through these steps in order.

1. Identify the instance and its owner

Pull the instance details and its tags so you can figure out who created it and why.

aws ec2 describe-instances \
  --instance-ids i-0abc123def456 \
  --query 'Reservations[].Instances[].{ID:InstanceId,Type:InstanceType,Launched:LaunchTime,Tags:Tags}' \
  --output json

If there is an Owner, Team, or Project tag, ask that owner before doing anything. If there are no tags at all, that absence is itself a signal that the instance has slipped through the cracks.
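When the tag list is long, it can help to filter for ownership tags only. A minimal sketch, assuming the tag keys are Owner, Team, or Project in any letter case; both function names are hypothetical.

```shell
# owner_tags: read "Key<TAB>Value" lines on stdin and print only the
# ownership-related ones. The key list is illustrative.
owner_tags() {
  awk -F '\t' 'tolower($1) ~ /^(owner|team|project)$/ { print $1 "=" $2 }'
}

# list_owner_tags INSTANCE_ID: fetch the instance's tags and filter them.
list_owner_tags() {
  aws ec2 describe-tags \
    --filters "Name=resource-id,Values=$1" \
    --query 'Tags[].[Key,Value]' \
    --output text | owner_tags
}
```

Running `list_owner_tags i-0abc123def456` with no output means none of those keys is set on the instance.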

2. Confirm the metrics yourself

Verify the idle pattern directly from CloudWatch so you are not acting on stale data.

aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-0abc123def456 \
  --start-time "$(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 86400 \
  --statistics Average Maximum \
  --output table

Repeat for NetworkIn and NetworkOut. If the maximum CPU never climbs above a couple of percent and network bytes stay tiny, the instance is genuinely idle.
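To avoid retyping the command three times, the query can be wrapped in a loop. This is a sketch under two assumptions: the helper names are made up for the example, and the date fallback covers only GNU date and BSD/macOS date (the `-d '7 days ago'` form above is GNU-only).

```shell
# iso_days_ago N: UTC timestamp N days back; tries GNU date, then BSD/macOS date.
iso_days_ago() {
  date -u -d "$1 days ago" +%Y-%m-%dT%H:%M:%SZ 2>/dev/null ||
    date -u -v-"$1"d +%Y-%m-%dT%H:%M:%SZ
}

# idle_metrics INSTANCE_ID: daily Average and Maximum for CPU and both network metrics.
idle_metrics() {
  local start end metric
  start=$(iso_days_ago 7)
  end=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  for metric in CPUUtilization NetworkIn NetworkOut; do
    echo "== $metric =="
    aws cloudwatch get-metric-statistics \
      --namespace AWS/EC2 \
      --metric-name "$metric" \
      --dimensions "Name=InstanceId,Value=$1" \
      --start-time "$start" --end-time "$end" \
      --period 86400 \
      --statistics Average Maximum \
      --query 'sort_by(Datapoints, &Timestamp)[].[Timestamp,Average,Maximum]' \
      --output table
  done
}
```

Sorting by timestamp in the query makes a flat line easier to read, since CloudWatch does not guarantee datapoint order.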

3. Decide: stop, right-size, or terminate

You have three reasonable outcomes depending on what you learn.

Stop the instance if you think it might be needed again. Stopping halts compute billing while preserving EBS volumes and configuration. This is the safe, reversible choice.

Warning: Stopping an instance does not stop EBS or Elastic IP charges. A stopped instance still pays for its attached volumes, and Elastic IPs bill hourly whether or not they are attached to a running instance (AWS has charged for all public IPv4 addresses since February 2024). Stopping reduces cost; it does not zero it out.

aws ec2 stop-instances --instance-ids i-0abc123def456

Right-size if the instance is used but oversized for the trickle of work it does. Stop it, change the instance type, and start it again.

aws ec2 stop-instances --instance-ids i-0abc123def456
aws ec2 wait instance-stopped --instance-ids i-0abc123def456
aws ec2 modify-instance-attribute \
  --instance-id i-0abc123def456 \
  --instance-type "{\"Value\": \"t3.small\"}"
aws ec2 start-instances --instance-ids i-0abc123def456

Terminate only once you are confident the instance is abandoned and you have backed up anything you need from it.

Danger: Termination is irreversible. Any data on instance-store volumes is lost immediately, and every EBS volume with DeleteOnTermination set to true (the default for root volumes) is deleted along with the instance. Snapshot anything you might need before running the command below.

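# Snapshot every attached EBS volume first
VOLUME_IDS=$(aws ec2 describe-instances \
  --instance-ids i-0abc123def456 \
  --query 'Reservations[].Instances[].BlockDeviceMappings[].Ebs.VolumeId' \
  --output text)

for VOLUME_ID in $VOLUME_IDS; do
  SNAPSHOT_ID=$(aws ec2 create-snapshot \
    --volume-id "$VOLUME_ID" \
    --description "Pre-termination backup of i-0abc123def456 ($VOLUME_ID)" \
    --query SnapshotId \
    --output text)
  # Block until the snapshot is complete before moving on
  aws ec2 wait snapshot-completed --snapshot-ids "$SNAPSHOT_ID"
done

# Then terminate
aws ec2 terminate-instances --instance-ids i-0abc123def456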

Tip: Before terminating, enable termination protection on instances you are unsure about and apply an idle-review tag with a date. That gives you a grace period and a paper trail rather than a permanent decision made under time pressure.
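The grace-period approach in the tip might look like the following. It is a sketch: the function name, the idle-review tag key, and the date format are illustrative choices, not a Lensix convention.

```shell
# mark_for_review INSTANCE_ID
mark_for_review() {
  # Block accidental termination through the API or console.
  aws ec2 modify-instance-attribute \
    --instance-id "$1" \
    --disable-api-termination
  # Record when the instance was flagged so the review has a deadline.
  aws ec2 create-tags \
    --resources "$1" \
    --tags "Key=idle-review,Value=$(date -u +%Y-%m-%d)"
}
```

Once the review date passes with no objection, remove the protection with `--no-disable-api-termination` and proceed with termination.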

How to prevent it from happening again

One-off cleanup is satisfying but temporary. The instances will pile up again unless you build guardrails.

Require ownership tags at launch

An instance with no owner is an instance nobody will clean up. Enforce tagging with a Service Control Policy so untagged launches are rejected outright.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "RequireOwnerTagOnRunInstances",
      "Effect": "Deny",
      "Action": "ec2:RunInstances",
      "Resource": "arn:aws:ec2:*:*:instance/*",
      "Condition": {
        "Null": {
          "aws:RequestTag/Owner": "true"
        }
      }
    }
  ]
}
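A policy document does nothing until it is created in AWS Organizations and attached to a target. A minimal sketch of that step, assuming the JSON above is saved to a local file and the commands run from the organization's management account; the function name and policy name are hypothetical.

```shell
# apply_owner_tag_scp POLICY_FILE TARGET_ID
# TARGET_ID is the ID of an organizational unit, an account, or the root.
apply_owner_tag_scp() {
  local policy_id
  policy_id=$(aws organizations create-policy \
    --name require-owner-tag \
    --description "Deny ec2:RunInstances without an Owner tag" \
    --type SERVICE_CONTROL_POLICY \
    --content "file://$1" \
    --query 'Policy.PolicySummary.Id' \
    --output text)
  aws organizations attach-policy --policy-id "$policy_id" --target-id "$2"
}
```

Attaching to a single test OU first is a sensible way to confirm the policy does not block launches you still want, such as those from Auto Scaling groups that tag after launch.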

Automate idle detection and shutdown

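Set CloudWatch alarms that act on prolonged low CPU. The alarm can stop the instance automatically through a built-in EC2 action, with no Lambda required. The example below evaluates 168 one-hour periods (7 days) of average CPU, and the region in the action ARN must match the instance's region. Because it keys on CPU alone, it is a coarser signal than this check's combined CPU and network test.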

aws cloudwatch put-metric-alarm \
  --alarm-name "stop-idle-i-0abc123def456" \
  --metric-name CPUUtilization \
  --namespace AWS/EC2 \
  --statistic Average \
  --period 3600 \
  --evaluation-periods 168 \
  --threshold 2 \
  --comparison-operator LessThanThreshold \
  --dimensions Name=InstanceId,Value=i-0abc123def456 \
  --alarm-actions "arn:aws:automate:us-east-1:ec2:stop"
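The command above is tied to one instance and one region. One way to generalize it, shown as a sketch with hypothetical function names, is to parameterize both and loop over every running instance in a region.

```shell
# create_idle_alarm INSTANCE_ID REGION
# Same settings as the single-instance alarm: 168 x 1 hour = 7 days under 2% CPU.
create_idle_alarm() {
  local id=$1 region=$2
  aws cloudwatch put-metric-alarm \
    --region "$region" \
    --alarm-name "stop-idle-$id" \
    --metric-name CPUUtilization \
    --namespace AWS/EC2 \
    --statistic Average \
    --period 3600 \
    --evaluation-periods 168 \
    --threshold 2 \
    --comparison-operator LessThanThreshold \
    --dimensions "Name=InstanceId,Value=$id" \
    --alarm-actions "arn:aws:automate:$region:ec2:stop"
}

# alarm_all_running REGION: one alarm per running instance in the region.
alarm_all_running() {
  local region=$1 id
  for id in $(aws ec2 describe-instances \
      --region "$region" \
      --filters "Name=instance-state-name,Values=running" \
      --query 'Reservations[].Instances[].InstanceId' \
      --output text); do
    create_idle_alarm "$id" "$region"
  done
}
```

In practice you would likely narrow the filter, for example to non-production tags, so the alarm never stops something that is supposed to sit quietly.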

Schedule non-production environments

Most dev and test instances do not need to run nights and weekends. AWS Instance Scheduler or a simple tag-driven Lambda can shut them down on a calendar, cutting roughly two thirds of their runtime cost without anyone lifting a finger.

Tip: Tag schedulable instances with something like schedule=office-hours and let your automation key off the tag. New instances opt in just by carrying the tag, so the policy scales without per-instance config.
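A rough sketch of both halves of this idea: the savings arithmetic, and a tag-driven stop that a scheduler could run each evening. The function names are hypothetical, and the 50-hour week (10 hours on 5 weekdays) is an assumed office schedule.

```shell
# savings_pct HOURS_ON_PER_WEEK: share of the 168-hour week spent stopped.
savings_pct() {
  awk -v on="$1" 'BEGIN { printf "%.1f\n", (168 - on) / 168 * 100 }'
}

savings_pct 50   # 10 hours x 5 weekdays: 70.2

# stop_scheduled: stop every running instance tagged schedule=office-hours.
stop_scheduled() {
  local ids
  ids=$(aws ec2 describe-instances \
    --filters "Name=tag:schedule,Values=office-hours" \
              "Name=instance-state-name,Values=running" \
    --query 'Reservations[].Instances[].InstanceId' \
    --output text)
  if [ -n "$ids" ]; then
    # Word splitting on $ids is intentional: one argument per instance ID.
    aws ec2 stop-instances --instance-ids $ids
  fi
}
```

A matching start function with the `stopped` state filter and `start-instances` would complete the cycle each morning.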

Gate IaC with policy-as-code

Catch oversized or untagged instances before they are ever created. A Terraform plan check with Open Policy Agent or Checkov in CI can fail the pipeline when an instance lacks an owner tag or uses an outsized type for a dev workspace.

# Example: scan Terraform code in CI
checkov -d . --check CKV_AWS_126   # built-in check: detailed monitoring enabled on EC2 instances
tfsec .                            # runs tfsec's built-in AWS checks
# Owner-tag and instance-type rules typically need custom policies in either tool.

Best practices

Review the idle list regularly. Make it a recurring task, not a once-a-year audit. Lensix surfaces these continuously so the list never grows stale.
Prefer stop over terminate for anything uncertain. Reversible decisions cost almost nothing when you preserve EBS, and they protect you from deleting something that turns out to matter.
Clean up the leftovers. After terminating, sweep up orphaned EBS volumes, unassociated Elastic IPs, and stale snapshots. The instance is the visible cost, but its attachments keep billing.
Right-size before you assume idle. Sometimes a flat-lined instance is the correct workload running on far too much hardware. Down-sizing keeps the function and kills the waste.
Tie instances to a lifecycle. Every instance should map to a project, an owner, and an expected end date. When the project ends, decommission is a deliberate step, not a forgotten one.
Use Compute Optimizer. AWS Compute Optimizer reads the same metrics and recommends type changes or termination, giving you a second opinion alongside this check.
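For the leftover sweep mentioned in the list above, a read-only listing is a safe starting point. This sketch only reports candidates and deletes nothing; the function name is hypothetical.

```shell
# list_leftovers: print resources that often outlive a terminated instance.
list_leftovers() {
  echo "Unattached EBS volumes:"
  aws ec2 describe-volumes \
    --filters "Name=status,Values=available" \
    --query 'Volumes[].[VolumeId,Size,CreateTime]' \
    --output text
  echo "Unassociated Elastic IPs:"
  aws ec2 describe-addresses \
    --query 'Addresses[?AssociationId==`null`].[AllocationId,PublicIp]' \
    --output text
}
```

Review the output before removing anything, since an available volume may be a deliberate backup rather than an orphan.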

Idle instances are easy to ignore precisely because they cause no immediate pain. Treat them as both a cost leak and a security liability, automate their detection, and the fleet stays lean on its own.
