Azure VM Scale Set With Zero Instances: Fix Guide

TL;DR

This check flags Azure VM Scale Sets that currently have zero running instances. An empty scale set serves no traffic, often signals a misconfigured autoscale rule or a forgotten resource, and can hide reliability gaps. Scale it back up, fix the autoscale policy, or delete it if it's dead weight.

A Virtual Machine Scale Set (VMSS) is the workhorse behind a lot of Azure infrastructure: stateless web tiers, batch workers, AKS node pools, and anything else that needs to scale horizontally. When Lensix reports that a scale set has zero instances, it means the resource exists but has no VMs running underneath it. That's almost never the intended steady state, and it's worth a look.

This post walks through what the vm_emptyscalesets check looks for, why an empty scale set is a problem worth investigating, and how to fix and prevent it.

What this check detects

The check inspects every VM Scale Set in your Azure subscription and reports any where the current instance count is 0. In Azure terms, that's a scale set whose sku.capacity is zero, or whose autoscale profile has scaled it down to nothing.

There are a few ways a scale set ends up empty:

It was provisioned with an initial capacity of zero and never scaled up.
An autoscale rule scaled it down to its minimum, and that minimum is set to zero.
Someone manually scaled it to zero to save money during testing and forgot to scale it back.
A deployment failed partway through, leaving the scale set definition without instances.
It's a leftover from a decommissioned workload that was never cleaned up.

Note: An empty scale set costs nothing in compute, since you only pay for running instances, and the scale set object itself is not billed. But the resources around it, such as its load balancer, public IPs, and any disks left behind, may still incur charges. Zero instances does not always mean zero cost.

Why it matters

An empty scale set is rarely a security hole on its own, but it's a strong signal that something is off. Here's the practical risk.

Reliability and availability gaps

If the scale set is supposed to be serving traffic and it has no instances, you have an outage or a partial outage right now. A load balancer pointing at an empty backend pool returns errors or times out. If this is your production web tier, customers are seeing failures.

Warning: A scale set with a minimum capacity of zero can be scaled down by an aggressive autoscale rule during a quiet period and then fail to scale back up if the scale-out rule is misconfigured or the region is capacity-constrained. You can end up with a self-inflicted outage that nobody triggered manually.

Broken autoscale logic

The most common cause is an autoscale profile with a minimum of zero combined with a scale-out rule whose metric never crosses its threshold while the set is empty. For example, if you scale out based on CPU and there are no instances generating CPU metrics, the metric may report as unavailable, and the scale-out rule never fires. The scale set sits at zero indefinitely.
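
To see which metric each rule depends on, the rules can be pulled out of the autoscale setting. The following is a minimal sketch, assuming the CLI output exposes rules under profiles[].rules[] with metricTrigger and scaleAction fields; the function name is illustrative and both arguments are placeholders.

```shell
# Print the metric, metric source, and direction for every rule in an
# autoscale setting.
#   $1 = resource group, $2 = autoscale setting name (both placeholders)
# Assumption: rules appear as profiles[].rules[] with metricTrigger and
# scaleAction fields in the CLI output.
show_autoscale_rule_metrics() {
  az monitor autoscale show \
    --resource-group "$1" \
    --name "$2" \
    --query 'profiles[].rules[].{Metric:metricTrigger.metricName, Source:metricTrigger.metricResourceUri, Direction:scaleAction.direction}' \
    --output table
}

# Usage (requires a logged-in Azure CLI):
#   show_autoscale_rule_metrics myResourceGroup myAutoscaleSetting
```

If every scale-out ("Increase") rule reads a per-instance metric whose source is the scale set itself, nothing is likely to trigger a scale-out from zero.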

Forgotten and unmanaged resources

Empty scale sets are a classic form of cloud sprawl. A leftover scale set keeps its network security groups, its identity assignments, its role assignments, and any disks it left behind. That's attack surface and management overhead for a resource doing no useful work. If the scale set has a system-assigned managed identity with role assignments, that identity is still a valid credential someone could exploit if instances are spun up unexpectedly.

Cost leakage

While compute is free at zero instances, the surrounding infrastructure is not. Standard load balancers, public IP addresses, and any persistent data disks attached to the scale set model can quietly accrue charges for a resource that delivers no value.

How to fix it

The right fix depends on why the scale set is empty. Start by figuring that out.

Step 1: Inspect the scale set

List your scale sets and their current capacity:

az vmss list \
  --query "[].{Name:name, RG:resourceGroup, Capacity:sku.capacity}" \
  --output table
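
To narrow that list to only the empty scale sets, a JMESPath filter on sku.capacity can be added to the query. This is a small sketch; the function name is illustrative.

```shell
# List only the scale sets whose current capacity is zero.
# The backticks mark 0 as a JMESPath number literal, so the query is
# single-quoted to stop the shell treating them as command substitution.
list_empty_vmss() {
  az vmss list \
    --query '[?sku.capacity==`0`].{Name:name, RG:resourceGroup}' \
    --output tsv
}

# Usage (requires a logged-in Azure CLI):
#   list_empty_vmss
```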

For a specific scale set, check its current capacity and provisioning state:

az vmss show \
  --resource-group myResourceGroup \
  --name myScaleSet \
  --query "{Capacity:sku.capacity, Provisioning:provisioningState}"

Then check whether autoscale is managing it:

az monitor autoscale show \
  --resource-group myResourceGroup \
  --name myAutoscaleSetting \
  --query "profiles[].capacity"
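
If you don't know the name of the autoscale setting, one way to find it is to list the settings in the resource group and match on the scale set's resource ID. This is a sketch under two assumptions: that the CLI output exposes the target as targetResourceUri, and that the autoscale setting lives in the same resource group as the scale set. The function name is illustrative.

```shell
# Print the names of autoscale settings that target a given scale set.
#   $1 = resource group, $2 = scale set name (both placeholders)
find_autoscale_for_vmss() {
  rg="$1"
  vmss="$2"
  # Resolve the scale set's full resource ID first.
  vmss_id="$(az vmss show --resource-group "$rg" --name "$vmss" --query id --output tsv)"
  # Assumption: the target resource is exposed as targetResourceUri.
  az monitor autoscale list \
    --resource-group "$rg" \
    --query "[?targetResourceUri=='${vmss_id}'].name" \
    --output tsv
}

# Usage (requires a logged-in Azure CLI):
#   find_autoscale_for_vmss myResourceGroup myScaleSet
```

The comparison is an exact string match, so a difference in letter case between the two IDs would hide a setting. An empty result is worth double-checking in the portal before concluding that autoscale is not configured.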

Step 2a: Scale it back up (if it should be serving traffic)

If the scale set is supposed to be live, set its capacity manually to restore service:

az vmss scale \
  --resource-group myResourceGroup \
  --name myScaleSet \
  --new-capacity 3
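
Once the scale operation returns, it's worth confirming that the instances actually provisioned. A minimal sketch, with an illustrative function name and placeholder arguments:

```shell
# Show each instance in the scale set with its provisioning state.
#   $1 = resource group, $2 = scale set name (both placeholders)
list_vmss_instances() {
  az vmss list-instances \
    --resource-group "$1" \
    --name "$2" \
    --query '[].{Instance:instanceId, State:provisioningState}' \
    --output table
}

# Usage (requires a logged-in Azure CLI):
#   list_vmss_instances myResourceGroup myScaleSet
```

An empty table, or instances stuck in a failed state, suggests the problem is not just the capacity setting (for example, a quota or regional capacity limit).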

Step 2b: Fix the autoscale minimum

If autoscale dropped it to zero, raise the minimum so the set always keeps a baseline of instances:

az monitor autoscale update \
  --resource-group myResourceGroup \
  --name myAutoscaleSetting \
  --min-count 2 \
  --max-count 10 \
  --count 2

Setting a minimum of at least two also gives you fault tolerance across availability zones or fault domains, so a single instance failure does not take the whole tier down.

Step 2c: Delete it (if it's abandoned)

If the scale set is a leftover from a decommissioned workload, remove it along with its dependent resources.

Danger: Deleting a scale set is irreversible and removes its VM model, attached data disks, and any extensions. Confirm the workload is truly retired and that no other resource depends on it before running this. Check for dependent load balancer backend pools and DNS records first.

az vmss delete \
  --resource-group myResourceGroup \
  --name myScaleSet

Afterward, clean up orphaned public IPs, load balancers, and disks that the scale set left behind.
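
As a rough first pass at finding those leftovers, the sketch below lists public IPs with no IP configuration and managed disks with no owner in a resource group. It assumes the leftovers live in the same resource group as the deleted scale set; the function name is illustrative.

```shell
# List likely orphans in a resource group.
#   $1 = resource group (placeholder)
list_orphaned_resources() {
  rg="$1"
  echo "Unassociated public IPs:"
  # A public IP with no ipConfiguration is not bound to a NIC or load balancer.
  az network public-ip list --resource-group "$rg" \
    --query '[?ipConfiguration==`null`].name' --output tsv
  echo "Unattached managed disks:"
  # A managed disk with no managedBy value is not attached to any VM.
  az disk list --resource-group "$rg" \
    --query '[?managedBy==`null`].name' --output tsv
}

# Usage (requires a logged-in Azure CLI):
#   list_orphaned_resources myResourceGroup
```

A public IP that is still bound to a leftover load balancer frontend will not show up here, so review remaining load balancers separately. A public IP attached only to a NAT gateway may appear in this list even though it is in use, so check before deleting.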

Tip: Before deleting, tag the scale set with a retirement date and let it sit for a sprint. If nothing breaks and no one complains, you can delete with confidence. A 14-day grace period catches the "actually we still needed that" surprises.
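
One way to apply that tag is sketched below. The tag name retireAfter is a hypothetical convention, not something Azure or Lensix defines, and the function name and arguments are placeholders.

```shell
# Merge a retirement-date tag onto a scale set without touching its other tags.
#   $1 = resource group, $2 = scale set name, $3 = date such as 2030-01-15
# The tag key "retireAfter" is a hypothetical naming convention.
mark_vmss_for_retirement() {
  rg="$1"
  vmss="$2"
  retire_after="$3"
  vmss_id="$(az vmss show --resource-group "$rg" --name "$vmss" --query id --output tsv)"
  az tag update \
    --resource-id "$vmss_id" \
    --operation Merge \
    --tags "retireAfter=$retire_after"
}

# Usage (requires a logged-in Azure CLI):
#   mark_vmss_for_retirement myResourceGroup myScaleSet 2030-01-15
```

The Merge operation adds or updates the one tag and leaves existing owner and environment tags in place.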

How to prevent it from happening again

Manual scaling and ad hoc autoscale rules are where most empty scale sets come from. Push the configuration into infrastructure as code and guard it with policy.

Define minimums in Terraform

If you manage scale sets with Terraform, set an explicit non-zero capacity and pin your autoscale floor:

resource "azurerm_linux_virtual_machine_scale_set" "web" {
  name                = "web-vmss"
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location
  sku                 = "Standard_DS2_v2"
  instances           = 2

  # ... admin, network, os_disk config ...
}

resource "azurerm_monitor_autoscale_setting" "web" {
  name                = "web-autoscale"
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location
  target_resource_id  = azurerm_linux_virtual_machine_scale_set.web.id

  profile {
    name = "default"
    capacity {
      default = 2
      minimum = 2
      maximum = 10
    }
    # ... scale rules ...
  }
}

Enforce it with Azure Policy

You can use Azure Policy to audit autoscale settings and flag any profile whose minimum capacity is zero. A custom policy definition matching Microsoft.Insights/autoscalesettings with an audit effect surfaces violations across the subscription without blocking deployments outright.
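
Until such a policy is in place, the same condition can be spot-checked from the CLI. This is a sketch of a manual audit, not a policy definition. It assumes each profile's floor appears as capacity.minimum in the CLI output and matches both the string and numeric forms, since the value may be returned as either. The function name is illustrative.

```shell
# List autoscale settings in a resource group that have at least one
# profile with a minimum capacity of zero.
#   $1 = resource group (placeholder)
list_zero_floor_autoscale() {
  az monitor autoscale list \
    --resource-group "$1" \
    --query '[?profiles[?capacity.minimum==`"0"` || capacity.minimum==`0`]].name' \
    --output tsv
}

# Usage (requires a logged-in Azure CLI):
#   list_zero_floor_autoscale myResourceGroup
```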

Add a CI/CD gate

Run Lensix scans as part of your pipeline so an empty or zero-floor scale set fails the build before it reaches production. A scheduled scan against running environments catches drift introduced by manual changes after deployment.
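
For pipelines that also want a plain Azure CLI check, a gate along the following lines is one option. It is a generic sketch and does not represent how Lensix implements the check; the function name and exit codes are illustrative choices.

```shell
# Return 1 if any scale set in the current subscription has zero capacity,
# 2 if the Azure CLI call itself fails, and 0 otherwise.
check_no_empty_vmss() {
  # Treat a CLI failure as an error instead of silently passing the gate.
  empty="$(az vmss list --query '[?sku.capacity==`0`].name' --output tsv)" || return 2
  if [ -n "$empty" ]; then
    echo "Empty scale sets found:" >&2
    printf '%s\n' "$empty" >&2
    return 1
  fi
  echo "No empty scale sets."
}

# Usage in a pipeline step (requires a logged-in Azure CLI):
#   check_no_empty_vmss || exit 1
```

Scale sets that are meant to idle at zero, such as batch pools, would need to be excluded, for example by filtering on a tag.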

Tip: Wire a Lensix scheduled scan to alert on vm_emptyscalesets directly into your incident channel. An empty production scale set is often the first observable symptom of an outage, and catching it within minutes beats waiting for a customer to report errors.

Best practices

Never set an autoscale minimum to zero for production workloads. Use zero only for deliberately ephemeral or batch scale sets where idle-at-zero is the intended design.
Keep at least two instances for any service-facing tier so you have redundancy across fault domains and zones.
Tag scale sets with an owner and environment so empty ones can be triaged quickly instead of lingering as orphans.
Review autoscale scale-out rules to confirm they can fire when the set is empty. Base scale-out on a metric that is still observable at zero instances, such as queue length or load balancer request count, rather than instance CPU.
Clean up dependent resources when retiring a scale set: public IPs, load balancers, NAT rules, and disks.
Audit regularly. Empty scale sets accumulate over time as workloads change. A recurring scan keeps the count honest.

An empty scale set is easy to ignore, and ignoring it can be expensive. Treat the vm_emptyscalesets finding as a prompt to decide one of three things: it should be running and isn't (fix the outage), it should never hit zero (fix the autoscale floor), or it's dead (delete it). Leaving it in limbo is the only wrong answer.
