Azure VMSS Automatic OS Upgrade Disabled Fix

TL;DR

This check flags Azure VM Scale Sets that have automatic OS image upgrades turned off, which leaves instances running outdated, potentially vulnerable OS images. Enable automatic OS upgrades on the scale set's upgrade policy so patched images roll out without manual intervention.

Azure VM Scale Sets are built to run fleets of identical virtual machines, scaling up and down based on demand. They are a common backbone for stateless web tiers, batch workers, and microservice clusters. But a scale set is only as secure as the OS image its instances boot from, and that image does not refresh itself unless you tell it to. The Scale Set Automatic OS Upgrade Disabled check (vm_autoupgrade) catches scale sets where automatic OS upgrades are not configured, meaning your instances can drift further and further behind the latest patched image over time.

What this check detects

The check inspects each Azure VM Scale Set in your subscriptions and reads the upgrade policy on the resource. Specifically, it looks at the automaticOSUpgradePolicy.enableAutomaticOSUpgrade field. When that flag is false or absent, the scale set is flagged.
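
The flagging rule can be sketched in a few lines of Python. This is a minimal illustration, not the check's actual implementation. It assumes the input is the scale set's upgradePolicy object as a dict. The function names get_ci and is_flagged are made up for this example, and the lookup ignores key casing because the casing of these fields may differ between tools and CLI versions.

```python
def get_ci(mapping, key):
    """Case-insensitive dict lookup; returns None when the key is missing."""
    if not isinstance(mapping, dict):
        return None
    for name, value in mapping.items():
        if name.lower() == key.lower():
            return value
    return None


def is_flagged(upgrade_policy):
    """Return True when automatic OS upgrades are disabled or not configured."""
    auto_policy = get_ci(upgrade_policy, "automaticOSUpgradePolicy")
    # A missing block, a missing flag, null, and false all count as disabled.
    return get_ci(auto_policy, "enableAutomaticOSUpgrade") is not True


enabled = {"mode": "Rolling", "automaticOSUpgradePolicy": {"enableAutomaticOSUpgrade": True}}
absent = {"mode": "Manual"}

print(is_flagged(enabled))  # False
print(is_flagged(absent))   # True
```

Treating "absent" the same as "false" mirrors the rule described above: a scale set that never set the field is flagged just like one that switched it off.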

With automatic OS upgrades enabled, Azure monitors the platform image referenced by your scale set. When the image publisher releases a new patched version, Azure gradually rolls it out across your instances in batches, respecting your application health probes so it never takes down too many VMs at once.

Note: Automatic OS upgrades only apply when your scale set uses a platform image SKU with a version set to latest, or a custom image managed through Azure Compute Gallery with versioned releases. Scale sets pinned to a specific image version will not receive automatic upgrades, since you have explicitly opted out of moving forward.
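
As a rough illustration of that eligibility rule, the sketch below inspects an image reference dict with the publisher, offer, sku, and version fields used for platform images. The function name tracks_latest is hypothetical. Custom or gallery images referenced by id are deliberately reported as undetermined (None), since whether they move forward depends on how the gallery reference is written.

```python
from typing import Optional


def tracks_latest(image_reference: Optional[dict]) -> Optional[bool]:
    """Return True/False for platform images, None when undetermined."""
    ref = image_reference or {}
    if ref.get("id"):
        # Custom or gallery image: not decided by this sketch.
        return None
    # Platform image: only the literal version "latest" keeps moving forward.
    return str(ref.get("version") or "").lower() == "latest"


moving = {
    "publisher": "Canonical",
    "offer": "0001-com-ubuntu-server-jammy",
    "sku": "22_04-lts-gen2",
    "version": "latest",
}
pinned = dict(moving, version="22.04.202401010")

print(tracks_latest(moving))  # True
print(tracks_latest(pinned))  # False
```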

Why it matters

An unpatched OS is one of the most reliable footholds an attacker can find. When a kernel vulnerability or a flaw in a bundled package gets a CVE, the window between disclosure and active exploitation keeps shrinking. If your scale set instances are still running last quarter's image, you are exposed for as long as it takes someone to notice and manually trigger a rollout.

The risk compounds with scale. A scale set might run two instances today and forty during a traffic spike. Every new instance that boots from the stale image inherits the same unpatched vulnerabilities. This is not a static patching gap; it is a gap that replicates itself every time the fleet grows.

Consider a few concrete scenarios:

Kernel-level CVE in the base image. A privilege escalation bug ships in the Ubuntu or Windows image your scale set uses. With automatic upgrades on, Azure starts rolling the patched image across the fleet once the publisher releases it. With them off, your instances stay vulnerable until an engineer remembers to act.
Compliance drift. Frameworks like CIS, SOC 2, and PCI DSS expect timely patching with evidence. A scale set quietly running images that are months old undermines that evidence and shows up in audits.
Inconsistent fleet state. When some instances get manually patched and others do not, you end up debugging issues that only reproduce on certain VMs. Drift makes incident response slower.

Warning: Disabling automatic OS upgrades is sometimes a deliberate choice for workloads that need tight control over change windows, such as stateful applications or systems with strict validation requirements. If that describes your scale set, treat this finding as a prompt to confirm you have an alternative patching process, not as a bug to silence.

How to fix it

You can enable automatic OS upgrades through the Azure CLI, the portal, or infrastructure as code. The key requirement is an upgrade policy mode that supports rolling upgrades and, ideally, an application health probe so Azure knows when an instance is healthy before moving to the next batch.

Azure CLI

First confirm the current state of the scale set:

az vmss show \
  --resource-group myResourceGroup \
  --name myScaleSet \
  --query "upgradePolicy"

Set the upgrade policy mode to Rolling and enable automatic OS upgrades:

az vmss update \
  --resource-group myResourceGroup \
  --name myScaleSet \
  --set upgradePolicy.mode=Rolling \
        upgradePolicy.automaticOSUpgradePolicy.enableAutomaticOSUpgrade=true

Warning: Switching the upgrade policy to Rolling changes how the scale set applies model updates. Existing instances will be reimaged in batches when the next OS upgrade is available, which means short, staggered disruptions. Make sure you have an application health probe or health extension configured so Azure does not advance to the next batch while instances are still unhealthy.

Attach a health probe through the load balancer so Azure can gate the rollout on real application health:

az vmss update \
  --resource-group myResourceGroup \
  --name myScaleSet \
  --set virtualMachineProfile.networkProfile.healthProbe.id="/subscriptions/<sub-id>/resourceGroups/myResourceGroup/providers/Microsoft.Network/loadBalancers/myLB/probes/myHealthProbe"

Verify the change took effect:

az vmss show \
  --resource-group myResourceGroup \
  --name myScaleSet \
  --query "upgradePolicy.automaticOSUpgradePolicy.enableAutomaticOSUpgrade"
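
The CLI steps above can be combined into one preflight check. The Python sketch below is an illustration under stated assumptions: it takes the scale set model as a dict in the flattened shape that az vmss show prints (for example, loaded from its JSON output), lowercases all keys because key casing may vary between CLI versions, and looks for an Application Health extension by checking a few candidate fields for a type starting with "ApplicationHealth". The function names lower_keys and preflight are invented for this example.

```python
def lower_keys(value):
    """Recursively lowercase dict keys so lookups ignore key casing."""
    if isinstance(value, dict):
        return {k.lower(): lower_keys(v) for k, v in value.items()}
    if isinstance(value, list):
        return [lower_keys(v) for v in value]
    return value


def preflight(vmss):
    """Return a list of problems; an empty list means the steps above are in place."""
    model = lower_keys(vmss or {})
    policy = model.get("upgradepolicy") or {}
    auto = policy.get("automaticosupgradepolicy") or {}
    profile = model.get("virtualmachineprofile") or {}
    problems = []

    if policy.get("mode") != "Rolling":
        problems.append(f"upgrade mode is {policy.get('mode')!r}, not 'Rolling'")
    if auto.get("enableautomaticosupgrade") is not True:
        problems.append("automatic OS upgrade is not enabled")

    probe_id = ((profile.get("networkprofile") or {}).get("healthprobe") or {}).get("id")
    extensions = (profile.get("extensionprofile") or {}).get("extensions") or []
    # Assumption: the extension type may appear under any of these fields.
    has_health_extension = any(
        str(candidate).startswith("ApplicationHealth")
        for ext in extensions
        for candidate in (
            ext.get("type"),
            ext.get("typepropertiestype"),
            (ext.get("properties") or {}).get("type"),
        )
    )
    if not probe_id and not has_health_extension:
        problems.append("no load balancer health probe or Application Health extension")
    return problems


ready = {
    "upgradePolicy": {
        "mode": "Rolling",
        "automaticOSUpgradePolicy": {"enableAutomaticOSUpgrade": True},
    },
    "virtualMachineProfile": {
        "networkProfile": {"healthProbe": {"id": "<probe-resource-id>"}},
    },
}

print(preflight(ready))  # []
for problem in preflight({}):
    print(problem)
```

Returning a list of problems rather than a single boolean makes it easier to see which of the three steps is still missing on a given scale set.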

Azure Portal

Open the VM Scale Set in the Azure portal.
Under Settings, select Upgrade policy (or Operating system depending on the portal version).
Set the upgrade mode to Rolling.
Toggle Enable automatic OS upgrades to on.
Confirm an application health probe or the Application Health extension is configured, then save.

Terraform

If you manage scale sets with Terraform, set the upgrade fields directly on the resource so the configuration is the source of truth:

resource "azurerm_linux_virtual_machine_scale_set" "web" {
  name                = "web-vmss"
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location
  sku                 = "Standard_D2s_v5"
  instances           = 3

  upgrade_mode = "Rolling"

  automatic_os_upgrade_policy {
    enable_automatic_os_upgrade = true
    disable_automatic_rollback  = false
  }

  rolling_upgrade_policy {
    max_batch_instance_percent              = 20
    max_unhealthy_instance_percent          = 20
    max_unhealthy_upgraded_instance_percent = 20
    pause_time_between_batches              = "PT5M"
  }

  health_probe_id = azurerm_lb_probe.web.id

  # source_image_reference must use "latest" for the version
  source_image_reference {
    publisher = "Canonical"
    offer     = "0001-com-ubuntu-server-jammy"
    sku       = "22_04-lts-gen2"
    version   = "latest"
  }

  # ... admin_username, admin_ssh_key, os_disk, network_interface, etc.
}

Tip: The rolling_upgrade_policy block gives you precise control over blast radius. Lowering max_batch_instance_percent upgrades fewer instances at a time, and raising pause_time_between_batches gives your monitoring more time to catch a regression before the next batch rolls.
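
To get a feel for how those two settings interact, here is a back-of-the-envelope estimate in Python. It is a sketch under simplifying assumptions: the batch size is taken as the percentage of the fleet rounded down (with a minimum of one instance), every batch is full, and only the pauses between batches are counted, not the time each batch takes to upgrade. Azure's real batching may differ. The function names are invented for this example, and the duration parser only handles simple values such as PT5M or PT1H30M.

```python
import math
import re


def parse_pause_seconds(duration):
    """Parse a simple ISO 8601 time duration such as PT5M or PT1H30M."""
    match = re.fullmatch(r"PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?", duration)
    if not match or not any(match.groups()):
        raise ValueError(f"unsupported duration: {duration!r}")
    hours, minutes, seconds = (int(group or 0) for group in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def estimate_rollout(instances, max_batch_percent, pause):
    """Return (batch_size, batches, total_pause_seconds) for a rolling upgrade."""
    batch_size = max(1, instances * max_batch_percent // 100)
    batches = math.ceil(instances / batch_size)
    # Pauses happen between batches, so there is one fewer pause than batches.
    total_pause = max(0, batches - 1) * parse_pause_seconds(pause)
    return batch_size, batches, total_pause


print(estimate_rollout(40, 20, "PT5M"))   # (8, 5, 1200)
print(estimate_rollout(40, 10, "PT10M"))  # (4, 10, 5400)
```

Under these assumptions, halving the batch percentage and doubling the pause on a 40-instance fleet stretches the pause time alone from 20 minutes to 90 minutes, which is the trade-off between blast radius and rollout speed.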

How to prevent it from happening again

Fixing one scale set is easy. Keeping every future scale set compliant is the real goal. Bake the requirement into the places where infrastructure gets defined and deployed.

Enforce with Azure Policy

Azure has a built-in policy definition that audits scale sets without automatic OS upgrades. Assign it at the subscription or management group level so new resources are evaluated automatically:

az policy assignment create \
  --name "require-vmss-auto-os-upgrade" \
  --display-name "VMSS should have automatic OS upgrades enabled" \
  --policy "465f0161-0087-490a-9ad9-ad6217f4f43a" \
  --scope "/subscriptions/<sub-id>"

Start in Audit mode to measure your current state, then move toward Deny once teams have remediated existing scale sets and updated their templates.

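Note: Policy IDs can vary across clouds and over time. List the available built-ins with az policy definition list --query "[?contains(displayName, 'automatic OS')]" to confirm the current GUID before assigning. The filter is case-sensitive and deliberately broad, because the built-in's display name may say "image patching" rather than "upgrade".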

Gate it in CI/CD

Catch the misconfiguration before it ever reaches Azure by scanning your Terraform or Bicep in the pipeline. A tool like Checkov or tfsec can flag a scale set that sets enable_automatic_os_upgrade = false or omits the block entirely. Add a step to your pull request workflow:

checkov -d ./infra --framework terraform \
  --check CKV_AZURE_95

Fail the build on a violation so the fix happens in code review rather than in production.
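
If you would rather not depend on a scanner, the same gate can be approximated with a small script over the plan JSON. The sketch below is an illustration, not a replacement for Checkov: it assumes the plan was exported with terraform show -json, that nested blocks appear as lists of objects under change.after, and that only the two azurerm scale set resource types matter. The function name find_violations and the sample resource addresses are made up.

```python
VMSS_TYPES = {
    "azurerm_linux_virtual_machine_scale_set",
    "azurerm_windows_virtual_machine_scale_set",
}


def find_violations(plan):
    """Return addresses of planned scale sets without automatic OS upgrades."""
    violations = []
    for change in plan.get("resource_changes") or []:
        if change.get("type") not in VMSS_TYPES:
            continue
        after = (change.get("change") or {}).get("after")
        if after is None:
            continue  # the resource is being destroyed, nothing to check
        blocks = after.get("automatic_os_upgrade_policy") or []
        enabled = any(b.get("enable_automatic_os_upgrade") is True for b in blocks)
        if not enabled:
            violations.append(change.get("address"))
    return violations


sample_plan = {
    "resource_changes": [
        {
            "address": "azurerm_linux_virtual_machine_scale_set.web",
            "type": "azurerm_linux_virtual_machine_scale_set",
            "change": {"after": {"automatic_os_upgrade_policy": [
                {"enable_automatic_os_upgrade": True, "disable_automatic_rollback": False}
            ]}},
        },
        {
            "address": "azurerm_linux_virtual_machine_scale_set.legacy",
            "type": "azurerm_linux_virtual_machine_scale_set",
            "change": {"after": {"automatic_os_upgrade_policy": []}},
        },
    ]
}

print(find_violations(sample_plan))
# ['azurerm_linux_virtual_machine_scale_set.legacy']
```

In a pipeline you would load the exported plan with json.load and exit with a non-zero status when the returned list is not empty.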

Continuous monitoring with Lensix

Policy and CI cover the resources that flow through your pipelines, but real environments accumulate scale sets created by hand, by other teams, or by older deployments. Lensix runs the vm_autoupgrade check continuously across your subscriptions, so a scale set that drifts out of compliance gets surfaced regardless of how it was created.

Best practices

Use latest or gallery-versioned images. Automatic OS upgrades require a moving target. Set the version to latest for platform images, or publish new versions to an Azure Compute Gallery for custom images.
Always pair upgrades with a health probe. Without one, Azure cannot tell a healthy instance from a broken one, and a bad image could roll across your whole fleet. The Application Health extension or a load balancer probe is what makes rolling upgrades safe.
Tune the rolling policy for your tolerance. A customer-facing API tier should upgrade in small batches with longer pauses. A batch worker pool can move faster. Match the policy to the workload.
Keep automatic rollback enabled. Leave disable_automatic_rollback set to false so Azure reverts an upgrade that fails health checks instead of leaving you with a broken batch.
Document deliberate exceptions. If a scale set genuinely cannot use automatic upgrades, record why, define an alternative patch cadence, and tag the resource so reviewers know the finding is expected rather than overlooked.
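
The last practice, documented exceptions, can be made machine-readable. The sketch below is one possible convention rather than an Azure or Lensix feature: the tag name patching-exception is hypothetical, and the function simply separates findings that carry a written reason from those that do not.

```python
def classify(auto_upgrade_enabled, tags, exception_tag="patching-exception"):
    """Classify a scale set as compliant, a documented exception, or a finding."""
    if auto_upgrade_enabled:
        return "compliant"
    # Only a non-empty reason counts; an empty tag value is still a finding.
    reason = str((tags or {}).get(exception_tag) or "").strip()
    return "documented-exception" if reason else "finding"


print(classify(True, {}))                                                   # compliant
print(classify(False, {"patching-exception": "stateful, monthly window"}))  # documented-exception
print(classify(False, {"env": "prod"}))                                     # finding
```

Requiring a reason in the tag value, not just the presence of the tag, keeps the exception list from turning into a silent mute button.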

Tip: Test your upgrade policy on a non-production scale set first. Trigger a manual rolling upgrade, watch the batches roll through your health probe, and confirm rollback behaves as expected. Once you trust the mechanics, enabling automatic upgrades in production becomes a low-stress change.

Automatic OS upgrades turn patching from a recurring manual chore into a background process that quietly keeps your fleet current. For scale sets, where instances come and go constantly, that consistency is exactly what keeps your attack surface from growing while you are not looking.
