This check flags Compute Engine VMs set to TERMINATE on host maintenance instead of MIGRATE, which means Google will shut the instance down during routine infrastructure maintenance. Set onHostMaintenance to MIGRATE unless the VM genuinely cannot be live migrated.
Google Compute Engine runs your VMs on physical hosts that need regular maintenance: kernel patches, hardware swaps, microcode updates, and security fixes. When the host running your VM needs work, Compute Engine has two options depending on how the instance is configured. It can live migrate the VM to a healthy host with no downtime, or it can terminate the VM outright. This check catches instances configured for the second option.
For most workloads, terminating a VM during routine maintenance is an unforced outage. It is the kind of misconfiguration that sits quietly until Google schedules maintenance on the right host, and then your service goes dark at a time you did not choose.
What this check detects
The compute_nomaintmigration check inspects the scheduling configuration of each Compute Engine VM and reports any instance where the host maintenance behavior is set to TERMINATE rather than MIGRATE.
The relevant field lives under the instance's scheduling block:
{
  "scheduling": {
    "onHostMaintenance": "TERMINATE",
    "automaticRestart": true,
    "preemptible": false
  }
}
When onHostMaintenance is TERMINATE, Compute Engine stops the VM during a maintenance event instead of moving it to another host. If automaticRestart is true, the VM is restarted afterward, but it still incurs a hard stop and a fresh boot. The instance loses everything in memory, drops every in-flight connection, and reboots from scratch.
Note: Live migration is the default for standard VMs. You usually end up with TERMINATE either because someone set it deliberately, or because the VM uses a feature that is incompatible with live migration, such as GPUs or certain sole-tenant configurations. The check surfaces both cases so you can decide whether the setting is intentional.
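As a rough illustration of the decision described above (a sketch, not the actual Lensix implementation), the shell function below classifies instances from CSV lines. The function name classify_scheduling, the CSV layout, and the PASS/FAIL/WARN wording are all assumptions made for this example.

```shell
#!/usr/bin/env bash
# Sketch only: classify_scheduling and its output wording are hypothetical.
# Input on stdin: one CSV line per VM, "name,onHostMaintenance,automaticRestart".

classify_scheduling() {
  local name policy restart
  while IFS=, read -r name policy restart; do
    [ -n "$name" ] || continue   # skip blank lines
    case "$policy" in
      MIGRATE)
        echo "PASS $name: live migrates during host maintenance" ;;
      TERMINATE)
        case "$restart" in
          [Tt]rue) echo "FAIL $name: stopped and rebooted on host maintenance" ;;
          *)       echo "FAIL $name: stopped on host maintenance and not restarted" ;;
        esac ;;
      *)
        echo "WARN $name: unexpected onHostMaintenance value '$policy'" ;;
    esac
  done
}

# Example feed (assumes an authenticated gcloud and a default project):
#   gcloud compute instances list \
#     --format="csv[no-heading](name,scheduling.onHostMaintenance,scheduling.automaticRestart)" \
#     | classify_scheduling
```

A FAIL line is a prompt to review the instance, not proof of a mistake: a GPU VM on TERMINATE with automatic restart enabled would still be reported here and can then be recorded as a deliberate exception.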
Why it matters
Host maintenance is not rare. Google performs it on a rolling basis across its fleet, and any given VM can be affected several times a year. With MIGRATE, you never notice. With TERMINATE, every maintenance event becomes a reboot.
The concrete risks
- Unscheduled downtime. The VM goes down when Google decides to maintain the host, not when your change window opens. A stateful single-instance service, a self-hosted database, or a license server can take an outage at 3pm on a Tuesday.
- Lost in-memory state. Caches, session data, queued work held only in RAM, and anything not yet flushed to disk disappears on termination.
- Dropped connections. Long-lived gRPC streams, websocket sessions, database connections, and file transfers all break.
- Cascading failures. If the VM is a singleton dependency for other services, its restart can ripple outward into retries, timeouts, and degraded performance across your stack.
- Slow recovery. Boot time plus application warm-up can mean minutes of unavailability, not seconds. For VMs with attached local SSDs, the data on those disks is lost entirely on termination.
Warning: If your VM has local SSD scratch disks, a TERMINATE maintenance event wipes that data. Local SSDs do not survive a stop. If you rely on local SSD for anything you cannot afford to lose, you need a fundamentally different design, not just a scheduling flag.
When TERMINATE is actually the right call
This is not always a mistake. Some VMs cannot be live migrated, and forcing MIGRATE on them will fail. GPU-attached instances are the classic example. In those cases TERMINATE with automaticRestart enabled is the correct configuration, and the goal is to make sure your application handles the restart gracefully rather than to flip the flag blindly.
How to fix it
The fix is to set onHostMaintenance to MIGRATE. The instance does not need to be deleted and recreated, but it does need to be stopped to change this setting.
Warning: Changing scheduling options requires the VM to be stopped first. This means a brief, planned outage. Do it during a maintenance window, and make sure you have a path to drain traffic or fail over before you stop the instance.
Option 1: gcloud CLI
First, confirm the current setting:
gcloud compute instances describe my-instance \
  --zone=us-central1-a \
  --format="value(scheduling.onHostMaintenance)"
Stop the instance, update the scheduling, then start it again:
# Stop the VM
gcloud compute instances stop my-instance --zone=us-central1-a
# Set host maintenance behavior to MIGRATE
gcloud compute instances set-scheduling my-instance \
  --zone=us-central1-a \
  --maintenance-policy=MIGRATE \
  --restart-on-failure
# Start it back up
gcloud compute instances start my-instance --zone=us-central1-a
Verify the change took effect:
gcloud compute instances describe my-instance \
  --zone=us-central1-a \
  --format="value(scheduling.onHostMaintenance)"
# Expected output: MIGRATE
Tip: If you have many instances to fix, loop over them. List affected VMs with a filter, then iterate. Always test on a non-production instance first so you understand the stop and start timing for your environment.
# Find all VMs in a zone set to TERMINATE
gcloud compute instances list \
  --filter="scheduling.onHostMaintenance=TERMINATE AND zone:us-central1-a" \
  --format="value(name)"
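The loop itself might look like the sketch below. It reuses the same gcloud commands shown in this section; the helper names run and migrate_zone and the DRY_RUN variable are assumptions introduced for the example, and dry-run is the default so nothing is stopped by accident.

```shell
#!/usr/bin/env bash
# Sketch only: run() and migrate_zone() are hypothetical helper names.
# DRY_RUN defaults to 1, so mutating commands are printed instead of executed.

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "DRY-RUN: $*"
  else
    "$@"
  fi
}

migrate_zone() {
  local zone="$1" names name
  # Listing is read-only, so it runs even in dry-run mode.
  names="$(gcloud compute instances list \
    --filter="scheduling.onHostMaintenance=TERMINATE AND zone:${zone}" \
    --format="value(name)")" || return 1

  for name in $names; do
    run gcloud compute instances stop "$name" --zone="$zone" || return 1
    # VMs that cannot live migrate (for example GPU instances) are expected
    # to fail here; warn and still start the VM again.
    if ! run gcloud compute instances set-scheduling "$name" \
        --zone="$zone" --maintenance-policy=MIGRATE --restart-on-failure; then
      echo "WARN: could not set MIGRATE on ${name}" >&2
    fi
    run gcloud compute instances start "$name" --zone="$zone" || return 1
  done
}

# Usage: review the dry-run output first, then apply for real.
#   migrate_zone us-central1-a
#   DRY_RUN=0 migrate_zone us-central1-a
```

The function processes one VM at a time so that only a single instance is down at any moment. For a fleet behind a load balancer you would likely add a drain step before each stop.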
Option 2: Google Cloud Console
- Open Compute Engine and select the VM instance.
- Stop the instance if it is running.
- Click Edit.
- Scroll to Management, then Availability policies.
- Set On host maintenance to Migrate VM instance.
- Save, then start the instance.
Option 3: Terraform
If you manage instances with Terraform, set the scheduling block explicitly so the configuration is enforced and visible in code:
resource "google_compute_instance" "app" {
  name         = "app-server"
  machine_type = "e2-standard-4"
  zone         = "us-central1-a"

  scheduling {
    on_host_maintenance = "MIGRATE"
    automatic_restart   = true
    preemptible         = false
  }

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-12"
    }
  }

  network_interface {
    network = "default"
  }
}
Danger: Changing on_host_maintenance in Terraform may force a stop or, depending on other attributes, a replacement of the instance. Run terraform plan and read the output carefully before applying. A replacement destroys the VM and any local SSD data attached to it.
What about VMs that cannot migrate?
For GPU instances and other workloads where MIGRATE is not supported, keep TERMINATE but make the behavior safe and explicit:
scheduling {
  on_host_maintenance = "TERMINATE"
  automatic_restart   = true
}

guest_accelerator {
  type  = "nvidia-tesla-t4"
  count = 1
}
The goal here is graceful handling, not avoidance. Make sure automatic_restart is on, persist state to durable storage, and design the application to recover cleanly from a reboot.
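One way to get advance notice is to poll the Compute Engine metadata server from inside the VM. The sketch below assumes the instance/maintenance-event metadata key, which to our understanding reports NONE when nothing is pending and a value such as TERMINATE_ON_HOST_MAINTENANCE shortly before the event. The helper names on_maintenance and check_once, and the echo standing in for real flush logic, are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: on_maintenance and check_once are hypothetical helper names.

METADATA_URL="http://metadata.google.internal/computeMetadata/v1/instance/maintenance-event"

# Ask the metadata server whether a host maintenance event is pending.
maintenance_event() {
  curl -sf -H "Metadata-Flavor: Google" "$METADATA_URL"
}

# Placeholder hook: replace the echo with your own flush or drain logic.
on_maintenance() {
  echo "maintenance pending ($1): flushing state to durable storage"
}

# Query once and call the hook unless the server reports NONE.
check_once() {
  local event
  event="$(maintenance_event)" || return 1
  case "$event" in
    NONE|"") return 0 ;;
    *) on_maintenance "$event" ;;
  esac
}

# Usage on the VM itself (the 10-second interval is an arbitrary choice):
#   while true; do check_once; sleep 10; done
```

Treat the notice as a best-effort head start. The application still has to survive a stop that arrives with no warning at all.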
How to prevent it from happening again
Fixing the VMs you have today is half the job. The other half is stopping new TERMINATE instances from creeping in.
Policy-as-code with Organization Policy
Google Cloud does not have a built-in org policy constraint for maintenance behavior, so the practical enforcement point is your IaC pipeline and a CSPM tool like Lensix scanning continuously. Treat the scheduling block as a required, reviewed field.
OPA / Conftest gate in CI
Add a policy to your CI pipeline that rejects Terraform plans setting on_host_maintenance to TERMINATE unless the instance has an accelerator attached:
package compute.maintenance

deny[msg] {
  resource := input.resource_changes[_]
  resource.type == "google_compute_instance"
  sched := resource.change.after.scheduling[_]
  sched.on_host_maintenance == "TERMINATE"
  not has_accelerator(resource.change.after)
  msg := sprintf(
    "Instance '%s' uses TERMINATE on host maintenance without a GPU. Use MIGRATE.",
    [resource.change.after.name],
  )
}

has_accelerator(after) {
  count(after.guest_accelerator) > 0
}
Wire it into your pipeline against the JSON plan:
terraform plan -out=tfplan.binary
terraform show -json tfplan.binary > tfplan.json
conftest test tfplan.json --policy ./policy
Tip: The GPU exception in the policy above keeps the gate from blocking legitimate workloads. Encode the exception in code rather than disabling the check, so the reasoning stays visible to the next engineer who reads it.
Continuous scanning
IaC gates only cover resources created through IaC. Click-ops VMs, instances spun up by scripts, and drift all slip past. Run a continuous check across your live GCP environment so any VM that ends up on TERMINATE, regardless of how it got there, is flagged. This is exactly what the Lensix compute_nomaintmigration check does on every scan.
Best practices
- Default to MIGRATE. Unless a workload genuinely cannot live migrate, MIGRATE should be the standard across your fleet. It is the lowest-friction way to avoid maintenance-driven outages.
- Set scheduling explicitly in IaC. Do not rely on provider defaults. Spell out on_host_maintenance and automatic_restart in every instance and instance template so the behavior is reviewable in code.
- Design for restarts anyway. Live migration is not a guarantee of perfect uptime, and migration can briefly degrade performance. Build applications that tolerate restarts, persist state to durable storage, and recover automatically.
- Use managed instance groups for stateless workloads. MIGs with autohealing and rolling updates make individual VM disruptions a non-event, which matters far more than any single scheduling flag.
- Never trust local SSD as durable. Local SSD data does not survive a stop or a terminate. If you use it, treat it strictly as scratch space.
- Document the exceptions. When a VM legitimately uses TERMINATE, record why. A GPU instance with automaticRestart on is fine, but the next person should not have to guess whether it was deliberate.
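One lightweight way to document an exception is to record the reason on the instance itself as a label. The sketch below uses gcloud compute instances add-labels; the function name mark_terminate_exception and the label key maintenance-exception are conventions invented for this example, not anything Lensix or Google Cloud requires.

```shell
#!/usr/bin/env bash
# Sketch only: the function name and the label key are hypothetical.

mark_terminate_exception() {
  local name="$1" zone="$2" reason="$3"
  # Label values allow only lowercase letters, digits, dashes, and underscores.
  case "$reason" in
    ""|*[!a-z0-9_-]*)
      echo "invalid reason '${reason}': use lowercase letters, digits, - or _" >&2
      return 1 ;;
  esac
  gcloud compute instances add-labels "$name" \
    --zone="$zone" \
    --labels="maintenance-exception=${reason}"
}

# Usage (hypothetical instance name):
#   mark_terminate_exception gpu-trainer-1 us-central1-a gpu-attached
```

Because the label lives on the instance, it shows up in gcloud output and can be used in list filters when reviewing the exception list.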
The cheapest outage to prevent is the one you cause yourself. A single scheduling field is the difference between Google quietly moving your VM and Google quietly shutting it down.
Audit your fleet, flip the workloads that should be on MIGRATE, keep an explicit and justified exception list for the rest, and put a gate in CI so the problem does not come back. Lensix will keep watching the live environment for the cases that bypass your pipeline.

