Fix GKE Node Integrity Monitoring Not Enabled | Lensix

TL;DR

This check flags GKE node pools running without Shielded GKE node integrity monitoring, which means nothing verifies that each node's boot sequence matches a trusted baseline. Recreate the node pool with --shielded-integrity-monitoring (and --shielded-secure-boot) to detect rootkits and boot-level tampering.

Integrity monitoring is one of those settings that costs nothing, adds no measurable overhead, and quietly closes off an entire class of attacks against your Kubernetes nodes. Yet it ships disabled on plenty of older node pools, and people rarely circle back to fix it. This check exists to catch those gaps before an attacker does.

Below we'll cover what the check looks for, why a node booting without verified integrity is a real problem, and exactly how to remediate and prevent it across the console, gcloud, and Terraform.

What this check detects

The gke_nointegritymonitoring check inspects each GKE node pool and reports any pool where integrity monitoring is not enabled. Integrity monitoring is a feature of Shielded GKE Nodes, which build on Compute Engine Shielded VMs.

When integrity monitoring is on, the node's boot measurements are recorded in a virtual Trusted Platform Module (vTPM) and compared against a known-good baseline at each boot. If the measured boot sequence drifts from the expected one, GKE logs an integrity failure event you can alert on.

Note: Shielded GKE Nodes bundle three protections: secure boot (only signed bootloaders and kernels run), vTPM (a virtual Trusted Platform Module that stores boot measurements), and integrity monitoring (comparison of those measurements against a baseline at each boot). This check specifically targets the integrity monitoring piece.

A node pool fails the check when its shieldedInstanceConfig.enableIntegrityMonitoring field is false or absent. You can confirm the current state with:

gcloud container node-pools describe POOL_NAME \
  --cluster CLUSTER_NAME \
  --zone COMPUTE_ZONE \
  --format="value(config.shieldedInstanceConfig.enableIntegrityMonitoring)"

An empty result or False means the pool is non-compliant.
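To use the same lookup in a script, it can be wrapped in a small helper that turns the output into an exit status. This is a minimal sketch, not an official tool: the function name pool_has_integrity_monitoring is hypothetical, and it assumes gcloud prints True for an enabled pool, as described above.

```shell
# Hypothetical helper: exit 0 if the pool reports integrity monitoring
# enabled, non-zero if the field is False or empty.
pool_has_integrity_monitoring() {
  local pool="$1" cluster="$2" zone="$3" state
  state="$(gcloud container node-pools describe "$pool" \
    --cluster "$cluster" \
    --zone "$zone" \
    --format="value(config.shieldedInstanceConfig.enableIntegrityMonitoring)")"
  # Anything other than the literal string True counts as non-compliant.
  [ "$state" = "True" ]
}
```

Call it as pool_has_integrity_monitoring POOL_NAME CLUSTER_NAME COMPUTE_ZONE || echo "non-compliant", which makes it easy to drop into a larger audit or a pre-deploy gate.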

Why it matters

A GKE node is a Linux host running your workloads, your kubelet, and credentials that can reach the rest of the cluster. If an attacker gains a foothold on a node, one of their goals is persistence: surviving reboots and staying hidden. Boot-level and kernel-level implants such as rootkits and bootkits are how they do it, because anything that loads before or alongside the OS can hide from tooling running inside the OS.

Without integrity monitoring, you have no signal when a node's boot sequence has been tampered with. The node comes up, joins the cluster, schedules pods, and looks healthy. There is nothing comparing what actually loaded against what should have loaded.

With integrity monitoring enabled, that drift produces an explicit event you can route to alerting. Consider a realistic chain:

1. An attacker exploits a workload and breaks out to the node via a container escape or an over-permissioned pod.
2. They install a kernel module or modify the boot process to maintain access across reboots.
3. On the next boot, the measured integrity no longer matches the baseline.
4. GKE emits a late boot validation failure, which your monitoring catches and your responders act on.

Without monitoring, step four never happens, and the compromise sits undetected. This is also a compliance concern: the CIS GKE Benchmark recommends Shielded GKE Nodes with integrity monitoring, and frameworks like PCI DSS and SOC 2 expect host integrity verification controls.

Warning: Integrity monitoring is detective, not preventive. It tells you a node's boot state changed, but it does not block the change. Pair it with secure boot for prevention and with alerting so the integrity events actually reach a human or a pipeline.

How to fix it

Integrity monitoring is set at node pool creation time and cannot be toggled on an existing pool in place. The standard remediation is to create a new node pool with the setting enabled, migrate workloads, then delete the old pool.

Step 1: Create a replacement node pool with integrity monitoring

gcloud container node-pools create POOL_NAME_NEW \
  --cluster CLUSTER_NAME \
  --zone COMPUTE_ZONE \
  --shielded-integrity-monitoring \
  --shielded-secure-boot \
  --machine-type MACHINE_TYPE \
  --num-nodes NUM_NODES

Including --shielded-secure-boot here is deliberate. Monitoring tells you something changed, while secure boot stops unsigned code from loading in the first place. Enable both unless a workload requires unsigned kernel modules.

Note: Secure boot can break workloads that load third-party unsigned kernel modules, for example certain GPU drivers or eBPF tooling installed via privileged DaemonSets. Validate in a staging cluster before rolling secure boot to production. Integrity monitoring on its own has no such compatibility risk.

Step 2: Cordon and drain the old pool

Mark the old nodes unschedulable and move pods to the new pool gracefully.

# Cordon each node in the old pool
for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=POOL_NAME_OLD -o name); do
  kubectl cordon "$node"
done

# Drain them, respecting PodDisruptionBudgets
for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=POOL_NAME_OLD -o name); do
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data --grace-period=300
done
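Before moving on to deletion, it helps to confirm that the drain actually emptied the old pool. The sketch below lists any pods still scheduled on the old pool's nodes, ignoring DaemonSet-managed pods, which drain leaves in place. The function name remaining_pods_on_pool is hypothetical, and the filter only looks at each pod's first owner reference, so treat the output as a prompt to investigate rather than a definitive answer.

```shell
# Hypothetical helper: print non-DaemonSet pods still scheduled on nodes
# of the given pool. Prints nothing when the pool is fully drained.
remaining_pods_on_pool() {
  local pool="$1" node
  for node in $(kubectl get nodes -l "cloud.google.com/gke-nodepool=${pool}" -o name); do
    # "-o name" yields "node/<name>"; strip the prefix for the field selector.
    kubectl get pods --all-namespaces \
      --field-selector "spec.nodeName=${node#node/}" \
      --no-headers \
      -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,OWNER:.metadata.ownerReferences[0].kind' \
      | awk '$3 != "DaemonSet"'
  done
}
```

Running remaining_pods_on_pool POOL_NAME_OLD should print nothing once the drain has finished. Completed or statically managed pods may still appear in the list and can usually be ignored.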

Step 3: Delete the old node pool

Danger: Deleting a node pool terminates its nodes and any pods still running on them. Confirm all workloads have rescheduled to the new pool and that kubectl get pods -A shows no pending or evicted pods before you run this.

gcloud container node-pools delete POOL_NAME_OLD \
  --cluster CLUSTER_NAME \
  --zone COMPUTE_ZONE

Console steps

If you prefer the UI:

1. Go to Kubernetes Engine → Clusters and open your cluster.
2. Click Add node pool.
3. Under Security, check Enable integrity monitoring and Enable secure boot.
4. Set machine type and size to match the old pool, then create it.
5. Cordon, drain, and delete the old pool as shown above.

Warning: Running two node pools during migration roughly doubles compute cost for that window. Plan the cutover to be short, and delete the old pool promptly once workloads have moved.

How to prevent it from happening again

The reliable fix is to make integrity monitoring (and secure boot) the default for every node pool you create, then enforce that default in code and in CI.

Terraform

Set shielded_instance_config on every node pool resource:

resource "google_container_node_pool" "primary" {
  name     = "primary-pool"
  cluster  = google_container_cluster.primary.id
  location = "us-central1-a"

  node_config {
    machine_type = "e2-standard-4"

    shielded_instance_config {
      enable_integrity_monitoring = true
      enable_secure_boot          = true
    }
  }
}

Tip: Enabling Shielded GKE Nodes at the cluster level with enable_shielded_nodes = true on the google_container_cluster resource turns integrity monitoring on by default for new pools. Secure boot stays off by default, so keep enable_secure_boot = true in the per-pool config wherever you want it.

Policy as code with OPA / Conftest

Catch non-compliant Terraform plans before they merge. A Rego rule against your plan JSON:

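package gke.integrity

# Flag any node pool in the plan that does not explicitly enable integrity monitoring.
deny[msg] {
  rc := input.resource_changes[_]
  rc.type == "google_container_node_pool"
  not integrity_monitoring_enabled(rc.change.after)
  msg := sprintf("Node pool '%s' must enable integrity monitoring", [rc.change.after.name])
}

# True only when some node_config block sets enable_integrity_monitoring = true.
# Keeping the iteration in a helper means pools with no node_config or no
# shielded_instance_config block are denied instead of silently skipped.
integrity_monitoring_enabled(pool) {
  pool.node_config[_].shielded_instance_config[_].enable_integrity_monitoring == true
}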

Run it in CI:

terraform plan -out=tfplan.binary
terraform show -json tfplan.binary > tfplan.json
conftest test tfplan.json --policy policy/

Detect drift on what's already running

Policy as code stops bad changes at the gate, but it won't catch pools created out of band or pools that predate your policy. Run a periodic audit across all clusters:

# Use location rather than zone so regional clusters are covered too
for cluster in $(gcloud container clusters list --format="value(name,location)" | tr '\t' ','); do
  name="${cluster%,*}"; location="${cluster#*,}"
  gcloud container node-pools list --cluster "$name" --location "$location" \
    --format="table(name, config.shieldedInstanceConfig.enableIntegrityMonitoring)"
done
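For a scheduled job, a table is less useful than a list of offenders and a failing exit code. The filter below is a minimal sketch of that idea: it reads tab-separated pool name and value pairs on standard input and prints only the pools whose value is not True. The function name noncompliant_pools is hypothetical, and it assumes the input comes from the same node-pools list command with a value(...) format in place of table(...), which emits tab-separated fields.

```shell
# Hypothetical filter: read "POOL<TAB>VALUE" lines on stdin and print the
# pools where VALUE is anything other than True (False or empty).
# Exits 1 if at least one non-compliant pool was seen, 0 otherwise.
noncompliant_pools() {
  awk -F '\t' '
    $2 != "True" { print $1; bad = 1 }
    END { exit bad ? 1 : 0 }
  '
}
```

Piping gcloud container node-pools list with --format="value(name, config.shieldedInstanceConfig.enableIntegrityMonitoring)" into noncompliant_pools gives a cron or CI job something concrete to fail on.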

Tip: Rather than maintaining audit scripts yourself, Lensix continuously scans every GKE node pool across all your projects and flags this check automatically, so drift surfaces the moment a non-compliant pool appears instead of at your next manual review.

Best practices

Turn on the full Shielded Nodes set. Integrity monitoring, secure boot, and vTPM work as a unit. Enabling Shielded GKE Nodes at the cluster level and secure boot on each pool gives the strongest baseline with the least per-pool effort.
Alert on integrity failures. Monitoring is worthless if nobody sees the events. Create a Cloud Logging alert on the GKE integrity validation failure log entries and route it to your on-call channel.
Bake it into your golden cluster module. Maintain a single Terraform module that all teams use to provision clusters, with these settings hard-coded on by default.
Test secure boot before enforcing it. Integrity monitoring is safe to enable everywhere. Secure boot needs validation against workloads that load unsigned kernel modules.
Review at the source, not the symptom. A node pool without integrity monitoring is usually a sign your provisioning workflow lacks guardrails. Fix the workflow, not just the one pool.
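As a starting point for the alerting practice above, the sketch below queries Cloud Logging for failed boot validations. The log name (shielded_vm_integrity) and the earlyBootReportEvent and lateBootReportEvent fields are assumptions based on how Compute Engine Shielded VM reports integrity events, and both function names are hypothetical. Verify the filter against real entries in your project before building an alert on it.

```shell
# Hypothetical helper: build a Cloud Logging filter for failed Shielded VM
# boot validations. Log name and field names are assumptions to verify.
integrity_failure_filter() {
  cat <<'EOF'
logName:"compute.googleapis.com%2Fshielded_vm_integrity"
AND (jsonPayload.earlyBootReportEvent.policyEvaluationPassed=false
  OR jsonPayload.lateBootReportEvent.policyEvaluationPassed=false)
EOF
}

# List recent failures. The optional first argument is the lookback
# window passed to --freshness (defaults to the last day).
recent_integrity_failures() {
  gcloud logging read "$(integrity_failure_filter)" \
    --freshness "${1:-1d}" \
    --format "table(timestamp, resource.labels.instance_id)"
}
```

Once the filter returns the entries you expect, the same filter text can back a log-based alert policy routed to your on-call channel.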

Integrity monitoring is a near-zero-cost control that gives you a tripwire for one of the stealthiest forms of node compromise. Enable it on every pool, alert on the events it produces, and enforce it in CI so it stays on.
