GKE Node Disk CMEK Encryption Fix | Lensix

TL;DR

This check flags GKE node pools whose node boot disks rely on Google's default encryption instead of a customer-managed encryption key (CMEK). Without CMEK you lose control over key rotation, access policies, and revocation. Fix it by recreating the node pool with a Cloud KMS key passed via --boot-disk-kms-key.

Every persistent disk attached to a GKE node is encrypted at rest by default. That sounds reassuring, and for many workloads it is enough. But "encrypted by default" means Google owns and manages the keys, which removes you from the loop entirely. You cannot rotate them on your schedule, you cannot revoke them during an incident, and you cannot prove to an auditor that you control the cryptographic material protecting your data.

The gke_nodeencryption check catches node pools that fall back to Google-managed encryption rather than using a customer-managed encryption key (CMEK) backed by Cloud KMS. It is a common gap because CMEK has to be specified at node pool creation time, and the default path silently skips it.

What this check detects

Lensix inspects each GKE node pool and looks at the disk encryption configuration for the underlying Compute Engine instances. If the boot disk is encrypted with the default Google-managed key rather than a Cloud KMS CMEK, the check fails.

Concretely, the check is looking for the absence of a bootDiskKmsKey field on the node pool config. When that field is empty, GCP uses its own keys. When it is populated with a KMS key resource path, you are using CMEK.
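
To reproduce the check by hand, a minimal sketch is to list each pool's bootDiskKmsKey and keep the pools where it is empty. The helper name flag_non_cmek_pools is hypothetical, my-cluster and us-central1 are placeholders, and the filter assumes gcloud's value() format prints tab-separated columns.

```shell
# Hypothetical helper: reads "pool-name<TAB>kms-key" lines on stdin and
# prints the names of pools whose key column is empty (no CMEK).
flag_non_cmek_pools() {
  awk -F '\t' '$1 != "" && $2 == "" { print $1 }'
}

# Usage (requires gcloud and cluster access; placeholder cluster and region):
#   gcloud container node-pools list \
#     --cluster=my-cluster --region=us-central1 \
#     --format="value(name,config.bootDiskKmsKey)" | flag_non_cmek_pools
```

Any pool name this prints is one the check would be expected to fail.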

Note: CMEK does not mean you hold the raw key bytes. The key lives inside Cloud KMS (or Cloud HSM / Cloud External Key Manager). You control its lifecycle, IAM bindings, rotation schedule, and the ability to disable it, while Google never exposes the key material itself.

Why it matters

Disk encryption at rest protects against a narrow but real threat: someone gaining physical or low-level access to the storage media. With Google-managed keys you get that protection, but the control boundary stops there. CMEK extends your control in ways that matter during real incidents and audits.

You can revoke access instantly

Suppose a node pool is compromised, or you discover that an over-privileged service account had access to disk snapshots. With CMEK you can disable or destroy the KMS key version, and every disk encrypted with it becomes unreadable. There is no equivalent kill switch with Google-managed keys.
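
As a sketch of what that kill switch looks like in practice, the wrapper below disables a single key version rather than destroying it, since a disabled version can be re-enabled. The function name is hypothetical, and the key, key ring, and location are the placeholder names used in the fix steps of this article. How quickly attached disks stop being readable is not something this sketch verifies.

```shell
# Hypothetical wrapper: disable (not destroy) one version of the node disk key.
# A disabled version can be restored with "gcloud kms keys versions enable".
disable_node_disk_key_version() {
  local version="$1"
  if [ -z "$version" ]; then
    echo "usage: disable_node_disk_key_version VERSION" >&2
    return 2
  fi
  gcloud kms keys versions disable "$version" \
    --key=gke-node-disk-key \
    --keyring=gke-keyring \
    --location=us-central1
}

# Usage (requires gcloud and permission to manage the key):
#   disable_node_disk_key_version 1
```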

Rotation on your terms

Cloud KMS lets you set automatic rotation periods (for example, every 90 days) and rotate manually on demand. Many compliance frameworks expect documented, customer-controlled key rotation. Default encryption gives you nothing to point to.

Compliance and data residency

PCI DSS, HIPAA, FedRAMP, and a range of internal governance policies frequently require that the customer control encryption keys for regulated data. If a node pool processes cardholder data or PHI and runs on default-encrypted disks, that is an audit finding waiting to happen.

Warning: If you destroy a KMS key version that is protecting live node disks, those disks become permanently unrecoverable. CMEK gives you a powerful kill switch, but the same switch can take out production if used carelessly. Treat key destruction as a deliberate, reviewed action.

How to fix it

Node pool disk encryption is immutable. You cannot retrofit CMEK onto an existing pool, so the fix is to create a new node pool with CMEK enabled, migrate workloads, and delete the old pool. Here is the full sequence.

1. Create or identify a Cloud KMS key

# Create a key ring (regional, matching your cluster region)
gcloud kms keyrings create gke-keyring \
  --location=us-central1

# Create a symmetric encryption key with 90-day rotation.
# "date -d" is GNU syntax; on macOS/BSD use: date -u -v+90d +%Y-%m-%dT%H:%M:%SZ
gcloud kms keys create gke-node-disk-key \
  --location=us-central1 \
  --keyring=gke-keyring \
  --purpose=encryption \
  --rotation-period=90d \
  --next-rotation-time="$(date -u -d "+90 days" +%Y-%m-%dT%H:%M:%SZ)"

2. Grant the Compute Engine service agent access to the key

GKE provisions disks through the Compute Engine service agent. That agent needs the cloudkms.cryptoKeyEncrypterDecrypter role on the key, or disk creation will fail.

PROJECT_NUMBER=$(gcloud projects describe "$(gcloud config get-value project)" \
  --format="value(projectNumber)")

gcloud kms keys add-iam-policy-binding gke-node-disk-key \
  --location=us-central1 \
  --keyring=gke-keyring \
  --member="serviceAccount:service-${PROJECT_NUMBER}@compute-system.iam.gserviceaccount.com" \
  --role="roles/cloudkms.cryptoKeyEncrypterDecrypter"

3. Create a new node pool with CMEK

gcloud container node-pools create cmek-pool \
  --cluster=my-cluster \
  --region=us-central1 \
  --machine-type=e2-standard-4 \
  --num-nodes=3 \
  --boot-disk-kms-key=projects/$(gcloud config get-value project)/locations/us-central1/keyRings/gke-keyring/cryptoKeys/gke-node-disk-key
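
After the pool is created, one way to confirm the key took effect is to read the KMS key name recorded on a node's boot disk. This is a sketch under assumptions: disk_uses_key is a hypothetical helper, DISK_NAME, ZONE, and my-project are placeholders, and it assumes Compute Engine reports the key in diskEncryptionKey.kmsKeyName, possibly with a /cryptoKeyVersions/N suffix.

```shell
# Hypothetical helper: succeeds if a disk's reported KMS key name refers to
# the expected key, with or without a /cryptoKeyVersions/N suffix.
disk_uses_key() {
  local disk_key="$1" expected_key="$2"
  [ -n "$expected_key" ] || return 2
  case "$disk_key" in
    "$expected_key" | "$expected_key"/cryptoKeyVersions/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage sketch (requires gcloud; DISK_NAME, ZONE, my-project are placeholders):
#   actual=$(gcloud compute disks describe DISK_NAME --zone=ZONE \
#     --format="value(diskEncryptionKey.kmsKeyName)")
#   disk_uses_key "$actual" \
#     "projects/my-project/locations/us-central1/keyRings/gke-keyring/cryptoKeys/gke-node-disk-key" \
#     && echo "CMEK in use"
```

An empty kmsKeyName makes the helper return non-zero, which is the case you would expect for a disk still on Google-managed encryption.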

4. Migrate workloads and drain the old pool

Danger: Cordoning and draining a node pool evicts running pods. Do this during a maintenance window or confirm your workloads have PodDisruptionBudgets and enough replicas to survive the move. Deleting the old pool is irreversible.

# Cordon all nodes in the old pool so new pods are not scheduled there
for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=old-pool \
  -o name); do
  kubectl cordon "$node"
done

# Drain each node to evict pods onto the new CMEK pool
for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=old-pool \
  -o name); do
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data --timeout=300s
done

# Once workloads are stable on cmek-pool, delete the old pool
gcloud container node-pools delete old-pool \
  --cluster=my-cluster \
  --region=us-central1
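
Before running the delete, it can help to confirm that only node-bound pods are left on the old pool. The sketch below is one way to do that; remaining_pods_on_pool is a hypothetical helper, and it assumes that pods owned by a DaemonSet or by the Node itself (mirror pods) are the ones kubectl drain leaves in place.

```shell
# Hypothetical helper: print "namespace/name" for pods still scheduled on a
# pool's nodes, skipping DaemonSet pods and mirror pods (owner kind "Node").
remaining_pods_on_pool() {
  local pool="$1" node
  for node in $(kubectl get nodes -l "cloud.google.com/gke-nodepool=${pool}" \
      -o jsonpath='{.items[*].metadata.name}'); do
    kubectl get pods --all-namespaces --no-headers \
      --field-selector "spec.nodeName=${node}" \
      -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,OWNER:.metadata.ownerReferences[0].kind' |
      awk '$3 != "DaemonSet" && $3 != "Node" { print $1 "/" $2 }'
  done
}

# Usage (requires kubectl access to the cluster):
#   remaining_pods_on_pool old-pool
```

Empty output suggests the drain finished. Anything listed is worth investigating before the pool is deleted.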

Terraform example

If you manage GKE with Terraform, set the disk encryption key on the node config:

resource "google_container_node_pool" "cmek_pool" {
  name     = "cmek-pool"
  cluster  = google_container_cluster.primary.id
  location = "us-central1"

  node_config {
    machine_type      = "e2-standard-4"
    boot_disk_kms_key = google_kms_crypto_key.gke_node_disk_key.id
  }

  initial_node_count = 3
}

resource "google_kms_crypto_key" "gke_node_disk_key" {
  name            = "gke-node-disk-key"
  key_ring        = google_kms_key_ring.gke_keyring.id
  rotation_period = "7776000s" # 90 days
}
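
The rotation_period value is a number of seconds with an "s" suffix, so 90 days is 90 × 86400 = 7776000s. If you derive it from a policy expressed in days, a small helper along these lines avoids arithmetic slips; the function name is hypothetical.

```shell
# Hypothetical helper: convert whole days to the seconds-with-"s" string
# used for rotation_period (1 day = 86400 seconds).
rotation_period_seconds() {
  local days="$1"
  case "$days" in
    '' | *[!0-9]*)
      echo "days must be a non-negative integer" >&2
      return 2
      ;;
  esac
  echo "$((days * 86400))s"
}

rotation_period_seconds 90   # prints 7776000s
```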

Tip: Define the KMS key, the IAM binding for the Compute service agent, and the node pool in the same Terraform module. That way a single terraform apply provisions a compliant pool, and there is no manual step to forget.

How to prevent it from happening again

Manual node pool creation is where this drifts. The durable fix is to make CMEK the only way a node pool can be created in your environment.

Org Policy constraint

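GCP provides an org policy constraint, gcp.restrictNonCmekServices, that blocks the creation of resources without CMEK for every service on its deny list. Adding container.googleapis.com puts GKE, including node boot disks, in scope. Apply it at the organization or folder level: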

cat > cmek-policy.yaml <<'EOF'
name: organizations/ORG_ID/policies/gcp.restrictNonCmekServices
spec:
  rules:
    - values:
        deniedValues:
          - "container.googleapis.com"
EOF

gcloud org-policies set-policy cmek-policy.yaml

With this constraint in place, attempts to create a node pool without a CMEK are rejected by the API, not just flagged after the fact.

Policy as code in CI/CD

If you use OPA or Conftest to gate Terraform plans, add a rule that fails any node pool missing a boot disk KMS key:

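package gke.cmek

deny[msg] {
  resource := input.resource_changes[_]
  resource.type == "google_container_node_pool"
  resource.change.actions[_] != "delete" # ignore pools that are only being deleted
  not has_kms_key(resource.change)
  msg := sprintf("node pool %q must set boot_disk_kms_key", [resource.address])
}

# Key known at plan time: a non-empty string (an unset attribute shows up as null)
has_kms_key(change) { k := change.after.node_config[0].boot_disk_kms_key; is_string(k); k != "" }

# Key computed at apply time, for example a KMS key created in the same plan
has_kms_key(change) { change.after_unknown.node_config[0].boot_disk_kms_key == true }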

Wire that into the pull request pipeline so a non-compliant pool never reaches terraform apply.
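
A pipeline step for this could look like the sketch below. The function name, the tfplan file name, and the policy directory are assumptions. The --namespace flag is there because Conftest evaluates the main package by default, while the rule above lives in gke.cmek.

```shell
# Hypothetical CI step: render the Terraform plan as JSON and test it
# against the Rego policy. Returns non-zero if any step fails.
gate_terraform_plan() {
  local plan_file="${1:-tfplan}" policy_dir="${2:-policy}"
  terraform plan -input=false -out="$plan_file" &&
    terraform show -json "$plan_file" > "${plan_file}.json" &&
    conftest test "${plan_file}.json" --policy "$policy_dir" --namespace gke.cmek
}

# Usage in a pull request job (requires terraform and conftest on PATH):
#   gate_terraform_plan tfplan policy
```

Because the function returns the exit status of the first failing command, a deny result from Conftest fails the job.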

Best practices

Use a dedicated key ring per environment. Keep prod, staging, and dev keys separate so a revocation or rotation in one environment cannot affect another.
Co-locate keys and resources. The KMS key location must match the region of the disks it protects. A regional cluster needs a regional key in the same region.
Enable automatic rotation. Set a rotation period (90 days is a common baseline) so you do not rely on someone remembering to rotate manually.
Lock down KMS IAM. Only the Compute service agent and your platform automation should have encrypter/decrypter rights. Audit those bindings regularly.
Pair CMEK with application-layer encryption for the most sensitive data. Disk-level CMEK protects data at rest, but secrets and PII often warrant an additional layer such as envelope encryption or GKE application-layer secrets encryption.
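Monitor key usage. Cloud KMS can record every encrypt and decrypt call in Cloud Audit Logs, but these are Data Access logs, which are disabled by default. Enable them for Cloud KMS and stream them to your SIEM so unexpected key access surfaces quickly.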
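
The co-location rule in the list above is easy to check before a pool is created. The sketch below pulls the location out of a key's resource path and compares it with the cluster region. Both function names are hypothetical, and the strict equality test reflects the rule as stated here rather than every location combination GCP may accept.

```shell
# Hypothetical helper: extract the location segment from a Cloud KMS key path
# of the form projects/P/locations/L/keyRings/R/cryptoKeys/K.
kms_key_location() {
  printf '%s\n' "$1" |
    sed -n 's|^projects/[^/]*/locations/\([^/]*\)/keyRings/.*$|\1|p'
}

# Succeeds only when the key's location equals the given cluster region.
key_matches_region() {
  [ -n "$2" ] && [ "$(kms_key_location "$1")" = "$2" ]
}

# Usage (my-project is a placeholder):
#   key_matches_region \
#     "projects/my-project/locations/us-central1/keyRings/gke-keyring/cryptoKeys/gke-node-disk-key" \
#     us-central1 || echo "key and cluster are in different locations"
```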

Note: CMEK for node disks is separate from application-layer secrets encryption, which encrypts Kubernetes Secrets in etcd. They solve different problems, and a well-secured cluster uses both. Lensix has a separate check for etcd secrets encryption.

Customer-managed keys turn disk encryption from a checkbox into a control you actually own. The migration takes a maintenance window, but once an org policy and a Terraform module enforce it, every future node pool is compliant by construction and this finding stops coming back.
