
GKE Workload Metadata Not Hardened: Stop Pods From Stealing Node Credentials

Learn why unhardened GKE node pool metadata lets pods steal node service account tokens, and how to fix it with Workload Identity and GKE_METADATA.

TL;DR

This check flags GKE node pools where workload metadata is not hardened, meaning pods can reach the GCE metadata server and steal the node's service account credentials. Fix it by enabling GKE Metadata Server (Workload Identity) on every node pool with --workload-metadata=GKE_METADATA.

On Google Kubernetes Engine, every node is a Compute Engine VM, and every VM has access to the GCE metadata server at 169.254.169.254. That endpoint hands out the node's service account token to any process that can send it a request. Unless the node pool is hardened, pods running on that node can reach the same endpoint, which means a single compromised container can impersonate the node and inherit whatever IAM permissions the node service account holds.

The GKE Workload Metadata Not Hardened check looks for node pools that have not locked this down. If a pool is still exposing the raw metadata server to workloads, this check fails.


What this check detects

The check inspects the workloadMetadataConfig setting on each GKE node pool. That setting controls how pods on the node interact with the instance metadata server. There are two relevant modes:

  • GCE_METADATA (or unset on older clusters) - pods can directly query the GCE metadata server and read the node service account token, SSH keys, and other instance metadata.
  • GKE_METADATA - the GKE Metadata Server intercepts metadata requests, blocks sensitive paths, and routes service account requests through Workload Identity so each pod only gets the identity it was assigned.

The check fails when a node pool is configured with GCE_METADATA or has no hardened metadata configuration at all. It passes when the pool uses GKE_METADATA, which is the mode enabled by Workload Identity.

Note: GKE Metadata Server and Workload Identity are tied together. Enabling Workload Identity on the cluster sets up the identity federation, but each node pool must also be told to run the GKE Metadata Server. A cluster can have Workload Identity enabled while individual pools still expose raw metadata.
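To see how this pass/fail logic plays out across a whole cluster, here is a minimal sketch that classifies each node pool by its workloadMetadataConfig mode. The function names classify_pools and audit_cluster are hypothetical helpers, not part of gcloud or of the check itself, and the sketch assumes that gcloud's value() format prints the pool name and mode separated by whitespace, with an empty mode when the field is unset.

```shell
#!/bin/sh
# Classify node pools by workload metadata mode.
# Reads "NAME MODE" rows on stdin; an empty MODE is treated as unhardened.
classify_pools() {
  while read -r name mode; do
    [ -n "$name" ] || continue
    if [ "$mode" = "GKE_METADATA" ]; then
      printf 'PASS %s\n' "$name"
    else
      printf 'FAIL %s (mode=%s)\n' "$name" "${mode:-unset}"
    fi
  done
}

# Hypothetical wrapper: list every pool in one cluster and classify it.
# Usage: audit_cluster CLUSTER_NAME LOCATION
audit_cluster() {
  gcloud container node-pools list \
    --cluster="$1" --location="$2" \
    --format="value(name,config.workloadMetadataConfig.mode)" | classify_pools
}
```

Calling audit_cluster CLUSTER_NAME REGION prints one PASS or FAIL line per pool, which makes it easy to spot a single unhardened pool in a cluster that otherwise has Workload Identity enabled.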


Why it matters

The node service account is one of the most over-privileged identities in a typical GKE setup. Plenty of clusters still run with the default Compute Engine service account, which often carries broad project-level permissions. When pods can read that account's token, a compromise of one workload is no longer contained to that workload: it extends to everything the node account can reach.

Here is the attack chain this check is designed to break:

  1. An attacker exploits a vulnerability in an application running in a pod, for example an SSRF bug or remote code execution in a web service.
  2. From inside the pod, they curl the metadata server: curl -H "Metadata-Flavor: Google" http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token
  3. They receive an OAuth token for the node service account.
  4. They use that token to enumerate and access GCS buckets, Pub/Sub topics, other GKE clusters, or anything else the node account can touch.
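  5. If the node account holds roles/editor, which the default Compute Engine service account is granted automatically unless that behavior has been disabled, they now effectively own the project.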

SSRF is the painful one here. An attacker does not even need code execution. A server-side request forgery bug that lets them control an outbound URL, and set the Metadata-Flavor: Google header that the v1 metadata endpoints require, can be enough to fetch the token, because the request originates from inside the trusted network. This is the same class of flaw behind the 2019 Capital One breach on AWS, where IMDSv1 metadata was reachable through an SSRF.

Warning: Even if you assigned a tightly scoped service account to your workload via Workload Identity, an unhardened node pool still leaks the node service account. Workload Identity only helps when the GKE Metadata Server is actually intercepting requests, which is exactly what this check verifies.


How to fix it

The fix is to enable the GKE Metadata Server on the node pool, which requires Workload Identity at the cluster level. Do the cluster setting first, then update each pool.

Step 1: Enable Workload Identity on the cluster

gcloud container clusters update CLUSTER_NAME \
  --location=REGION \
  --workload-pool=PROJECT_ID.svc.id.goog

Step 2: Harden each node pool

Update existing pools to use the GKE Metadata Server:

gcloud container node-pools update NODE_POOL_NAME \
  --cluster=CLUSTER_NAME \
  --location=REGION \
  --workload-metadata=GKE_METADATA

Warning: Updating workload metadata on an existing node pool triggers a rolling recreation of the nodes. Plan for the disruption, make sure your PodDisruptionBudgets are sane, and run it during a maintenance window for stateful or latency-sensitive workloads.
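On clusters with many pools, it can help to generate the update commands first and review them before any nodes are recreated. The sketch below only prints commands and never runs them; print_harden_commands is a hypothetical helper, and the cluster name and location are whatever you pass in.

```shell
#!/bin/sh
# Print (do not run) one update command per pool name read from stdin.
# Usage: print_harden_commands CLUSTER_NAME LOCATION < pool-names.txt
print_harden_commands() {
  while read -r pool; do
    [ -n "$pool" ] || continue   # skip blank lines
    printf 'gcloud container node-pools update %s --cluster=%s --location=%s --workload-metadata=GKE_METADATA\n' \
      "$pool" "$1" "$2"
  done
}
```

One way to feed it is gcloud container node-pools list --cluster=CLUSTER_NAME --location=REGION --format="value(name)". Running the printed commands one pool at a time keeps the node recreation staggered rather than hitting every pool at once.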

Step 3: Create new pools hardened from the start

gcloud container node-pools create NODE_POOL_NAME \
  --cluster=CLUSTER_NAME \
  --location=REGION \
  --workload-metadata=GKE_METADATA

Verify the change

gcloud container node-pools describe NODE_POOL_NAME \
  --cluster=CLUSTER_NAME \
  --location=REGION \
  --format="value(config.workloadMetadataConfig.mode)"

You want this to return GKE_METADATA. To prove the exposure is gone, start a throwaway pod on the pool you are testing and try the metadata call from inside it:

kubectl run mdtest --rm -it --image=curlimages/curl --restart=Never -- \
  curl -s -H "Metadata-Flavor: Google" \
  "http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token"

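On a hardened pool, the request is answered by the GKE Metadata Server instead of the raw GCE endpoint, so you should no longer receive the node service account's token. Depending on how the pod's Kubernetes service account is configured, you get either an error or a token for the workload's own identity.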
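A token or an error on its own can be hard to interpret, so one option is to compare identities instead: fetch the service account email the pod sees and check it against the node pool's service account. The sketch below does only the comparison. identity_verdict is a hypothetical helper, and it assumes that gcloud reports the Compute Engine default account as the literal string "default" in config.serviceAccount and that the default account's email ends in -compute@developer.gserviceaccount.com.

```shell
#!/bin/sh
# Decide whether the identity a pod sees is the node's service account.
# $1: email returned inside the pod by
#     http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/email
# $2: the pool's service account, e.g. from
#     gcloud container node-pools describe ... --format="value(config.serviceAccount)"
identity_verdict() {
  if [ -z "$1" ]; then
    echo "UNKNOWN: empty response"
    return 0
  fi
  if [ "$2" = "default" ]; then
    # Assumption: "default" means the Compute Engine default service account.
    case "$1" in
      *-compute@developer.gserviceaccount.com) echo "EXPOSED: $1"; return 0 ;;
    esac
  elif [ "$1" = "$2" ]; then
    echo "EXPOSED: $1"
    return 0
  fi
  echo "OK: $1"
}
```

To use it, run the same kubectl test with the path ending in /email instead of /token and pass the result as the first argument. An EXPOSED verdict means the pod is still being handed the node's identity.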

Danger: Switching a pool to GKE_METADATA changes how workloads obtain credentials. Any pod that relied on the node service account through the raw metadata server will lose access unless you bind a Kubernetes service account to a Google service account via Workload Identity first. Map your identities before you flip the switch, or you will break running applications.
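A quick inventory can make that mapping exercise concrete. As a rough sketch, the helpers below list Kubernetes service accounts that carry no iam.gke.io/gcp-service-account annotation. unbound_ksas and list_unbound_ksas are hypothetical names; the sketch assumes kubectl prints <none> for a missing custom-columns value.

```shell
#!/bin/sh
# Read "NAMESPACE NAME GSA" rows and print the service accounts that have
# no iam.gke.io/gcp-service-account annotation (shown by kubectl as <none>).
unbound_ksas() {
  while read -r ns name gsa; do
    [ -n "$ns" ] || continue
    if [ "$gsa" = "<none>" ]; then
      printf '%s/%s\n' "$ns" "$name"
    fi
  done
}

# Hypothetical wrapper around a kubectl query for every service account.
list_unbound_ksas() {
  kubectl get serviceaccounts --all-namespaces --no-headers \
    -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,GSA:.metadata.annotations.iam\.gke\.io/gcp-service-account' \
    | unbound_ksas
}
```

Not every entry in the output needs a binding. Only workloads that actually call Google Cloud APIs do, so treat the list as a starting point for review rather than a to-do list.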

Bind workloads to their own identities

After hardening, give each workload its own scoped identity instead of leaning on the node account:

# Create a Google service account for the workload
gcloud iam service-accounts create my-app-gsa

# Grant only what the app needs
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:my-app-gsa@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/storage.objectViewer"

# Allow the KSA to impersonate the GSA
gcloud iam service-accounts add-iam-policy-binding \
  my-app-gsa@PROJECT_ID.iam.gserviceaccount.com \
  --role="roles/iam.workloadIdentityUser" \
  --member="serviceAccount:PROJECT_ID.svc.id.goog[NAMESPACE/KSA_NAME]"

Then annotate the Kubernetes service account:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app-ksa
  namespace: my-namespace
  annotations:
    iam.gke.io/gcp-service-account: my-app-gsa@PROJECT_ID.iam.gserviceaccount.com
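If you prefer to script the binding rather than apply a manifest, the same two steps can be wrapped in a small helper. This is a sketch under assumptions: wi_member and bind_ksa are hypothetical names, and the Google service account is assumed to live in the same project as the cluster.

```shell
#!/bin/sh
# Build the IAM member string that identifies a Kubernetes service account
# in the cluster's workload pool.
# Usage: wi_member PROJECT_ID NAMESPACE KSA_NAME
wi_member() {
  printf 'serviceAccount:%s.svc.id.goog[%s/%s]\n' "$1" "$2" "$3"
}

# Hypothetical wrapper: grant the impersonation binding, then annotate the KSA.
# Usage: bind_ksa PROJECT_ID NAMESPACE KSA_NAME GSA_NAME
bind_ksa() {
  gsa_email="$4@$1.iam.gserviceaccount.com"
  gcloud iam service-accounts add-iam-policy-binding "$gsa_email" \
    --role="roles/iam.workloadIdentityUser" \
    --member="$(wi_member "$1" "$2" "$3")"
  kubectl annotate serviceaccount "$3" --namespace "$2" \
    "iam.gke.io/gcp-service-account=$gsa_email"
}
```

For example, bind_ksa PROJECT_ID my-namespace my-app-ksa my-app-gsa performs the IAM binding and the annotation shown above in one call. Pods must also run under that Kubernetes service account (serviceAccountName in the pod spec) for the identity to apply.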

Fix it as code

If you manage clusters with Terraform, set both the cluster-level Workload Identity config and the node pool metadata mode so the hardened state is the default for anything you provision.

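resource "google_container_cluster" "primary" {
  name     = "prod-cluster"
  location = "us-central1"

  # Node pools are managed as separate resources, so drop the default pool.
  remove_default_node_pool = true
  initial_node_count       = 1

  # Cluster-level Workload Identity.
  workload_identity_config {
    workload_pool = "${var.project_id}.svc.id.goog"
  }
}

resource "google_container_node_pool" "primary_nodes" {
  name       = "primary"
  cluster    = google_container_cluster.primary.id
  location   = "us-central1"
  node_count = 1

  node_config {
    # Run the GKE Metadata Server on every node in this pool.
    workload_metadata_config {
      mode = "GKE_METADATA"
    }
  }
}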

Tip: Make workload_metadata_config { mode = "GKE_METADATA" } part of your shared GKE module so every team that spins up a cluster gets the hardened default without having to remember the flag.


How to prevent it from happening again

Hardening one cluster is a Tuesday afternoon. Keeping every cluster hardened across teams and regions is the real work. Push the control left so unhardened pools never reach production.

Policy as code with OPA and Conftest

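If you provision through Terraform, add a Conftest policy (written in OPA's Rego) that rejects any node pool that does not explicitly set GKE_METADATA, including pools that omit workload_metadata_config entirely. The paths below assume Conftest's parsed HCL input; adjust them if you evaluate plan JSON instead:

package main

# Deny any node pool that is not explicitly hardened. A pool with no
# workload_metadata_config block fails too, because hardened() is undefined.
deny[msg] {
  resource := input.resource.google_container_node_pool[name]
  not hardened(resource)
  msg := sprintf("node pool %q must set workload_metadata_config.mode = GKE_METADATA", [name])
}

hardened(resource) {
  resource.node_config[_].workload_metadata_config[_].mode == "GKE_METADATA"
}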

Organization policy constraint

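Enforce Workload Identity at the org or folder level so new clusters cannot be created without it. Confirm that the constraint named below is available in your organization before relying on it; if it is not, a custom organization policy constraint on GKE cluster or node pool resources can express the same rule: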

gcloud resource-manager org-policies enable-enforce \
  constraints/container.requireWorkloadIdentity \
  --organization=ORG_ID

Continuous scanning

Run the Lensix gke_workloadmetadata check on a schedule so any drift, like a manually created pool or a team that disabled the setting, surfaces fast instead of sitting unnoticed until an incident.

Tip: Wire the check into your CI pipeline against plan output before terraform apply. Catching a misconfigured pool in a pull request costs minutes. Catching it after a breach costs a lot more.
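As an illustration of what a plan-time gate can look like (a stand-in, not the Lensix check itself), the sketch below reads the JSON form of a Terraform plan and prints node pools whose resulting configuration is not hardened. unhardened_pools_in_plan is a hypothetical helper, it requires jq, and it assumes the standard plan JSON layout in which nested blocks such as node_config appear as lists under resource_changes[].change.after.

```shell
#!/bin/sh
# Print the addresses of node pools in a Terraform plan whose resulting
# config does not set workload_metadata_config.mode to GKE_METADATA.
# Reads `terraform show -json PLANFILE` output on stdin. Requires jq.
unhardened_pools_in_plan() {
  jq -r '
    .resource_changes[]?
    | select(.type == "google_container_node_pool")
    | select(.change.after != null)          # ignore pools being destroyed
    | select((.change.after.node_config[0].workload_metadata_config[0].mode // "")
             != "GKE_METADATA")
    | .address'
}
```

In a pipeline this could run as terraform plan -out=tfplan followed by terraform show -json tfplan | unhardened_pools_in_plan, failing the job when the output is non-empty. Pools that omit the block are flagged as well, and a default node pool defined inline on the cluster resource is not covered by this sketch.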


Best practices

  • Enable GKE Metadata Server on every node pool. Do not treat it as optional for "internal" clusters. SSRF and RCE do not care whether a cluster faces the internet.
  • Stop using the default Compute Engine service account. Create a minimal custom service account for nodes and strip it to logging and monitoring permissions only.
  • Give each workload its own identity through Workload Identity. Scope IAM roles per application so a compromise in one pod does not hand over the whole project.
  • Audit node service account roles. Run gcloud projects get-iam-policy PROJECT_ID and confirm no node account holds roles/editor or roles/owner.
  • Use Autopilot where it fits. GKE Autopilot clusters enforce Workload Identity and hardened metadata by default, removing a whole class of these misconfigurations.
  • Test the exposure, do not just trust the flag. Periodically run the metadata curl from inside a pod to confirm the node service account's token is no longer handed out.
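The role audit in the list above can be scripted. The following is a minimal sketch: risky_roles and audit_node_sa are hypothetical helper names, the node service account email is something you supply, and only project-level bindings are inspected (folder and organization bindings are not).

```shell
#!/bin/sh
# Read role names (one per line) and print the ones that are too broad
# for a node service account.
risky_roles() {
  while read -r role; do
    case "$role" in
      roles/owner|roles/editor) printf '%s\n' "$role" ;;
    esac
  done
}

# Hypothetical wrapper: list the project-level roles bound to one service
# account and flag the broad ones.
# Usage: audit_node_sa PROJECT_ID NODE_SA_EMAIL
audit_node_sa() {
  gcloud projects get-iam-policy "$1" \
    --flatten="bindings[].members" \
    --filter="bindings.members:serviceAccount:$2" \
    --format="value(bindings.role)" | risky_roles
}
```

Empty output means neither roles/editor nor roles/owner is bound to that account at the project level.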

The metadata server is a small detail with an outsized blast radius. Hardening it converts a single application bug from a project-wide compromise into a contained incident, which is exactly the kind of boundary you want between an attacker and your cloud.