Fix GKE Default Service Account on Node Pools

TL;DR

This check flags GKE node pools running on the default Compute Engine service account, which carries broad project-wide permissions every pod on the node inherits. Create a dedicated, least-privilege service account and rebuild the node pool to use it.

When you spin up a GKE cluster without specifying a service account, Google quietly attaches the default Compute Engine service account to your nodes. It works, your workloads run, and nothing breaks. That is exactly the problem. The default service account is wired into your project with far more access than any node pool actually needs, and every workload scheduled onto those nodes can borrow that identity.

This Lensix check, gke_defaultserviceaccount, looks at each node pool in your GKE clusters and reports any that are attached to the default Compute Engine service account (the one that looks like PROJECT_NUMBER-compute@developer.gserviceaccount.com).

What this check detects

Every GKE node runs as a Google Cloud service account. That identity is what the node, and by extension the kubelet and any pod using node-level credentials, presents when it calls Google Cloud APIs. The check inspects the serviceAccount field on each node pool's node config. If that value is default or resolves to the project's default Compute Engine service account, the node pool fails.

Note: The default Compute Engine service account is created automatically when a project enables the Compute Engine API. Unless an organization policy disables the automatic grant, it receives the basic (formerly primitive) roles/editor role on the entire project. That single role can read and modify almost every resource in the project.

So a node pool on the default service account is not just over-permissioned in theory. It holds an Editor grant across your whole GCP project, available to anything that can reach the node metadata endpoint.
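The detection logic described above can be approximated from the command line. This is a minimal sketch, not how Lensix implements the check: the function names are hypothetical, and treating an empty value as the default account is a conservative assumption.

```shell
# Hypothetical helper: returns 0 when a node pool's serviceAccount value
# refers to the default Compute Engine service account.
is_default_node_sa() {
  value="$1"
  # Assumption: an empty value is treated like the literal "default".
  if [ -z "$value" ] || [ "$value" = "default" ]; then
    return 0
  fi
  # The default account's email is PROJECT_NUMBER-compute@developer.gserviceaccount.com.
  printf '%s\n' "$value" | grep -Eq '^[0-9]+-compute@developer\.gserviceaccount\.com$'
}

# Reads "POOL<TAB>SERVICE_ACCOUNT" lines on stdin and prints the names of
# pools that use the default account.
flag_default_sa_pools() {
  while IFS="$(printf '\t')" read -r pool sa; do
    if is_default_node_sa "$sa"; then
      printf '%s\n' "$pool"
    fi
  done
}

# Example usage (cluster name and region are placeholders):
#   gcloud container node-pools list --cluster=my-cluster --region=us-central1 \
#     --format="value(name,config.serviceAccount)" | flag_default_sa_pools
```

Keeping the classification in a function that reads plain text makes it easy to try against sample data before pointing it at a real cluster.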

Why it matters

The risk here is lateral movement and privilege escalation. GKE nodes expose a metadata server that hands out access tokens for the attached service account. Any process running on a node can query it:

curl -s -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token"

That returns a live OAuth access token for the node's service account. If that account is the default one with roles/editor, and the node's OAuth access scopes allow it, the token can create VMs, read and write Cloud Storage buckets, and modify firewall rules across the project.
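How far such a token reaches depends on both the IAM roles on the account and the OAuth access scopes configured on the node. The sketch below, with a hypothetical helper name, checks a scope list for the unrestricted cloud-platform scope. It assumes the metadata server returns one scope per line.

```shell
# Hypothetical helper: reads newline-separated OAuth scopes on stdin and
# returns 0 if the unrestricted cloud-platform scope is among them.
has_cloud_platform_scope() {
  grep -Fxq 'https://www.googleapis.com/auth/cloud-platform'
}

# Example usage from a process on a node:
#   MD="http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default"
#   curl -s -H "Metadata-Flavor: Google" "$MD/email"
#   curl -s -H "Metadata-Flavor: Google" "$MD/scopes" | has_cloud_platform_scope \
#     && echo "token is limited only by IAM roles"
```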

Now connect that to a realistic attack chain:

1. An attacker compromises a single pod through a vulnerable web app, a poisoned dependency, or an SSRF bug.
2. From inside that pod, they reach the node metadata endpoint and pull the service account token.
3. With Editor on the project, they pivot from a single compromised pod to full project compromise.

What should have been a small blast radius, one pod, becomes the entire project. This is one of the most common escalation paths in real GKE incidents, and it stems almost entirely from leaving the default service account in place.

Warning: GKE Workload Identity mitigates a lot of this, but it does not fully remove the node service account. The node still needs an identity to pull images, write logs, and report metrics. If that identity is the over-privileged default account, you still have a problem even with Workload Identity enabled.

How to fix it

You cannot change the service account on an existing node pool in place. The fix is to create a dedicated least-privilege service account and provision a new node pool that uses it, then migrate workloads and delete the old pool.

Step 1: Create a dedicated service account

gcloud iam service-accounts create gke-node-sa \
  --display-name="GKE Node Pool Service Account" \
  --project=my-project

Step 2: Grant only the roles nodes actually need

Google publishes a minimal set of roles for GKE nodes. These cover logging, monitoring, metadata, and pulling images from Artifact Registry.

SA="gke-node-sa@my-project.iam.gserviceaccount.com"

for ROLE in \
  roles/logging.logWriter \
  roles/monitoring.metricWriter \
  roles/monitoring.viewer \
  roles/stackdriver.resourceMetadata.writer \
  roles/artifactregistry.reader
do
  gcloud projects add-iam-policy-binding my-project \
    --member="serviceAccount:$SA" \
    --role="$ROLE"
done

Note: If you still pull images from Container Registry (GCR) backed by Cloud Storage instead of Artifact Registry, swap roles/artifactregistry.reader for roles/storage.objectViewer. New projects should prefer Artifact Registry.
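Once the roles are granted, it may be worth confirming that nothing broader is bound to the account. This is a rough sketch: the helper name is hypothetical, the allow-list simply mirrors the roles from the loop above, and the suggested query only covers project-level bindings.

```shell
# Allow-list copied from the role loop; edit it if your nodes need more.
ALLOWED_ROLES="roles/logging.logWriter
roles/monitoring.metricWriter
roles/monitoring.viewer
roles/stackdriver.resourceMetadata.writer
roles/artifactregistry.reader"

# Hypothetical helper: reads one role per line on stdin and prints every
# role that is not in ALLOWED_ROLES.
unexpected_roles() {
  while read -r role; do
    [ -z "$role" ] && continue
    if ! printf '%s\n' "$ALLOWED_ROLES" | grep -Fxq "$role"; then
      printf '%s\n' "$role"
    fi
  done
}

# Example usage (project ID and $SA as defined in Step 2):
#   gcloud projects get-iam-policy my-project \
#     --flatten="bindings[].members" \
#     --filter="bindings.members:serviceAccount:$SA" \
#     --format="value(bindings.role)" | unexpected_roles
```

Empty output means the account holds only the allow-listed roles at the project level.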

Step 3: Create a new node pool with the dedicated account

gcloud container node-pools create secure-pool \
  --cluster=my-cluster \
  --service-account="gke-node-sa@my-project.iam.gserviceaccount.com" \
  --machine-type=e2-standard-4 \
  --num-nodes=3 \
  --workload-metadata=GKE_METADATA \
  --region=us-central1

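The --workload-metadata=GKE_METADATA flag runs the GKE metadata server on the pool, which is how Workload Identity is turned on per node pool. It requires Workload Identity to be enabled on the cluster first. With it on, pods can no longer reach the raw node service account token through the metadata server (pods that use hostNetwork are the exception). Combining a least-privilege node account with Workload Identity gives you defense in depth.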

Step 4: Migrate workloads and remove the old pool

Cordon and drain the old nodes so the scheduler moves pods onto the new pool, then delete the old pool.

# Cordon every node in the old pool
for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=default-pool -o name); do
  kubectl cordon "$node"
done

# Drain them one at a time
for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=default-pool -o name); do
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
done
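Before removing the old pool, one way to confirm the drain worked is to list whatever is still scheduled on its nodes and ignore DaemonSet pods. This is a minimal sketch with a hypothetical helper name. It assumes input in "NAMESPACE NAME OWNER_KIND" columns, as kubectl's custom-columns output would produce.

```shell
# Hypothetical helper: reads "NAMESPACE NAME OWNER_KIND" lines on stdin and
# prints namespace/name for every pod not managed by a DaemonSet.
non_daemonset_pods() {
  awk 'NF >= 2 && $3 != "DaemonSet" { print $1 "/" $2 }'
}

# Example usage against the old pool:
#   for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=default-pool \
#       -o jsonpath='{.items[*].metadata.name}'); do
#     kubectl get pods --all-namespaces --no-headers \
#       --field-selector "spec.nodeName=$node" \
#       -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,OWNER:.metadata.ownerReferences[0].kind'
#   done | non_daemonset_pods
```

Static or completed pods may still show up in the output, so treat any result as a prompt for review, not as proof that the drain failed.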

Danger: Deleting a node pool terminates its nodes immediately. Confirm every workload has rescheduled onto the new pool and is healthy before running the command below. Draining first is what makes this safe.

gcloud container node-pools delete default-pool \
  --cluster=my-cluster \
  --region=us-central1

Doing it in Terraform

If you manage GKE with Terraform, bake the dedicated account into the node pool definition so the problem never reappears on rebuild:

resource "google_service_account" "gke_nodes" {
  account_id   = "gke-node-sa"
  display_name = "GKE Node Pool Service Account"
}

resource "google_container_node_pool" "secure_pool" {
  name     = "secure-pool"
  cluster  = google_container_cluster.primary.id
  location = "us-central1"

  node_config {
    machine_type    = "e2-standard-4"
    service_account = google_service_account.gke_nodes.email
    oauth_scopes    = ["https://www.googleapis.com/auth/cloud-platform"]

    workload_metadata_config {
      mode = "GKE_METADATA"
    }
  }
}

Tip: Use cloud-platform as the OAuth scope and control access through IAM roles on the service account, not through narrow legacy scopes. Scopes are a coarse, hard-to-audit control. IAM roles are precise and visible in your policy.

How to prevent it from happening again

Manual fixes drift. The only durable fix is to stop default-service-account node pools from being created in the first place.

Enforce with Organization Policy

GCP has a built-in constraint that stops default service accounts from automatically receiving the Editor role when they are created:

gcloud resource-manager org-policies enable-enforce \
  iam.automaticIamGrantsForDefaultServiceAccounts \
  --organization=YOUR_ORG_ID

Enforcing this constraint does not block a node pool from using the default service account, and it does not revoke grants that already exist. It does stop new default service accounts from getting Editor at all, which shrinks the blast radius even if one slips through.
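Projects that existed before the policy was enforced may still carry the Editor grant, so a quick audit of current bindings can help. This is a sketch under assumptions: the helper name is hypothetical, and the input is expected as tab-separated "ROLE MEMBER" pairs, which is what gcloud's value() format emits.

```shell
# Hypothetical helper: reads "ROLE<TAB>MEMBER" lines on stdin and prints the
# default Compute Engine service accounts that hold roles/editor.
default_sa_editor_grants() {
  awk -F '\t' '
    $1 == "roles/editor" &&
    $2 ~ /^serviceAccount:[0-9]+-compute@developer\.gserviceaccount\.com$/ {
      sub(/^serviceAccount:/, "", $2)
      print $2
    }'
}

# Example usage (project ID is a placeholder):
#   gcloud projects get-iam-policy my-project \
#     --flatten="bindings[].members" \
#     --format="value(bindings.role,bindings.members)" | default_sa_editor_grants
```

Any account it prints still has project-wide Editor and is a candidate for having that binding removed once nothing depends on it.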

Gate Terraform in CI/CD

Add a policy-as-code check that fails the pipeline when a node pool config omits a custom service account. With Conftest and OPA:

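package gke

import rego.v1

# Rule 1 reads the plan's configuration section, because a value that is
# unknown at plan time (such as a new service account's email) is absent
# from change.after. Only root-module resources are covered here.
deny contains msg if {
  r := input.configuration.root_module.resources[_]
  r.type == "google_container_node_pool"
  not r.expressions.node_config[0].service_account
  msg := sprintf("Node pool '%s' must set an explicit service_account", [r.address])
}

# Rule 2 rejects the literal "default" and the default account's email.
deny contains msg if {
  rc := input.resource_changes[_]
  rc.type == "google_container_node_pool"
  sa := rc.change.after.node_config[0].service_account
  is_default_sa(sa)
  msg := sprintf("Node pool '%s' uses the default service account", [rc.address])
}

is_default_sa(sa) if sa == "default"

is_default_sa(sa) if endswith(sa, "-compute@developer.gserviceaccount.com")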

terraform plan -out tfplan
terraform show -json tfplan > plan.json
conftest test plan.json --policy gke-policies/

Tip: Run the Lensix gke_defaultserviceaccount check on a schedule against live clusters too. CI gates catch new infrastructure, but they miss clusters created before the gate existed or through the console. Continuous scanning closes that gap.

Best practices

One service account per node pool purpose. Separate accounts for system pools, batch pools, and ingress pools keep permissions tight and make audit logs readable.
Always pair custom node accounts with Workload Identity. Give each Kubernetes service account its own GCP identity so pods never depend on node-level credentials.
Never grant roles/editor or roles/owner to a node account. Start from Google's minimal role set and add specific roles only when a workload genuinely needs them.
Block the metadata server from pods that do not need it using GKE_METADATA mode or network policy, so a single compromised pod cannot read node tokens.
Review IAM bindings on service accounts regularly. Permissions accrete over time. Schedule a recurring review and trim anything unused.

Switching off the default service account is a small change with an outsized payoff. It turns a project-wide compromise into a contained incident, and it costs nothing but the time to rebuild a node pool. Make it the default for every cluster you run, then enforce it so you never have to think about it again.
