
GKE Nodes Are Not Private: Why Public Worker Nodes Are a Risk and How to Fix It

Learn why GKE clusters with public worker nodes expand your attack surface, and how to enable private nodes with gcloud, Terraform, and policy-as-code.

TL;DR

This check flags GKE clusters whose worker nodes have public IP addresses, exposing them directly to the internet. Private nodes remove that exposure by giving nodes internal-only IPs. The fix is to recreate the cluster (or add a private node pool) with --enable-private-nodes.

When you spin up a GKE cluster with the defaults, the worker nodes get external IP addresses. That sounds harmless until you realize every node in the cluster now has an internet-routable address, and every kubelet, container runtime, and exposed NodePort is one permissive firewall rule away from being part of your attack surface. The GKE Nodes Are Not Private check (gke_noprivatecluster) catches clusters that skipped private nodes and left their compute exposed.

This post walks through what the check looks for, why public nodes are a real risk and not just a compliance nag, and how to move to private nodes without breaking your workloads.


What this check detects

The check inspects each GKE cluster and reports a failure when private nodes are not enabled. In a private cluster, nodes are assigned internal (RFC 1918) addresses only and have no public IP. Outbound traffic, such as image pulls from public registries, goes through Cloud NAT or another NAT gateway rather than leaving the node directly.

Concretely, the check is looking at the cluster's privateClusterConfig.enablePrivateNodes field. When that flag is false or absent, the cluster fails.
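
To make the pass/fail rule concrete, here is a minimal Python sketch of the same test applied to cluster JSON. It assumes input shaped like the output of `gcloud container clusters list --format=json`; the sample data and the function names `has_private_nodes` and `failing_clusters` are hypothetical, and this is not the check's actual implementation.

```python
import json

# Hypothetical sample, shaped like `gcloud container clusters list --format=json`.
SAMPLE_CLUSTERS = json.loads("""
[
  {"name": "prod-cluster", "privateClusterConfig": {"enablePrivateNodes": true}},
  {"name": "legacy-cluster", "privateClusterConfig": {"enablePrivateNodes": false}},
  {"name": "default-cluster"}
]
""")


def has_private_nodes(cluster: dict) -> bool:
    """True only when privateClusterConfig.enablePrivateNodes is explicitly true."""
    config = cluster.get("privateClusterConfig") or {}
    return config.get("enablePrivateNodes") is True


def failing_clusters(clusters: list) -> list:
    """Names of clusters where the flag is false or absent."""
    return [c["name"] for c in clusters if not has_private_nodes(c)]


if __name__ == "__main__":
    for name in failing_clusters(SAMPLE_CLUSTERS):
        print(f"FAIL: {name} does not have private nodes enabled")
```

Treating a missing `privateClusterConfig` block the same as an explicit `false` mirrors the "false or absent" rule described above.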

Note: Private nodes and a private endpoint are two separate settings. Private nodes control whether your worker nodes have public IPs. The private endpoint controls whether the Kubernetes control plane is reachable publicly. This check focuses on the nodes. You can have private nodes while still keeping a public control plane endpoint, which is a common setup.


Why it matters

Public node IPs widen your attack surface in ways that are easy to underestimate. Here is what actually goes wrong.

Direct reachability of node services

The kubelet serves its API on port 10250 and, on clusters where the insecure read-only port is still enabled, on port 10255 as well. A misconfigured firewall rule, an overly broad 0.0.0.0/0 ingress, or a Service of type NodePort can suddenly make node-level ports reachable from anywhere. With public IPs in play, an attacker who finds an open port is talking directly to your compute, not to a load balancer you control.

NodePort and host networking exposure

NodePort services bind a port (default 30000 to 32767) on every node. If a node has a public IP and a firewall rule permits the traffic, that NodePort is internet-facing whether you intended it or not. Pods running with hostNetwork: true inherit the same exposure.
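
To see which workloads would be affected, you can inventory NodePort services from the cluster itself. This is a rough sketch that assumes input shaped like `kubectl get services --all-namespaces -o json`; the sample data and the function name `node_port_services` are hypothetical.

```python
# Hypothetical sample, shaped like `kubectl get services --all-namespaces -o json`.
SAMPLE_SERVICES = {
    "items": [
        {"metadata": {"namespace": "shop", "name": "checkout"},
         "spec": {"type": "NodePort",
                  "ports": [{"port": 80, "nodePort": 30080}]}},
        {"metadata": {"namespace": "shop", "name": "catalog"},
         "spec": {"type": "ClusterIP", "ports": [{"port": 80}]}},
    ]
}


def node_port_services(service_list: dict) -> list:
    """Return (namespace, name, node_ports) for every Service of type NodePort."""
    found = []
    for svc in service_list.get("items", []):
        spec = svc.get("spec", {})
        if spec.get("type") != "NodePort":
            continue
        meta = svc.get("metadata", {})
        ports = [p["nodePort"] for p in spec.get("ports", []) if "nodePort" in p]
        found.append((meta.get("namespace"), meta.get("name"), ports))
    return found


if __name__ == "__main__":
    for namespace, name, ports in node_port_services(SAMPLE_SERVICES):
        print(f"{namespace}/{name} binds node ports {ports}")
```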

Lateral movement after a pod compromise

If an attacker breaks out of a container or exploits a vulnerable app, a node with a public IP gives them an easier path to exfiltrate data and reach external command-and-control infrastructure. With private nodes, outbound traffic has to go through Cloud NAT (and has no internet path at all until NAT is configured), which gives you a single place to log, restrict, and monitor it.

Warning: A common false sense of security is "we have firewall rules locking it down." Firewall rules drift. Someone adds a temporary allow-all rule to debug an issue and forgets to remove it. Private nodes remove the public IP entirely, so a firewall mistake cannot expose what does not exist.

Compliance pressure

CIS GKE Benchmark recommendation 6.6.5 and most internal hardening standards expect private nodes. If you are working toward SOC 2, PCI DSS, or FedRAMP, public worker nodes are the kind of finding that shows up in every audit cycle.


How to fix it

The catch with GKE is that private nodes are not a setting you can simply flip on a running public cluster. Whether nodes get public IPs is decided when the cluster, or the node pool, is created. So remediation falls into two paths: create a new private cluster, or migrate workloads to a new private node pool.

Option 1: Create a new private cluster (gcloud)

For new clusters, enable private nodes up front. You need to allocate a /28 range for the control plane.

gcloud container clusters create prod-cluster \
  --region us-central1 \
  --enable-private-nodes \
  --enable-ip-alias \
  --master-ipv4-cidr 172.16.0.0/28 \
  --network my-vpc \
  --subnetwork my-subnet \
  --enable-master-authorized-networks \
  --master-authorized-networks 203.0.113.0/24

A few flags worth calling out:

  • --enable-private-nodes is the setting this check cares about. Nodes get internal IPs only.
  • --enable-ip-alias is required for private clusters and gives you VPC-native networking.
  • --master-ipv4-cidr is the dedicated /28 for the control plane peering.
  • --enable-master-authorized-networks restricts who can reach the control plane endpoint. Combine this with private nodes for defense in depth.
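
Before running the create command, it can help to sanity-check the control plane range. The sketch below uses Python's standard `ipaddress` module to confirm the range is a /28 and does not overlap ranges already used in the VPC. The list of in-use ranges is hypothetical, and the function name `check_control_plane_range` is an illustration rather than part of any Google tooling.

```python
import ipaddress


def check_control_plane_range(master_cidr: str, in_use: list) -> list:
    """Return a list of problems with the proposed control plane CIDR (empty if none)."""
    problems = []
    master = ipaddress.ip_network(master_cidr)
    if master.prefixlen != 28:
        problems.append(f"{master} is a /{master.prefixlen}, expected a /28")
    for other in in_use:
        if master.overlaps(ipaddress.ip_network(other)):
            problems.append(f"{master} overlaps {other}")
    return problems


if __name__ == "__main__":
    # Hypothetical ranges already allocated in my-vpc (node, pod, and service ranges).
    vpc_ranges = ["10.0.0.0/20", "10.4.0.0/14", "10.8.0.0/20"]
    for problem in check_control_plane_range("172.16.0.0/28", vpc_ranges):
        print(problem)
```

With the ranges shown, the 172.16.0.0/28 value from the command above produces no problems, while a value such as 10.0.0.0/24 would be reported both for its size and for overlapping 10.0.0.0/20.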

Warning: Private nodes have no outbound internet access by default. If your pods pull images from Docker Hub, hit external APIs, or download packages at startup, you must set up Cloud NAT first or those workloads will fail to start. Plan this before you cut over.

Set up Cloud NAT so private nodes can reach the internet for egress:

gcloud compute routers create nat-router \
  --network my-vpc \
  --region us-central1

gcloud compute routers nats create nat-config \
  --router nat-router \
  --region us-central1 \
  --nat-all-subnet-ip-ranges \
  --auto-allocate-nat-external-ips

Option 2: Add a private node pool and migrate

If the cluster itself was created with private cluster config but a node pool still has public IPs, or you want to migrate workloads gradually, add a private node pool and drain the old one.

gcloud container node-pools create private-pool \
  --cluster prod-cluster \
  --region us-central1 \
  --enable-private-nodes \
  --num-nodes 3

Then cordon and drain the old nodes so pods reschedule onto the private pool:

Danger: Draining nodes evicts running pods. On a production cluster, do this during a maintenance window and confirm PodDisruptionBudgets are in place so you do not take down more replicas than your service can tolerate. Drain one node at a time.

# List nodes in the old public pool
kubectl get nodes -l cloud.google.com/gke-nodepool=default-pool

# Cordon and drain each one (replace NODE_NAME with a node from the list above)
kubectl cordon NODE_NAME
kubectl drain NODE_NAME --ignore-daemonsets --delete-emptydir-data

# Once empty, delete the old pool
gcloud container node-pools delete default-pool \
  --cluster prod-cluster \
  --region us-central1
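
If the old pool has more than a handful of nodes, generating the commands is less error-prone than typing them. The following sketch builds the cordon and drain commands for one pool from node JSON. It assumes input shaped like `kubectl get nodes -o json`; the sample node names and the helper names are hypothetical. It only prints the commands and does not execute anything.

```python
import shlex

POOL_LABEL = "cloud.google.com/gke-nodepool"

# Hypothetical sample, shaped like `kubectl get nodes -o json`.
SAMPLE_NODES = {
    "items": [
        {"metadata": {"name": "node-a", "labels": {POOL_LABEL: "default-pool"}}},
        {"metadata": {"name": "node-b", "labels": {POOL_LABEL: "private-pool"}}},
        {"metadata": {"name": "node-c", "labels": {POOL_LABEL: "default-pool"}}},
    ]
}


def nodes_in_pool(node_list: dict, pool: str) -> list:
    """Names of nodes whose node pool label matches `pool`."""
    return [n["metadata"]["name"] for n in node_list.get("items", [])
            if n["metadata"].get("labels", {}).get(POOL_LABEL) == pool]


def drain_commands(node_names: list) -> list:
    """Cordon every node first, then drain them one at a time."""
    commands = [["kubectl", "cordon", name] for name in node_names]
    for name in node_names:
        commands.append(["kubectl", "drain", name,
                         "--ignore-daemonsets", "--delete-emptydir-data"])
    return commands


if __name__ == "__main__":
    for command in drain_commands(nodes_in_pool(SAMPLE_NODES, "default-pool")):
        print(shlex.join(command))
```

Cordoning the whole pool before the first drain is a design choice in this sketch: it keeps evicted pods from landing on another old node that is about to be drained anyway.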

Terraform example

If you manage GKE with Terraform, set the private cluster block at creation:

resource "google_container_cluster" "prod" {
  name       = "prod-cluster"
  location   = "us-central1"
  network    = google_compute_network.vpc.id
  subnetwork = google_compute_subnetwork.subnet.id

  networking_mode = "VPC_NATIVE"
  ip_allocation_policy {}

  private_cluster_config {
    enable_private_nodes    = true
    enable_private_endpoint = false
    master_ipv4_cidr_block  = "172.16.0.0/28"
  }

  master_authorized_networks_config {
    cidr_blocks {
      cidr_block   = "203.0.113.0/24"
      display_name = "office"
    }
  }
}

Tip: The community terraform-google-modules/kubernetes-engine module has a safer-cluster variant that turns on private nodes, shielded nodes, Workload Identity, and network policy by default. Adopting it gives you a hardened baseline without hand-wiring every flag.


How to prevent it from happening again

Fixing one cluster is fine. Stopping the next public cluster from being created is what actually moves the needle.

Enforce with Organization Policy

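Organization Policy lets you block non-compliant resources at creation time, but be careful about which constraint you enable. The command below enforces container.restrictNoncompliantDiagnosticDataAccess, which, as its name indicates, concerns diagnostic data access rather than node networking. On its own it does not require private nodes:

gcloud resource-manager org-policies enable-enforce \
  container.restrictNoncompliantDiagnosticDataAccess \
  --organization YOUR_ORG_ID

To require private nodes specifically, use constraints/container.privateClusterRequired if available in your org, define a custom organization policy constraint on the cluster's privateClusterConfig.enablePrivateNodes field if your organization uses custom constraints for GKE, or enforce it through Policy Controller below.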

Policy-as-code with Gatekeeper / Policy Controller

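For teams running GKE, Policy Controller (formerly Anthos Policy Controller, built on OPA Gatekeeper) is a Kubernetes admission controller, so it can reject non-compliant cluster definitions when you manage clusters as Kubernetes resources, for example through Config Connector. Pair it with Config Sync to keep the policy enforced across your fleet.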

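If you provision with Terraform, add a conftest check in CI that rejects plans where enable_private_nodes is false or missing. The wildcard lookup lives in a helper function because Rego does not allow an unbound `[_]` inside a negated expression, and clusters that are only being deleted are skipped:

package gke

# True when the planned cluster has private nodes enabled.
private_nodes_enabled(resource) {
  resource.change.after.private_cluster_config[_].enable_private_nodes == true
}

deny[msg] {
  resource := input.resource_changes[_]
  resource.type == "google_container_cluster"
  resource.change.actions[_] != "delete"
  not private_nodes_enabled(resource)
  msg := sprintf("cluster '%s' must use private nodes", [resource.name])
}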

Gate it in CI/CD

Run the policy check on every pull request that touches infrastructure. A failed check blocks the merge, so a public cluster never reaches production in the first place.
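
For pipelines that do not run conftest, the same gate can be approximated in a few lines of Python. This sketch assumes a plan exported with `terraform show -json`, where nested blocks such as `private_cluster_config` appear as lists. The function names, the sample plan, and the file path passed to `main` are all hypothetical.

```python
import json
import sys


def private_nodes_enabled(after) -> bool:
    """True when any private_cluster_config block sets enable_private_nodes to true."""
    blocks = (after or {}).get("private_cluster_config") or []
    return any(block.get("enable_private_nodes") is True for block in blocks)


def violations(plan: dict) -> list:
    """Messages for every planned cluster that would be created without private nodes."""
    messages = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "google_container_cluster":
            continue
        change = rc.get("change", {})
        if change.get("actions") == ["delete"]:
            continue  # a cluster being destroyed cannot violate the rule
        if not private_nodes_enabled(change.get("after")):
            name = rc.get("name")
            messages.append(f"cluster '{name}' must use private nodes")
    return messages


def main(plan_path: str) -> int:
    """Return a non-zero exit code when the plan contains violations."""
    with open(plan_path) as handle:
        found = violations(json.load(handle))
    for message in found:
        print(message)
    return 1 if found else 0


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1]))
```

A CI job would export the plan with `terraform show -json` into a file, pass that file's path as the only argument, and fail the build on a non-zero exit code.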

Tip: Lensix runs the gke_noprivatecluster check continuously across your GCP projects, so even clusters created outside your IaC pipeline (the click-ops ones) get flagged. Wire the findings to Slack or your ticketing system to catch drift the moment it appears.


Best practices

  • Pair private nodes with authorized networks. Private nodes harden the data plane. Master authorized networks harden access to the control plane. Use both.
  • Use Cloud NAT, not public IPs, for egress. Centralize and log all outbound traffic so you can audit what your workloads talk to.
  • Turn on Shielded GKE Nodes. Use --enable-shielded-nodes together with --shielded-secure-boot and --shielded-integrity-monitoring to protect against boot-level and kernel-level tampering.
  • Adopt Workload Identity. Stop binding node service accounts to broad IAM roles. Scope credentials to the pod that needs them.
  • Restrict NodePort usage. Prefer internal load balancers and Ingress over NodePort services, and use network policy to limit pod-to-pod traffic.
  • Plan migrations carefully. Because private nodes are set at creation, treat clusters as immutable infrastructure. Building a new private cluster and shifting traffic is cleaner than fighting an existing public one.

Private nodes are one of the cheapest security wins in GKE. The setting itself costs nothing extra (the Cloud NAT you will likely need for egress carries a small charge), it removes a whole class of exposure, and it satisfies the private-node requirement found in most compliance benchmarks. The only real friction is remembering to set the flag before the cluster exists, which is exactly why enforcing it in policy matters more than fixing it after the fact.