Fix GKE IP Aliasing Not Enabled (VPC-Native)

TL;DR

This check flags GKE clusters that route Pod traffic without VPC-native IP aliasing, which means Pods rely on static routes instead of native VPC IP ranges. Recreate the cluster (or provision new ones) with --enable-ip-alias to get native VPC integration, better scaling, and tighter network controls.

VPC-native networking is the default for new GKE clusters today, but plenty of older clusters were built before that became standard. If a cluster was created with routes-based networking, every Pod IP range relies on Google Cloud routes inside your VPC rather than secondary IP ranges allocated from the subnet itself. That distinction sounds academic until you hit a routes quota, try to peer two VPCs, or attempt to apply granular firewall rules to Pod traffic and discover you cannot.

The gke_noaliasip check looks at each GKE cluster and reports any that do not have IP aliasing (alias IP ranges) enabled. This post explains what that means, why it is worth fixing, and how to migrate without painting yourself into a corner.

What this check detects

GKE supports two networking modes for assigning IP addresses to Pods:

Routes-based clusters — the older model. Pod IP ranges are reachable through custom static routes that GKE adds to your VPC. Each node gets a route entry pointing the node's Pod CIDR at that node.
VPC-native clusters — the current default. Pod and Service IPs come from secondary IP ranges (alias IP ranges) on the cluster's subnet. The VPC understands these IPs natively, no custom routes required.

The gke_noaliasip check fires when a cluster is using the routes-based model, meaning IP aliasing is off. In the GKE API this maps to the ipAllocationPolicy.useIpAliases field being false or unset.

Note: "IP aliasing," "alias IP ranges," and "VPC-native" all refer to the same underlying feature. Google's docs use these terms interchangeably depending on whether they are talking about Compute Engine networking or GKE specifically.

You can confirm the mode for a given cluster (for a regional cluster, pass --region REGION instead of --zone ZONE):

gcloud container clusters describe CLUSTER_NAME \
  --zone ZONE \
  --format="value(ipAllocationPolicy.useIpAliases)"

If that returns False or is empty, the cluster is routes-based and this check will flag it.
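
As a minimal sketch, the interpretation of that output can be wrapped in a small shell function. The name classify_mode is a hypothetical helper, not a gcloud feature, and it assumes the command prints True for VPC-native clusters and False or nothing otherwise, as described above.

```shell
# classify_mode VALUE
# Hypothetical helper: maps the value printed for
# ipAllocationPolicy.useIpAliases to a human-readable label.
classify_mode() {
  case "$1" in
    True|true) echo "vpc-native" ;;
    *)         echo "routes-based" ;;   # "False" or empty output
  esac
}

# Literal values stand in for a live gcloud call here.
classify_mode "True"    # prints: vpc-native
classify_mode ""        # prints: routes-based
```

In practice you would pass the command substitution of the describe call shown above as the argument.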

Why it matters

Routes-based networking is not broken, but it carries real constraints and security gaps that VPC-native networking solves.

1. Pod IPs are invisible to VPC firewall rules

With routes-based clusters, Pod traffic moves over static routes and firewall rules cannot reliably target Pod IP ranges the way they can with native subnet ranges. VPC-native clusters let you write firewall rules and use network policies against Pod IPs that the VPC actually knows about. If your threat model includes limiting lateral movement between workloads, that visibility matters.

2. Route quotas become a scaling ceiling

Each routes-based cluster consumes one VPC static route per node, and static routes are subject to a finite quota (check the current limit for your project and network rather than assuming a default). A few large clusters in the same network can exhaust it. Once you hit the limit, you cannot add nodes or create new clusters until you raise the quota or free up routes. VPC-native clusters do not consume per-node routes at all.

Warning: Route quota exhaustion shows up at the worst possible time, usually during an autoscaling event under load, when new nodes silently fail to get connectivity. It is hard to diagnose because the cluster looks healthy until traffic spikes.
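
To get a rough sense of how close a network is to that ceiling, the arithmetic can be sketched as below. The function name routes_headroom and the quota of 250 are illustrative assumptions, and the sketch assumes one static route per node and ignores any other routes in the network.

```shell
# routes_headroom QUOTA NODES_IN_CLUSTER_1 [NODES_IN_CLUSTER_2 ...]
# Hypothetical helper: subtracts one route per node from the quota
# and prints how many routes remain.
routes_headroom() {
  quota=$1
  shift
  used=0
  for nodes in "$@"; do
    used=$((used + nodes))
  done
  echo $((quota - used))
}

# Three routes-based clusters of 100, 80, and 60 nodes against an
# assumed quota of 250 routes.
routes_headroom 250 100 80 60   # prints: 10
```

A result near zero means the next autoscaling event could be the one that fails.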

3. VPC Peering and Shared VPC limitations

Custom static routes are not exchanged across VPC Network Peering unless both sides explicitly export and import them. If you peer the cluster's VPC with another network, Pod IPs in a routes-based cluster are therefore unreachable from the peer by default. VPC-native clusters use subnet secondary ranges, which peering handles natively. This is a frequent blocker for teams trying to connect GKE workloads to managed services, on-prem networks, or other VPCs.

4. Required for modern GKE features

A growing list of features only work on VPC-native clusters, including Private Google Access for Pods, network endpoint groups (NEGs) for container-native load balancing, and certain Dataplane V2 capabilities. Staying on routes-based networking quietly locks you out of these.

Note: Container-native load balancing via NEGs sends traffic directly to Pods instead of bouncing through a node's kube-proxy. That removes a hop, improves load balancing accuracy, and gives you better health checks. It is VPC-native only.

How to fix it

Here is the hard truth: you cannot toggle IP aliasing on an existing cluster. VPC-native versus routes-based is set at creation time and is immutable. Fixing a flagged cluster means creating a new VPC-native cluster and migrating workloads to it.

Step 1: Reserve secondary IP ranges

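VPC-native clusters need two secondary ranges on the subnet, one for Pods and one for Services. Size them generously. The Pod range in particular needs room for your maximum node count multiplied by the per-node Pod block. On Standard clusters with the default maximum of 110 Pods per node, GKE reserves a /24 (256 addresses) for each node, so a /14 Pod range holds up to 1,024 nodes.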

gcloud compute networks subnets update SUBNET_NAME \
  --region REGION \
  --add-secondary-ranges \
    pods-range=10.4.0.0/14,services-range=10.8.0.0/20

Warning: Secondary range sizing is permanent for the life of the cluster. A /14 Pod range supports far more nodes than a /20. Undersize it and you will hit Pod IP exhaustion, which again forces a cluster rebuild. Use Google's IP range planning guidance before picking sizes.
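
The sizing arithmetic can be sketched in a few lines of shell. The function name max_nodes is hypothetical, and the default per-node prefix of /24 is an assumption that matches GKE Standard clusters with the default maximum of 110 Pods per node; pass a different prefix if you change Pods per node.

```shell
# max_nodes RANGE_PREFIX [PER_NODE_PREFIX]
# Hypothetical helper: prints how many per-node Pod blocks fit in a
# Pod secondary range. PER_NODE_PREFIX defaults to 24 (assumption).
max_nodes() {
  range_prefix=$1
  node_prefix=${2:-24}
  if [ "$range_prefix" -gt "$node_prefix" ]; then
    echo 0   # the range is smaller than a single node's block
  else
    echo $((1 << (node_prefix - range_prefix)))
  fi
}

max_nodes 14   # prints: 1024
max_nodes 20   # prints: 16
```

The gap between 1,024 and 16 nodes is the difference between the /14 and /20 ranges mentioned in the warning above.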

Step 2: Create the VPC-native cluster

gcloud container clusters create CLUSTER_NAME \
  --region REGION \
  --network VPC_NAME \
  --subnetwork SUBNET_NAME \
  --enable-ip-alias \
  --cluster-secondary-range-name pods-range \
  --services-secondary-range-name services-range

The --enable-ip-alias flag is what turns on VPC-native networking. On current gcloud versions this is the default for new clusters, but specifying it explicitly makes intent obvious and protects you against older defaults.

Step 3: Migrate workloads

With the new cluster running, move workloads over. The cleanest approach for most teams:

Apply your manifests (or Helm releases) to the new cluster.
Validate that Pods schedule, Services resolve, and external connectivity works.
Shift traffic gradually using DNS weighting or a load balancer in front of both clusters.
Drain traffic from the old cluster and confirm the new one handles full load.
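
The first two steps can be sketched as a small dry-run script. Everything named here is a placeholder: the run and validate_new_cluster functions are hypothetical helpers, and the context new-cluster, the ./manifests directory, the default namespace, and the web Deployment are assumptions to swap for your own values. By default the script only prints the kubectl commands; set DRY_RUN=0 to execute them.

```shell
# run CMD [ARGS...]
# Hypothetical dry-run wrapper: prints the command unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# validate_new_cluster CONTEXT MANIFEST_DIR NAMESPACE DEPLOYMENT
# Applies manifests to the new cluster, waits for one Deployment to
# roll out, then lists Pods and Services for a manual look.
validate_new_cluster() {
  run kubectl --context "$1" apply -f "$2"
  run kubectl --context "$1" -n "$3" rollout status "deployment/$4" --timeout=300s
  run kubectl --context "$1" -n "$3" get pods,services
}

validate_new_cluster new-cluster ./manifests default web
```

Traffic shifting and draining depend on your DNS provider or load balancer, so they are left out of this sketch.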

Danger: Deleting the old cluster is irreversible and takes its workloads, attached Persistent Disks, and load balancer IPs with it. Confirm the new cluster is serving production traffic and your stateful data is migrated or backed up before running the command below.

gcloud container clusters delete OLD_CLUSTER_NAME \
  --region REGION

Terraform example for the replacement cluster

If you manage GKE with Terraform, bake VPC-native networking into the resource so rebuilt clusters are correct by default:

resource "google_container_cluster" "primary" {
  name       = "prod-cluster"
  location   = "us-central1"
  network    = google_compute_network.vpc.id
  subnetwork = google_compute_subnetwork.subnet.id

  # Enables VPC-native (alias IP) networking
  networking_mode = "VPC_NATIVE"

  ip_allocation_policy {
    cluster_secondary_range_name  = "pods-range"
    services_secondary_range_name = "services-range"
  }

  remove_default_node_pool = true
  initial_node_count       = 1
}

Setting networking_mode = "VPC_NATIVE" together with the ip_allocation_policy block is the Terraform equivalent of --enable-ip-alias.

How to prevent it from happening again

Because the fix is a full cluster rebuild, prevention is far cheaper than remediation. Stop routes-based clusters from being created in the first place.

Enforce with policy-as-code

GCP has no built-in org policy constraint that says "VPC-native only," so the practical enforcement layer is policy-as-code on your IaC. If you provision clusters through Terraform, add a check with a tool like OPA Conftest or Checkov:

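package gke.networking

import future.keywords.contains
import future.keywords.if

# Deny any cluster whose networking_mode is missing or not VPC_NATIVE.
# "not ... ==" also fires when the attribute is absent; "!=" would not.
deny contains msg if {
  resource := input.resource.google_container_cluster[name]
  not resource.networking_mode == "VPC_NATIVE"
  msg := sprintf("Cluster '%s' must use VPC_NATIVE networking mode", [name])
}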

Gate it in CI/CD

Run that policy on every pull request that touches cluster definitions, and fail the build if it does not pass:

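# Conftest evaluates only the "main" package unless a namespace is given
conftest test --policy ./policies --namespace gke.networking cluster.tf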

Tip: Pair the CI gate with continuous detection in Lensix. The CI check catches new clusters before they deploy; the gke_noaliasip check catches any cluster created outside your pipeline (console clicks, scripts, legacy infra) so nothing slips through.

Audit existing clusters

Sweep each project for routes-based clusters so you know the full scope of any migration work. The loop below covers the active gcloud project; rerun it for every project you own:

# Print the IP aliasing setting of every cluster in the active project.
gcloud container clusters list --format="value(name,location)" |
while read -r name location; do
  # --location accepts either a zone or a region
  alias=$(gcloud container clusters describe "$name" --location "$location" \
    --format="value(ipAllocationPolicy.useIpAliases)" < /dev/null)
  echo "$name ($location): useIpAliases=${alias:-unset}"
done
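
To cut the audit output down to clusters that need migrating, a filter along these lines may help. The function name only_routes_based is hypothetical, and it assumes tab-separated input of name, location, and the useIpAliases value, with anything other than True treated as routes-based.

```shell
# only_routes_based
# Hypothetical filter: reads "name<TAB>location<TAB>useIpAliases" lines
# on stdin and prints the clusters whose third field is not "True".
only_routes_based() {
  awk -F '\t' '$1 != "" && $3 != "True" { print $1 " (" $2 ")" }'
}

# Sample input standing in for real gcloud output.
printf 'legacy-a\tus-central1-a\t\nprod-b\tus-central1\tTrue\n' | only_routes_based
# prints: legacy-a (us-central1-a)
```

One way to feed it, assuming the list command exposes the same field as describe, is gcloud container clusters list --format="value(name,location,ipAllocationPolicy.useIpAliases)" piped into only_routes_based.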

Best practices

Always create VPC-native clusters. It is the default, it is required for modern features, and there is no good reason to opt out for new workloads.
Plan IP ranges before you build. Secondary range sizes are fixed for the cluster's life. Model your peak node and Pod counts and add headroom.
Use non-overlapping ranges across clusters and peers. If you ever expect to peer VPCs or connect to on-prem, allocate Pod and Service ranges that will not collide with other networks.
Combine VPC-native with network policies. Native Pod IPs only deliver tighter security if you actually restrict traffic. Enable network policy enforcement (or Dataplane V2) and write policies that limit Pod-to-Pod communication.
Treat cluster networking config as immutable infrastructure. Since you cannot change it in place, manage it in version control and recreate rather than patch.

Routes-based clusters are a relic of an earlier GKE. Migrating off them is a one-time cost that removes a scaling ceiling, unlocks better load balancing, and gives your network team the firewall and peering control they have probably been asking for.

Fix the clusters this check flags, lock in VPC-native as the only allowed mode through your pipeline, and let continuous monitoring confirm it stays that way.
