Fix AKS Clusters With No Network Policy | Lensix

TL;DR

By default, every pod in an AKS cluster can talk to every other pod with no restrictions. This check flags clusters that have no network policy engine configured, which means a single compromised pod can move laterally across your whole workload. Enable a network policy engine (Azure Network Policy Manager, Calico, or Cilium), ideally at cluster creation, and write default-deny policies per namespace.

Kubernetes networking is flat by design. Any pod can reach any other pod on any port, across namespaces, with nothing in the way. That is convenient for getting workloads talking quickly, and it is a liability the moment one of those workloads gets popped. This check, aks_nonetworkpolicy, looks at your Azure Kubernetes Service clusters and tells you which ones have no network policy engine enabled at all.

If you have never thought about east-west traffic inside your cluster, this is the post to read.

What this check detects

The check inspects the network profile of each AKS cluster and reports a failure when networkPolicy is set to none or is unset. In practice that covers three situations:

The cluster was created without specifying a network policy and Azure defaulted to none.
The cluster uses kubenet or Azure CNI but the policy engine was never turned on.
Someone created the cluster through the portal and clicked past the networking tab.
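
The same condition can be approximated from the command line. The following is a minimal sketch, not the check's actual implementation: it assumes a logged-in Azure CLI, relies on tsv output printing null fields as None, and uses two hypothetical helper names, has_no_policy_engine and audit_aks_network_policy.

```shell
# Return success (0) when a networkPolicy value means "no engine configured".
# Azure CLI's tsv output prints null fields as "None", so that is treated
# the same as an empty string or an explicit "none".
has_no_policy_engine() {
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    ""|none) return 0 ;;
    *) return 1 ;;
  esac
}

# Walk every AKS cluster in the current subscription and flag the ones
# that have no policy engine.
audit_aks_network_policy() {
  az aks list \
    --query "[].[name, resourceGroup, networkProfile.networkPolicy]" -o tsv |
  while IFS="$(printf '\t')" read -r name rg policy; do
    if has_no_policy_engine "$policy"; then
      echo "FAIL  $rg/$name (networkPolicy=${policy:-unset})"
    else
      echo "PASS  $rg/$name (networkPolicy=$policy)"
    fi
  done
}
```

Calling audit_aks_network_policy prints one PASS or FAIL line per cluster, which is enough to spot unprotected clusters before a scanner reports them.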

When no policy engine is present, the Kubernetes API will happily accept NetworkPolicy objects, but nothing enforces them. They become silent no-ops. That is worse than having no policies at all, because teams assume they are protected when they are not.

Note: A NetworkPolicy resource in Kubernetes is just a declaration of intent. It only does anything if a CNI plugin with a policy controller is running. AKS supports three options: Azure Network Policy Manager, Calico, and the newer Cilium-based Azure CNI Powered by Cilium.

Why it matters

The flat network is the single biggest gift you can give an attacker who has landed a foothold in your cluster. Consider how a real intrusion plays out.

An attacker exploits an SSRF bug or an unpatched dependency in a public-facing web pod. They now have code execution inside that pod. In a cluster with no network policy, they can immediately:

Scan the entire pod CIDR and find every other workload, including internal admin tools and databases.
Reach the metrics, monitoring, and logging pods that often hold credentials.
Hit the IMDS endpoint and pivot to Azure identities if the kubelet identity is overprivileged.
Talk directly to a database pod in another namespace that should never have been reachable from the web tier.

None of that requires a second exploit. The lateral movement is free because the network does not say no. Network policies turn that flat plane into segmented zones, so a compromised frontend can only reach the specific services it actually needs.

Network segmentation does not stop the initial breach. It contains the blast radius so one compromised pod does not become a compromised cluster.

There is a compliance angle too. PCI DSS, SOC 2, and most internal controls expect network segmentation between workloads of differing trust levels. An auditor who finds a payment service sharing an unrestricted network with a marketing site is going to write that up.

How to fix it

The honest answer is that the network policy engine is chosen at cluster creation time and cannot always be changed in place. Your remediation path depends on what you have today.

Option 1: Enable a policy engine on an existing cluster

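If your cluster runs Azure CNI, you can usually switch to the Cilium-based network policy engine on a running cluster with a one-line update. AKS documents this in-place upgrade for Azure CNI Overlay and Azure CNI with dynamic pod IP allocation, and not for clusters with Windows node pools, so check the current prerequisites against your configuration first. Where it applies, this is the smoothest path on modern AKS.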

az aks update \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --network-policy cilium \
  --network-dataplane cilium

Warning: Switching the data plane causes a rolling update of every node in the cluster. Plan for node cycling, make sure your pod disruption budgets are sane, and run it during a maintenance window. Some legacy clusters on kubenet cannot be upgraded in place and require a rebuild.
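
Before running the update, it helps to look at what the cluster runs today and which disruption budgets could stall a node drain. This is a rough pre-flight sketch under stated assumptions: show_network_profile and blocking_pdbs are hypothetical helper names, and the resource group and cluster names are the placeholders used above.

```shell
# Print the network profile fields that decide whether an in-place switch
# is possible (plugin, plugin mode, data plane, current policy engine).
show_network_profile() {
  az aks show --resource-group "$1" --name "$2" \
    --query "networkProfile.{plugin:networkPlugin, mode:networkPluginMode, dataplane:networkDataplane, policy:networkPolicy}" \
    -o json
}

# List PodDisruptionBudgets that currently allow zero disruptions.
# Pods covered by these can block node drains during the rolling update.
blocking_pdbs() {
  kubectl get pdb --all-namespaces \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{" "}{.status.disruptionsAllowed}{"\n"}{end}' |
  awk '$2 == "0" { print $1 }'
}

# Usage (requires Azure and cluster access):
#   show_network_profile myResourceGroup myAKSCluster
#   blocking_pdbs
```

An empty result from blocking_pdbs does not guarantee a smooth rollout, but any entry it does print is worth resolving before the maintenance window.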

Option 2: Create a new cluster with a policy engine

For new clusters, or for kubenet clusters that need rebuilding, set the engine at creation. Azure CNI Powered by Cilium is the recommended default for new work.

az aks create \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --network-plugin azure \
  --network-dataplane cilium \
  --network-policy cilium \
  --generate-ssh-keys

If you prefer the Azure Network Policy Manager or Calico instead of Cilium:

# Azure Network Policy Manager
az aks create \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --network-plugin azure \
  --network-policy azure \
  --generate-ssh-keys

# Calico
az aks create \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --network-plugin azure \
  --network-policy calico \
  --generate-ssh-keys

Step 3: Write a default-deny baseline

Turning on the engine does nothing until you write policies. The first policy in every namespace should deny all ingress and egress, then you open up only what is needed. Start with this in each application namespace:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: payments
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress

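An empty podSelector matches every pod in the namespace, and the absence of any ingress or egress rules means everything is denied. Now add back the specific flows. For example, allow the API to accept traffic from the frontend on port 8080. Because default-deny-all blocks egress as well, the frontend pods also need a matching egress allowance before this flow works end to end: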

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: payments
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
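
The ingress rule above covers only the API side. With default-deny-all in place, the frontend's outbound connection is still blocked until an egress rule allows it. A minimal sketch of that counterpart follows, assuming the same app=frontend and app=api labels; render_frontend_egress and the policy name allow-frontend-egress-to-api are hypothetical.

```shell
# Emit an egress policy that lets app=frontend pods reach app=api pods on
# TCP 8080 in the given namespace (labels and port mirror the ingress rule).
render_frontend_egress() {
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-egress-to-api
  namespace: $1
spec:
  podSelector:
    matchLabels:
      app: frontend
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: api
      ports:
        - protocol: TCP
          port: 8080
EOF
}

# Usage (requires cluster access):
#   render_frontend_egress payments | kubectl apply -f -
```

Rendering the manifest from a function keeps the namespace a parameter, so the same rule can be reused wherever the frontend and API labels match.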

Danger: Do not apply a default-deny egress policy to production namespaces without first allowing DNS to kube-system. If you forget, every pod loses name resolution and your services break in confusing ways. Always pair default-deny egress with an explicit allow for UDP and TCP port 53 to the kube-dns pods.

The DNS allow rule looks like this:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: payments
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
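    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          # Same list item as namespaceSelector, so both must match: only the
          # DNS pods in kube-system (labelled k8s-app=kube-dns), not every pod there.
          podSelector:
            matchLabels:
              k8s-app: kube-dns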
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53

Step 4: Verify enforcement

Confirm the engine is reporting on the cluster and that a test pod cannot reach something it should not:

# Check the configured policy engine
az aks show \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --query "networkProfile.networkPolicy" -o tsv

# Spin up a test pod and try a connection that should be blocked
kubectl run nettest --image=nicolaka/netshoot -n payments --rm -it --restart=Never -- \
  curl -m 5 http://some-other-service.default.svc.cluster.local

If the policy is working, curl gives up after the 5-second limit set by -m 5 and exits with code 28 instead of returning a response.
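
A timeout and a broken DNS rule can look alike from the outside, so it helps to read curl's exit status rather than eyeball the output. The helper below is a small sketch; interpret_curl_exit is a hypothetical name, and the mapping uses curl's documented exit codes 0, 6, and 28.

```shell
# Map curl's exit status to what it suggests about policy enforcement.
#   0  = request completed
#   6  = host name could not be resolved
#   28 = operation timed out
interpret_curl_exit() {
  case "$1" in
    0)  echo "ALLOWED: connection succeeded, nothing is blocking this flow" ;;
    6)  echo "DNS FAILED: name could not be resolved, check the DNS egress rule" ;;
    28) echo "BLOCKED: timed out, consistent with a deny policy (a dropped DNS query can also time out)" ;;
    *)  echo "INCONCLUSIVE: curl exited with $1" ;;
  esac
}

# Usage: run curl in the test pod followed by `echo $?`, then pass the
# printed number to the helper, for example:
#   interpret_curl_exit 28
```

If the result is a timeout and you are unsure whether DNS is the cause, repeating the test against the service's cluster IP removes name resolution from the picture.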

How to prevent it from happening again

Fixing one cluster by hand is fine once. The goal is to make a cluster without a policy engine impossible to ship.

Define it in your IaC

Most clusters in the wild were created through the portal or an ad-hoc CLI command. Move cluster creation into Terraform or Bicep so the network policy is reviewed in a pull request. In Terraform:

resource "azurerm_kubernetes_cluster" "main" {
  name                = "myAKSCluster"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
  dns_prefix          = "myaks"

  default_node_pool {
    name       = "default"
    node_count = 3
    vm_size    = "Standard_D2s_v5"
  }

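  network_profile {
    # Assumption: the azurerm provider expects overlay mode (or a pod_subnet_id
    # on the node pool) when the Cilium data plane is selected.
    network_plugin      = "azure"
    network_plugin_mode = "overlay"
    network_data_plane  = "cilium"
    network_policy      = "cilium"
  }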

  identity {
    type = "SystemAssigned"
  }
}

Tip: Bundle the default-deny and DNS-allow policies as a Helm chart or Kustomize base that every namespace inherits. New teams then start secure by default instead of bolting policies on later when they remember.
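
Before reaching for Helm or Kustomize, the same idea can be sketched as a small script that stamps the baseline into each namespace. This is an illustration only: render_baseline and apply_baseline are hypothetical helpers, and the manifests are the default-deny and DNS-allow policies shown earlier, with the DNS rule emitted first so it is applied before egress is denied.

```shell
# Emit the DNS-allow and default-deny policies for one namespace as a
# multi-document YAML stream. DNS comes first so it is applied first.
render_baseline() {
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: $1
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: $1
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
EOF
}

# Apply the baseline to every namespace passed as an argument,
# stopping at the first failure.
apply_baseline() {
  for ns in "$@"; do
    render_baseline "$ns" | kubectl apply -f - || return 1
  done
}

# Usage (requires cluster access):
#   apply_baseline payments checkout
```

A chart or Kustomize base remains the better long-term home for these manifests, since it is versioned and reviewed alongside the workloads it protects.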

Gate it in CI/CD

Add a policy-as-code check that rejects any Terraform plan or AKS template where network_policy is missing or set to none. With Conftest and Rego:

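package aks

# Values that mean "no engine": unset (null in the plan JSON), empty, or "none".
missing_engine := {null, "", "none"}

deny[msg] {
  resource := input.resource_changes[_]
  resource.type == "azurerm_kubernetes_cluster"
  profile := resource.change.after.network_profile[_]
  # object.get returns null when the attribute is absent, so a missing
  # network_policy is caught as well as an empty or "none" value.
  np := object.get(profile, "network_policy", null)
  missing_engine[np]
  msg := sprintf("AKS cluster '%s' has no network policy engine", [resource.name])
}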

Run it in the pipeline before apply:

terraform plan -out=tfplan
terraform show -json tfplan > plan.json
conftest test plan.json --policy ./policies --namespace aks

Enforce with Azure Policy

For organization-wide coverage, assign the built-in Azure Policy that audits AKS clusters lacking a network policy engine. It catches clusters created outside your pipelines and surfaces them in the compliance dashboard. Combined with Lensix continuous scanning, you get drift detection rather than a one-time snapshot.

Best practices

Default deny everywhere. Every namespace should start with a default-deny ingress and egress policy, then open specific flows. Allowlists beat denylists.
Always allow DNS first. The most common cause of broken policy rollouts is forgetting kube-dns egress. Make the DNS allow rule part of your namespace template.
Segment by namespace and label. Use namespace and pod selectors that match your actual trust boundaries, not arbitrary ones. Payment workloads, internal tooling, and public frontends should be separated.
Prefer Cilium for new clusters. It supports the standard Kubernetes NetworkPolicy plus richer L7 and identity-aware rules, and it is the direction AKS is investing in.
Test policies before prod. Roll out default-deny in a staging namespace and watch for broken connections before applying it to revenue-critical workloads.
Pair network policy with the rest of the stack. Network policy is one control. Combine it with pod security standards, least-privilege workload identities, and private API server access for defense in depth.

A network policy engine costs nothing to enable and turns your flat cluster network into something an attacker actually has to fight through. Turn it on, write a default-deny baseline, and gate it in CI so the next cluster is born secure.
