Azure, Best Practices, Cloud Security, Identity & Access, Kubernetes

AKS Cluster Has No Managed Identity: Why It Matters and How to Fix It

Learn why AKS clusters without a managed identity rely on risky service principal secrets, plus step-by-step CLI, Terraform, and Azure Policy fixes.

TL;DR

This check flags AKS clusters that run without a managed identity, forcing them to fall back on service principals with manually managed credentials. The fix is to enable a system-assigned or user-assigned managed identity so Azure handles credential rotation for you.

When you spin up an AKS cluster, it needs an identity to talk to other Azure resources: load balancers, disks, virtual networks, Key Vault, and more. There are two ways to give it that identity. The old way uses a service principal with a client ID and secret that you create, store, and rotate yourself. The modern way uses a managed identity, where Azure provisions and rotates the credential automatically.

The aks_nomanagedidentity check fires when your cluster still relies on a service principal instead of a managed identity. It is one of the most common findings in older Azure environments. Clusters created before managed identity became the default were provisioned with service principals and never migrated.


What this check detects

The check inspects the identity configuration of each AKS cluster in scope. Specifically, it looks at the cluster's identity block. If that block is absent or set to a type other than SystemAssigned or UserAssigned, the cluster is flagged.

In practice, a flagged cluster usually has a servicePrincipalProfile populated with a client ID, which means it authenticates to Azure using a static secret tied to an Azure AD application registration.

Note: Managed identities come in two flavors. A system-assigned identity is created with the cluster and deleted when the cluster is deleted. A user-assigned identity is a standalone resource you create separately and can share across multiple clusters or resources. Both satisfy this check.
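The detection rule is simple enough to reproduce locally against a single cluster. Below is a minimal Python sketch of the rule as described above. It is not Lensix's implementation. It assumes you saved the cluster's configuration with az aks show --output json. The script, function and file names are hypothetical. A missing or null identity block, or any type other than SystemAssigned or UserAssigned, is flagged. For flagged clusters, it reports the service principal client ID.

```python
import json
import sys
from typing import Any, List, Mapping

# Identity types that satisfy the check: either flavour of managed identity.
MANAGED_IDENTITY_TYPES = {"SystemAssigned", "UserAssigned"}


def is_flagged(cluster: Mapping[str, Any]) -> bool:
    """True if the cluster JSON has no managed identity.

    A missing or null `identity` block, or any other type, is flagged.
    """
    identity = cluster.get("identity") or {}
    return identity.get("type") not in MANAGED_IDENTITY_TYPES


def describe(cluster: Mapping[str, Any]) -> str:
    """One-line verdict; for flagged clusters, include the SP client ID."""
    name = cluster.get("name", "<unnamed>")
    if not is_flagged(cluster):
        return f"{name}: OK ({cluster['identity']['type']})"
    sp = cluster.get("servicePrincipalProfile") or {}
    return f"{name}: FLAGGED (service principal {sp.get('clientId', 'unknown')})"


def main(paths: List[str]) -> None:
    # Each file is assumed to hold the output of `az aks show --output json`.
    for path in paths:
        with open(path) as fh:
            print(describe(json.load(fh)))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run it as python check_identity.py cluster.json. With no arguments it does nothing, so it is safe to import elsewhere.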


Why it matters

The core problem with service principals is the secret. Someone has to generate it, store it somewhere, and rotate it before it expires. Every one of those steps is a place where things go wrong.

Static secrets leak

Service principal credentials end up pasted into Terraform variable files, CI/CD pipeline secrets, Helm values, and the occasional Slack message. Once a secret leaks, anyone holding it can authenticate as your cluster's identity and act against whatever Azure resources that principal can access. Because the secret is long-lived, the exposure lasts until someone notices the leak and rotates the secret.

Expired secrets cause outages

Service principal secrets expire. When that happens, the cluster loses the ability to provision new load balancers, attach disks, or pull from your container registry. The failures are often confusing because the cluster itself keeps running while new operations silently break. Teams have lost hours chasing "AKS can't attach a volume" errors that turned out to be an expired client secret.
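You can check in advance how close a service principal is to this failure by inspecting its secret expiry dates. Below is a minimal Python sketch with these assumptions:

- You exported the credentials with az ad app credential list --id <client-id> --output json, where <client-id> comes from az aks show --query servicePrincipalProfile.clientId.
- The records use the endDateTime field of recent, Microsoft Graph based Azure CLI releases. Older releases used endDate.
- The 30-day warning window is an arbitrary choice.
- The function names are hypothetical.

```python
import json
import sys
from datetime import datetime, timedelta, timezone
from typing import Any, Iterable, List, Mapping, Optional, Tuple


def parse_utc(value: str) -> datetime:
    """Parse a UTC timestamp such as '2025-06-01T00:00:00Z'.

    Assumes the trailing-'Z' (UTC) format; fractional seconds are dropped
    so datetime.fromisoformat accepts the value on Pythons before 3.11.
    """
    trimmed = value.rstrip("Z").split(".")[0]
    return datetime.fromisoformat(trimmed).replace(tzinfo=timezone.utc)


def expiring_secrets(
    credentials: Iterable[Mapping[str, Any]],
    now: datetime,
    warn_within: timedelta = timedelta(days=30),
) -> List[Tuple[Optional[str], int]]:
    """Return (keyId, whole days left) for secrets expiring within
    `warn_within` of `now`, soonest first. Expired secrets get negative days."""
    soon = []
    for cred in credentials:
        end = cred.get("endDateTime")  # field name is an assumption, see above
        if not end:
            continue
        remaining = parse_utc(end) - now
        if remaining <= warn_within:
            soon.append((cred.get("keyId"), remaining.days))
    return sorted(soon, key=lambda item: item[1])


def main(paths: List[str]) -> None:
    now = datetime.now(timezone.utc)
    for path in paths:
        with open(path) as fh:
            for key_id, days in expiring_secrets(json.load(fh), now):
                status = "EXPIRED" if days < 0 else f"expires in {days} days"
                print(f"{path}: secret {key_id} {status}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Scheduling a check like this buys time, but it only treats the symptom. The fix below removes the secret altogether.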

Rotation is manual and risky

Rotating a service principal secret on a live cluster is a delicate operation. You create a new secret in Azure AD, then push it to the cluster with az aks update-credentials --reset-service-principal. If the timing or values are wrong, the cluster loses access to the Azure resources it manages. Managed identities remove this entire class of toil, because Azure rotates the underlying credential for you with no downtime and no human in the loop.

Warning: A leaked service principal secret with broad role assignments is effectively a master key to part of your subscription. Audit what roles your cluster's service principal holds before assuming the impact is limited.
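One way to run that audit is to export the principal's assignments and scan them for broad roles granted at high scope. The export command is az role assignment list --assignee <client-id> --all --output json. Below is a minimal Python sketch. Which roles count as "broad" is an assumption you should adjust to your own policy; here they are Owner, Contributor and User Access Administrator. "High scope" here means root, management group or subscription. The function names are hypothetical.

```python
import json
import re
import sys
from typing import Any, Iterable, List, Mapping, Tuple

# Built-in roles treated as "broad" (an assumption; extend for your tenant).
BROAD_ROLES = {"Owner", "Contributor", "User Access Administrator"}
# Subscription-level scope, e.g. /subscriptions/<id>
SUBSCRIPTION_SCOPE = re.compile(r"^/subscriptions/[^/]+/?$", re.IGNORECASE)
MANAGEMENT_GROUP_PREFIX = "/providers/microsoft.management/managementgroups/"


def is_high_scope(scope: str) -> bool:
    """True for root, management group, or subscription scopes."""
    return (
        scope == "/"
        or scope.lower().startswith(MANAGEMENT_GROUP_PREFIX)
        or SUBSCRIPTION_SCOPE.match(scope) is not None
    )


def broad_assignments(
    assignments: Iterable[Mapping[str, Any]]
) -> List[Tuple[str, str]]:
    """Return (role, scope) pairs where a broad role is granted at high scope."""
    return [
        (a["roleDefinitionName"], a["scope"])
        for a in assignments
        if a.get("roleDefinitionName") in BROAD_ROLES
        and is_high_scope(a.get("scope", ""))
    ]


if __name__ == "__main__":
    # Usage: python audit_roles.py roles.json
    for path in sys.argv[1:]:
        with open(path) as fh:
            for role, scope in broad_assignments(json.load(fh)):
                print(f"{path}: {role} at {scope}")
```

A narrower role such as Network Contributor on a single resource group passes this filter. That is the shape of grant you want the cluster identity to end up with.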


How to fix it

The good news is that Azure supports migrating an existing cluster from a service principal to a managed identity in place, either system-assigned or user-assigned. You do not need to rebuild the cluster.

Option 1: Migrate an existing cluster to a managed identity

Run the update command and Azure swaps the service principal for a system-assigned managed identity.

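Danger: This operation switches the control plane and add-on pods to the managed identity. However, the kubelet on each node keeps using the old service principal until you upgrade each node pool's node image with az aks nodepool upgrade --node-image-only, and that upgrade reimages every node. Treat the whole migration as a maintenance event. Run it during a window where rolling node restarts are acceptable, and confirm your workloads tolerate node drains.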

az aks update \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --enable-managed-identity

After the update completes, you must re-apply any role assignments your cluster needs. The new managed identities have different object IDs than the old service principal. For example, registry pull access is granted to the cluster's kubelet identity, so re-attach your Azure Container Registry:

az aks update \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --attach-acr myRegistry

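If your cluster talks to a custom virtual network, Key Vault, or other resources, grant the new identity the equivalent roles it had before. Get the control-plane identity's principal ID with the command that follows. The kubelet identity's object ID is at identityProfile.kubeletidentity.objectId in the same output.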

az aks show \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --query identity.principalId \
  --output tsv
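Copying the old grants by hand is error-prone. Below is a minimal Python sketch that automates the copy. It assumes you saved the old service principal's assignments with az role assignment list --assignee <old-client-id> --all --output json. For each distinct role and scope, it builds an az role assignment create command that targets the new principal ID. It prints the commands for review rather than running them. Registry pull access belongs on the kubelet identity, so keep using --attach-acr for that and drop any AcrPull lines it prints. The script and function names are hypothetical.

```python
import json
import shlex
import sys
from typing import Any, Iterable, List, Mapping


def migration_commands(
    old_assignments: Iterable[Mapping[str, Any]], new_principal_id: str
) -> List[str]:
    """Build one `az role assignment create` command per distinct
    (role, scope) pair held by the old service principal, targeting the
    new managed identity's principal (object) ID."""
    commands = []
    seen = set()
    for assignment in old_assignments:
        role = assignment["roleDefinitionName"]
        scope = assignment["scope"]
        if (role, scope) in seen:  # skip duplicate role/scope pairs
            continue
        seen.add((role, scope))
        parts = [
            "az", "role", "assignment", "create",
            "--assignee-object-id", new_principal_id,
            "--assignee-principal-type", "ServicePrincipal",
            "--role", role,
            "--scope", scope,
        ]
        # shlex.quote protects role names that contain spaces.
        commands.append(" ".join(shlex.quote(part) for part in parts))
    return commands


if __name__ == "__main__":
    # Usage: python copy_roles.py old-roles.json <new-principal-id>
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as fh:
            for command in migration_commands(json.load(fh), sys.argv[2]):
                print(command)
```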

Option 2: Use a user-assigned managed identity

If you want a stable identity that survives cluster recreation and can be assigned roles ahead of time, create a user-assigned identity first.

# Create the identity
az identity create \
  --resource-group myResourceGroup \
  --name myAKSIdentity

# Capture its resource ID
IDENTITY_ID=$(az identity show \
  --resource-group myResourceGroup \
  --name myAKSIdentity \
  --query id --output tsv)

# Assign it to the cluster
az aks update \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --enable-managed-identity \
  --assign-identity "$IDENTITY_ID"
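Before you clean up anything tied to the old service principal, confirm that the cluster actually picked up the new identity. Below is a minimal Python sketch, assuming you saved az aks show --output json to a file. It expects the identity's userAssignedIdentities map to be keyed by identity resource ID, which is an assumption about the output shape. It compares IDs case-insensitively because ARM resource IDs are not case-sensitive. The script and function names are hypothetical.

```python
import json
import sys
from typing import Any, Mapping


def has_user_assigned_identity(
    cluster: Mapping[str, Any], identity_id: str
) -> bool:
    """True if the cluster uses a user-assigned identity whose resource ID
    matches `identity_id`, compared case-insensitively."""
    identity = cluster.get("identity") or {}
    if identity.get("type") != "UserAssigned":
        return False
    attached = identity.get("userAssignedIdentities") or {}
    return identity_id.lower() in {key.lower() for key in attached}


if __name__ == "__main__":
    # Usage: python verify_identity.py cluster.json "$IDENTITY_ID"
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as fh:
            ok = has_user_assigned_identity(json.load(fh), sys.argv[2])
        print("attached" if ok else "NOT attached")
        sys.exit(0 if ok else 1)
```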

Option 3: Provision correctly with Terraform

For new clusters, define the identity block from the start. A system-assigned identity needs just a few lines:

resource "azurerm_kubernetes_cluster" "this" {
  name                = "myAKSCluster"
  location            = azurerm_resource_group.this.location
  resource_group_name = azurerm_resource_group.this.name
  dns_prefix          = "myaks"

  default_node_pool {
    name       = "default"
    node_count = 3
    vm_size    = "Standard_D2s_v3"
  }

  identity {
    type = "SystemAssigned"
  }
}

For a user-assigned identity, reference it explicitly:

resource "azurerm_user_assigned_identity" "aks" {
  name                = "myAKSIdentity"
  resource_group_name = azurerm_resource_group.this.name
  location            = azurerm_resource_group.this.location
}

resource "azurerm_kubernetes_cluster" "this" {
  # ... other config ...

  identity {
    type         = "UserAssigned"
    identity_ids = [azurerm_user_assigned_identity.aks.id]
  }
}

Tip: While you are migrating to managed identities at the cluster level, take the next step and enable Microsoft Entra Workload ID. It lets individual pods get their own scoped identity instead of sharing the cluster's, which dramatically tightens the permissions any single workload can use.


How to prevent it from happening again

Fixing one cluster is easy. Stopping the next one from drifting back to a service principal takes a guardrail.

Azure Policy

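Azure Policy can audit or deny clusters that lack a managed identity. Create a custom policy definition with the rule that follows and assign it to your scope. With the Deny effect, clusters without a managed identity are rejected when they are created or updated. Existing clusters are reported as non-compliant rather than changed.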

{
  "if": {
    "allOf": [
      {
        "field": "type",
        "equals": "Microsoft.ContainerService/managedClusters"
      },
      {
        "field": "identity.type",
        "notIn": ["SystemAssigned", "UserAssigned"]
      }
    ]
  },
  "then": {
    "effect": "deny"
  }
}

Policy-as-code in CI/CD

If you provision through Terraform, catch misconfigurations before they reach Azure. Run Checkov in your pipeline to scan your Terraform against its built-in Azure checks before anything is deployed:

checkov -d ./infra --framework terraform

You can also write a lightweight OPA or Conftest policy that rejects any azurerm_kubernetes_cluster resource missing an identity block, and run it as a required check on every pull request.
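You can prototype that rule in Python before writing it in Rego. This sketch is a stand-in for an OPA or Conftest policy, not one itself. It assumes the plan was exported with terraform plan -out tfplan followed by terraform show -json tfplan. It walks planned_values, including child modules, where nested blocks appear as lists of objects. It fails when an azurerm_kubernetes_cluster has no identity block of type SystemAssigned or UserAssigned. The script and function names are hypothetical.

```python
import json
import sys
from typing import Any, Iterator, List, Mapping

MANAGED_IDENTITY_TYPES = {"SystemAssigned", "UserAssigned"}


def iter_resources(module: Mapping[str, Any]) -> Iterator[Mapping[str, Any]]:
    """Yield every resource in a plan module, recursing into child modules."""
    yield from module.get("resources", [])
    for child in module.get("child_modules", []):
        yield from iter_resources(child)


def violations(plan: Mapping[str, Any]) -> List[str]:
    """Return addresses of azurerm_kubernetes_cluster resources whose planned
    values contain no identity block with a managed identity type."""
    root = plan.get("planned_values", {}).get("root_module", {})
    bad = []
    for resource in iter_resources(root):
        if resource.get("type") != "azurerm_kubernetes_cluster":
            continue
        # Nested blocks appear as lists of objects in the plan JSON.
        blocks = resource.get("values", {}).get("identity") or []
        types = {block.get("type") for block in blocks}
        if not types & MANAGED_IDENTITY_TYPES:
            bad.append(resource["address"])
    return bad


if __name__ == "__main__":
    # Usage: python check_plan.py plan.json  (exit code 1 fails the CI job)
    if len(sys.argv) == 2:
        with open(sys.argv[1]) as fh:
            found = violations(json.load(fh))
        for address in found:
            print(f"{address}: no managed identity block")
        sys.exit(1 if found else 0)
```

Make it a required check on pull requests, as with the policy approach. A non-zero exit code blocks the merge.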

Tip: Lensix runs this check continuously across your Azure subscriptions, so even clusters created outside your IaC pipeline get caught. Pair a deploy-time gate with continuous scanning to cover both planned and unplanned changes.


Best practices

  • Default to system-assigned unless you have a specific reason to share an identity. It removes lifecycle management entirely.
  • Use user-assigned when the identity needs to outlive the cluster, be pre-assigned roles, or be shared across resources in a controlled way.
  • Scope role assignments tightly. Grant the cluster identity only the roles it needs, on only the resources it touches. Avoid subscription-wide Contributor.
  • Adopt Workload ID for pods so application workloads do not inherit the cluster's permissions wholesale.
  • Audit for leftover service principals. After migrating, delete or disable the old Azure AD application and its secrets so a stale credential can't be abused.
  • Document the migration as a maintenance event since it recycles nodes. Coordinate with the teams that own the workloads.

Moving AKS off service principals is one of the highest-value, lowest-effort security improvements available on Azure. You trade a static secret you have to babysit for a credential Azure manages for free, and you eliminate a whole category of outages and leak scenarios in the process.