
AKS Cluster Not Using BYOK Encryption: Why It Matters and How to Fix It

Learn why AKS clusters should use customer-managed keys (BYOK) for encryption at rest, how to enable disk encryption sets, and how to enforce it with policy.

TL;DR

This check flags AKS clusters that rely on Azure-managed keys instead of a customer-managed key (BYOK) for encryption at rest. Without BYOK you lose control over key rotation, revocation, and audit. Fix it by creating an Azure Key Vault key, creating a disk encryption set that references it, granting the set's identity access to the vault, and creating the cluster with that set attached.

By default, Azure encrypts the managed disks behind your AKS nodes with platform-managed keys. That covers the baseline "data at rest is encrypted" requirement, but the keys themselves are owned and rotated by Microsoft. For a lot of teams that is fine. For teams under HIPAA, PCI DSS, FedRAMP, or strict internal data governance, it is not. They need to own the key, rotate it on their own schedule, and be able to cut access in an incident.

The aks_nobyok check looks at whether your AKS cluster's node OS and data disks are encrypted with a customer-managed key (CMK), also called bring-your-own-key (BYOK), through a disk encryption set backed by Azure Key Vault. If the cluster is using only platform-managed keys, the check fails.


What this check detects

The check inspects the disk encryption configuration of your AKS cluster's node disks. Specifically, it verifies whether diskEncryptionSetID is set on the managed cluster resource. That property points to a Disk Encryption Set (DES), which in turn references a key stored in your Azure Key Vault.

  • Pass: The cluster references a disk encryption set backed by a customer-managed key in Key Vault.
  • Fail: The cluster has no disk encryption set, meaning node disks fall back to platform-managed keys.

Note: BYOK in AKS covers the OS and data disks attached to your nodes. It does not encrypt secrets stored in etcd. For that, you want KMS etcd encryption, which is a separate feature (and a separate concern). Don't assume BYOK on disks protects Kubernetes Secrets at the API layer.
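The pass/fail logic above can be approximated by hand with the Azure CLI. This is a minimal sketch, not the check's actual implementation: the function name check_aks_byok is hypothetical, and it assumes az aks show prints the property as diskEncryptionSetId.

```shell
#!/usr/bin/env bash
# Sketch: report whether an AKS cluster references a disk encryption set.
# Assumption: `az aks show` exposes the property as `diskEncryptionSetId`.
check_aks_byok() {
  local rg="$1" cluster="$2" des_id
  des_id=$(az aks show \
    --resource-group "$rg" \
    --name "$cluster" \
    --query 'diskEncryptionSetId' -o tsv)

  # An unset property comes back empty (or as the literal "None").
  if [ -n "$des_id" ] && [ "$des_id" != "None" ]; then
    echo "PASS: $cluster uses $des_id"
  else
    echo "FAIL: $cluster has no disk encryption set (platform-managed keys)"
    return 1
  fi
}
```

Call it as check_aks_byok "$RG" aks-byok-cluster. The non-zero return on failure makes it usable as a quick gate in a script.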


Why it matters

Encryption at rest with platform keys protects you against one specific scenario: someone walking out of an Azure datacenter with a physical disk. That threat is real but rare. The reasons to care about BYOK are mostly about control and compliance, not raw cryptographic strength.

Key revocation during an incident

If an attacker compromises your subscription or you discover a data exposure, BYOK lets you disable the key in Key Vault and cut off access to the underlying disk data. The effect is not instantaneous: Azure shuts down VMs whose disks depend on a disabled key, and they stay unusable until the key is re-enabled or replaced. With platform-managed keys you have no such kill switch. You are dependent on Microsoft's key lifecycle, which you cannot touch.
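As a rough illustration of that kill switch, the sketch below disables (rather than deletes) the key so access can be restored later. The function names are hypothetical. Expect nodes whose disks depend on the key to stop working once the change takes effect.

```shell
#!/usr/bin/env bash
# Sketch: toggle the enabled flag on the Key Vault key behind the DES.
# Disabling is reversible; deleting the key is a much bigger step.
set_disk_key_enabled() {
  local vault="$1" key="$2" enabled="$3"
  az keyvault key set-attributes \
    --vault-name "$vault" \
    --name "$key" \
    --enabled "$enabled" \
    --query 'attributes.enabled' -o tsv
}

# Incident response: cut access to disks encrypted with this key.
revoke_disk_key()  { set_disk_key_enabled "$1" "$2" false; }

# Recovery: restore access once the incident is contained.
restore_disk_key() { set_disk_key_enabled "$1" "$2" true; }
```

For example, revoke_disk_key "$KV_NAME" aks-disk-key during an incident, and restore_disk_key with the same arguments afterward.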

Compliance and audit requirements

Many frameworks explicitly require customer-controlled key management. PCI DSS requirement 3 talks about key custody, HIPAA expects you to demonstrate control over how PHI is protected, and FedRAMP moderate and high baselines often mandate CMK. An auditor who sees platform-managed keys on a cluster handling regulated data will write it up.

Separation of duties

With BYOK, the people who manage the cluster and the people who manage the encryption keys can be different teams with different RBAC. A platform engineer with full AKS access still cannot decrypt the disks without Key Vault permissions. That separation is hard to argue against in a security review.

Warning: You cannot add a disk encryption set to an existing AKS cluster or node pool. The disk encryption set is a cluster-level setting fixed at creation time, and node pools inherit it. Switching an existing cluster to BYOK means creating a new cluster and migrating workloads. Plan for this rather than expecting an in-place toggle.


How to fix it

Enabling BYOK on AKS is a four-step process: create a Key Vault and key, create a disk encryption set that references the key, grant the disk encryption set access to the vault, then create your cluster pointing at the set.

1. Create a Key Vault with purge protection

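Disk encryption sets require both soft delete and purge protection on the vault. Soft delete is on by default for new vaults, so only purge protection needs an explicit flag. The example uses vault access policies rather than Azure RBAC (--enable-rbac-authorization false) because step 3 grants access with az keyvault set-policy.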

RG=aks-byok-rg
LOCATION=eastus
KV_NAME=aks-byok-kv-$RANDOM

az group create --name $RG --location $LOCATION

az keyvault create \
  --name $KV_NAME \
  --resource-group $RG \
  --location $LOCATION \
  --enable-purge-protection true \
  --enable-rbac-authorization false

2. Create the encryption key

az keyvault key create \
  --vault-name $KV_NAME \
  --name aks-disk-key \
  --kty RSA \
  --size 4096

KEY_ID=$(az keyvault key show \
  --vault-name $KV_NAME \
  --name aks-disk-key \
  --query 'key.kid' -o tsv)

3. Create the disk encryption set and grant it Key Vault access

DES_NAME=aks-des

az disk-encryption-set create \
  --name $DES_NAME \
  --resource-group $RG \
  --source-vault $KV_NAME \
  --key-url $KEY_ID

# Grant the DES managed identity access to wrap/unwrap the key
DES_IDENTITY=$(az disk-encryption-set show \
  --name $DES_NAME \
  --resource-group $RG \
  --query 'identity.principalId' -o tsv)

az keyvault set-policy \
  --name $KV_NAME \
  --object-id $DES_IDENTITY \
  --key-permissions get wrapKey unwrapKey

4. Create the AKS cluster with the disk encryption set

DES_ID=$(az disk-encryption-set show \
  --name $DES_NAME \
  --resource-group $RG \
  --query id -o tsv)

az aks create \
  --name aks-byok-cluster \
  --resource-group $RG \
  --node-osdisk-diskencryptionset-id $DES_ID \
  --node-count 3 \
  --generate-ssh-keys
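Once the cluster is up, it is worth confirming that the node scale sets actually reference the disk encryption set. The sketch below is one way to do that. It assumes the node pools are backed by virtual machine scale sets and that the OS disk setting appears under virtualMachineProfile.storageProfile.osDisk.managedDisk.diskEncryptionSet.id in the az vmss list output. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print each node scale set with the DES its OS disk references.
# Assumption: node pools are VMSS-backed and live in the node resource group.
list_nodepool_des() {
  local rg="$1" cluster="$2" node_rg
  # AKS keeps node infrastructure in a separate, auto-created resource group.
  node_rg=$(az aks show \
    --resource-group "$rg" \
    --name "$cluster" \
    --query 'nodeResourceGroup' -o tsv)

  # One tab-separated row per scale set: name, then the DES resource ID.
  az vmss list \
    --resource-group "$node_rg" \
    --query '[].[name, virtualMachineProfile.storageProfile.osDisk.managedDisk.diskEncryptionSet.id]' \
    -o tsv
}
```

Running list_nodepool_des "$RG" aks-byok-cluster should show the ID stored in $DES_ID next to every scale set; a missing value in the second column would mean that pool is not using the set.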

Migrating an existing cluster

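Because the disk encryption set is fixed at cluster creation, a cluster that was created without one cannot be converted by adding a node pool: az aks nodepool add has no disk encryption set option, and new pools inherit the cluster-level setting. The supported path for such a cluster is to create a new cluster with the disk encryption set (step 4) and move workloads to it. On a cluster that already has a disk encryption set, you can still replace node pools in place: add a new pool, cordon and drain the old nodes, then remove the old pool.

# Add a replacement node pool to a cluster created with a disk encryption set
az aks nodepool add \
  --cluster-name aks-byok-cluster \
  --resource-group "$RG" \
  --name byokpool \
  --node-count 3
# Note: the new pool inherits the cluster's DES; it is set at cluster scope, not per pool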

Danger: Draining nodes evicts running pods. If your workloads do not have PodDisruptionBudgets and multiple replicas, the migration will cause downtime. Validate that your deployments tolerate eviction before running kubectl drain on production nodes.
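The cordon-and-drain step might look like the sketch below. It assumes the old nodes carry the agentpool=<pool name> label that AKS applies to nodes, and the function name and the 10-minute timeout are illustrative choices. It cordons every node first so evicted pods cannot land on another node that is about to be drained.

```shell
#!/usr/bin/env bash
# Sketch: cordon, then drain, every node in one AKS node pool.
# Assumption: nodes are labeled agentpool=<pool name>.
drain_pool() {
  local pool="$1" nodes node
  nodes=$(kubectl get nodes -l "agentpool=$pool" \
    -o jsonpath='{.items[*].metadata.name}')

  # Pass 1: mark all old nodes unschedulable.
  for node in $nodes; do
    kubectl cordon "$node" || return 1
  done

  # Pass 2: evict pods node by node; PodDisruptionBudgets are honored.
  for node in $nodes; do
    kubectl drain "$node" \
      --ignore-daemonsets \
      --delete-emptydir-data \
      --timeout=10m || return 1
  done
}
```

Run kubectl get pdb --all-namespaces beforehand to see which workloads are protected, then drain_pool <old pool name>. The function stops at the first node that fails to drain so you can investigate before continuing.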


Infrastructure as code

If you manage AKS with Terraform, wire the disk encryption set into the cluster definition so BYOK is enforced from the start.

resource "azurerm_key_vault_key" "aks" {
  name         = "aks-disk-key"
  key_vault_id = azurerm_key_vault.aks.id
  key_type     = "RSA"
  key_size     = 4096
  key_opts     = ["wrapKey", "unwrapKey"]
}

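resource "azurerm_disk_encryption_set" "aks" {
  name                = "aks-des"
  resource_group_name = azurerm_resource_group.aks.name
  location            = azurerm_resource_group.aks.location
  key_vault_key_id    = azurerm_key_vault_key.aks.id

  identity {
    type = "SystemAssigned"
  }
}

# The DES identity needs these key permissions before the cluster can use the set
# (the Terraform equivalent of az keyvault set-policy in step 3)
resource "azurerm_key_vault_access_policy" "des" {
  key_vault_id = azurerm_key_vault.aks.id
  tenant_id    = azurerm_disk_encryption_set.aks.identity[0].tenant_id
  object_id    = azurerm_disk_encryption_set.aks.identity[0].principal_id

  key_permissions = ["Get", "WrapKey", "UnwrapKey"]
}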

resource "azurerm_kubernetes_cluster" "main" {
  name                = "aks-byok-cluster"
  location            = azurerm_resource_group.aks.location
  resource_group_name = azurerm_resource_group.aks.name
  dns_prefix          = "aksbyok"

  disk_encryption_set_id = azurerm_disk_encryption_set.aks.id

  default_node_pool {
    name       = "default"
    node_count = 3
    vm_size    = "Standard_D2s_v3"
  }

  identity {
    type = "SystemAssigned"
  }
}

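Tip: Enable auto-rotation on the Key Vault key so you are not tracking rotation manually. Set a rotation policy with az keyvault key rotation-policy update, and enable automatic key rotation on the disk encryption set (--enable-auto-key-rotation true on az disk-encryption-set create or update, auto_key_rotation_enabled in Terraform) so the set follows new key versions. Without that setting, the set stays on the key version it was created with.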
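A minimal sketch of that rotation setup follows. The 90-day rotation interval and 1-year expiry are placeholder values, not recommendations, and the function name is hypothetical. It also turns on auto key rotation on the disk encryption set, on the assumption that the set was created without it.

```shell
#!/usr/bin/env bash
# Sketch: rotate the key on a schedule and let the DES follow new versions.
# The P90D / P1Y durations are illustrative placeholders.
set_disk_key_rotation() {
  local vault="$1" key="$2" rg="$3" des="$4" policy
  policy=$(mktemp)

  # Rotation policy: create a new key version 90 days after the last one.
  cat > "$policy" <<'EOF'
{
  "lifetimeActions": [
    { "trigger": { "timeAfterCreate": "P90D" }, "action": { "type": "Rotate" } }
  ],
  "attributes": { "expiryTime": "P1Y" }
}
EOF

  az keyvault key rotation-policy update \
    --vault-name "$vault" \
    --name "$key" \
    --value "$policy"

  # Have the disk encryption set pick up new key versions automatically.
  az disk-encryption-set update \
    --name "$des" \
    --resource-group "$rg" \
    --enable-auto-key-rotation true

  rm -f "$policy"
}
```

With the variables from the earlier steps, the call would be set_disk_key_rotation "$KV_NAME" aks-disk-key "$RG" "$DES_NAME".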


How to prevent it from happening again

Catching a missing disk encryption set after the cluster is live means a painful migration. The cheaper fix is to block clusters without BYOK before they get created.

Azure Policy

Azure has a built-in policy definition that audits or denies AKS clusters without a disk encryption set. Assign it in deny mode to your production management group.

# Built-in: "Both operating systems and data disks in Azure Kubernetes Service clusters should be encrypted by customer-managed keys"
az policy assignment create \
  --name require-aks-byok \
  --policy "7d7be79c-23ba-4033-84dd-45e2a5ccdd67" \
  --scope "/subscriptions/" \
  --params '{"effect":{"value":"Deny"}}'

CI/CD gate with policy-as-code

For teams shipping AKS through Terraform, run a Checkov or Conftest scan in the pipeline and fail the build if disk_encryption_set_id is missing.

checkov -d ./terraform \
  --check CKV_AZURE_117 \
  --compact

Note: Combine the deny policy with a continuous check like Lensix scanning your live environment. Policy stops new bad clusters; continuous scanning catches drift, clusters created outside your pipeline, and resources that predate your policy assignment.


Best practices

  • Use a dedicated Key Vault for disk encryption keys. Keep them separate from application secrets so you can apply tighter RBAC and audit access independently.
  • Enable purge protection everywhere. It is required for disk encryption sets, and it stops an attacker (or a mistake) from permanently destroying a key that data depends on.
  • Turn on key rotation policies. Manual rotation gets forgotten. Automated rotation keeps you compliant without a calendar reminder.
  • Pair BYOK with KMS etcd encryption. Disk BYOK protects node storage; etcd KMS encryption protects Kubernetes Secrets at the API layer. Together they give you a defensible encryption story.
  • Monitor Key Vault access logs. Send diagnostic logs to a Log Analytics workspace and alert on unexpected unwrapKey calls or on key access from any identity other than the disk encryption set's.
  • Document your revocation runbook. The whole point of BYOK is the ability to cut access fast. Make sure your incident responders know exactly which key to disable and what breaks when they do.
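For the access-log item above, a sketch of wiring Key Vault audit events to Log Analytics could look like this. It assumes a workspace already exists and its resource ID is passed in. The function name and the diagnostic setting name kv-audit-to-law are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: send Key Vault audit events to a Log Analytics workspace.
# Assumption: $3 is the full resource ID of an existing workspace.
enable_kv_audit_logs() {
  local rg="$1" vault="$2" workspace_id="$3" vault_id
  vault_id=$(az keyvault show \
    --resource-group "$rg" \
    --name "$vault" \
    --query id -o tsv)

  # AuditEvent is the log category that records key operations such as unwrapKey.
  az monitor diagnostic-settings create \
    --name kv-audit-to-law \
    --resource "$vault_id" \
    --workspace "$workspace_id" \
    --logs '[{"category":"AuditEvent","enabled":true}]'
}
```

Alert rules on top of those logs are a separate step and depend on how your team handles monitoring.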

BYOK is not the right call for every workload, and adding it where you do not need it just creates operational overhead. But for regulated data, multi-tenant platforms, or anything where you have to prove key custody, owning the key is the difference between passing an audit and explaining yourself. Make the decision deliberately, then enforce it with policy so nobody quietly ships a cluster without it.