Fix AKS Clusters With No Diagnostic Logs | Lensix

TL;DR

This check flags AKS clusters that have no diagnostic settings configured, which means control plane logs like API server activity, audit events, and scheduler decisions are being thrown away. Without them you cannot investigate a breach or debug a control plane issue after the fact. Fix it by creating a diagnostic setting that ships the relevant log categories to Log Analytics, a storage account, or an event hub.

When something goes wrong in a Kubernetes cluster, the first thing any engineer reaches for is the logs. In AKS, the logs that matter most for security and troubleshooting do not live on your nodes. They live in the managed control plane, and Azure does not forward them anywhere unless you explicitly tell it to. A cluster with no diagnostic settings is a cluster running blind.

The AKS Cluster Has No Diagnostic Logs check (aks_nodiagnosticlogs) catches exactly this gap: an AKS cluster with no diagnostic settings attached, so none of the control plane log categories are being captured.

What this check detects

AKS is a managed service, which means Microsoft runs the Kubernetes control plane for you. The components you would normally find on a self-managed master node, the API server, the scheduler, the controller manager, the audit log, and others, are all hidden behind the managed surface. You cannot SSH into them and you cannot tail their logs directly.

The only way to see those logs is through Azure diagnostic settings. A diagnostic setting tells Azure which log categories to emit and where to send them. This check fires when an AKS cluster has zero diagnostic settings, meaning every one of those control plane log streams is being discarded the moment it is generated.
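As a rough illustration of the condition described above (a sketch only, not how Lensix itself evaluates the check), the function below walks every AKS cluster in the current subscription and prints the ones with no diagnostic settings. The function name `clusters_without_diagnostics` is made up for this example. It assumes a recent Azure CLI in which `az monitor diagnostic-settings list` returns a plain JSON array; older releases wrapped the result in a `value` property, in which case the query would be `length(value)`.

```shell
# Hypothetical helper: print the resource ID of every AKS cluster in the
# current subscription that has zero diagnostic settings attached.
clusters_without_diagnostics() {
  az aks list --query "[].id" -o tsv | while read -r cluster_id; do
    # length(@) counts the diagnostic settings returned for this cluster.
    # stdin is redirected so the inner call cannot consume the ID list.
    count=$(az monitor diagnostic-settings list \
      --resource "$cluster_id" \
      --query "length(@)" -o tsv < /dev/null)
    # Only flag a cluster when the count is confirmed to be zero.
    if [ "$count" = "0" ]; then
      echo "$cluster_id"
    fi
  done
}

# Usage: clusters_without_diagnostics
```

An empty result means every cluster the CLI can see has at least one diagnostic setting. It does not tell you which categories that setting actually captures.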

Note: Diagnostic settings are separate from Container Insights and from your application logs. Container Insights gives you node and pod metrics and stdout/stderr from workloads. Diagnostic settings give you the control plane logs, including the Kubernetes audit log. You generally want both, and this check is specifically about the latter.

The control plane log categories available on AKS include:

kube-apiserver — requests hitting the Kubernetes API server
kube-audit — the full audit log, every authenticated request with who, what, and when
kube-audit-admin — a lighter audit stream that drops get and list read events
kube-controller-manager — controller loop activity
kube-scheduler — pod scheduling decisions
cluster-autoscaler — scale up and scale down events
guard — Azure AD authentication and authorization decisions

Why it matters

The absence of control plane logs is not a theoretical risk. It directly undermines your ability to detect and investigate the most common Kubernetes attack patterns.

You cannot investigate a breach you cannot see

Consider a scenario where an attacker steals a service account token and uses it to create a privileged pod, mount the host filesystem, and pivot. Every one of those actions passes through the API server and lands in the kube-audit log. With an audit category enabled, you have a timestamped record of the service account the token belongs to, the namespace, the source IP, and the request that was made. Without it, you have nothing to reconstruct the attack from. As far as your records go, the attacker was never there.

Danger: The Kubernetes audit log is often the only authoritative record of who did what inside a cluster. If a cluster is compromised and you have no audit log, you cannot scope the incident, you cannot prove what was or was not accessed, and you cannot satisfy a breach notification investigation. Treat missing audit logging as a critical security gap, not a nice-to-have.

Compliance frameworks expect it

SOC 2, ISO 27001, PCI DSS, and HIPAA all require some form of audit logging and log retention for systems that process sensitive data. An AKS cluster running regulated workloads with no diagnostic settings is a finding waiting to be written up. Auditors will ask for control plane audit trails, and "we never turned them on" is not an answer that passes.

Operational debugging suffers too

This is not only a security concern. When pods refuse to schedule, when the autoscaler behaves strangely, or when API requests start timing out, the control plane logs are where the answers live. Teams without diagnostic settings often spend hours guessing at problems that would take minutes to diagnose with kube-scheduler or kube-apiserver logs in hand.

How to fix it

Fixing this means creating a diagnostic setting on the cluster and pointing it at a destination. Most teams send logs to a Log Analytics workspace because it supports KQL queries and integrates with Microsoft Sentinel, but a storage account works for cheap long term retention and an event hub works for streaming to a SIEM.

Step 1: Pick or create a Log Analytics workspace

az monitor log-analytics workspace create \
  --resource-group myResourceGroup \
  --workspace-name aks-logs-workspace \
  --location eastus

Grab the workspace resource ID. You will need it in Step 3:

WORKSPACE_ID=$(az monitor log-analytics workspace show \
  --resource-group myResourceGroup \
  --workspace-name aks-logs-workspace \
  --query id -o tsv)

Step 2: Get the AKS cluster resource ID

CLUSTER_ID=$(az aks show \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --query id -o tsv)

Step 3: Create the diagnostic setting

This command enables the security-relevant log categories and ships them to the workspace. Pick either kube-audit or kube-audit-admin, not both: kube-audit is a superset, so enabling both means every event in kube-audit-admin is ingested and billed twice.

az monitor diagnostic-settings create \
  --name aks-diagnostics \
  --resource "$CLUSTER_ID" \
  --workspace "$WORKSPACE_ID" \
  --logs '[
    {"category": "kube-apiserver", "enabled": true},
    {"category": "kube-audit-admin", "enabled": true},
    {"category": "kube-controller-manager", "enabled": true},
    {"category": "kube-scheduler", "enabled": true},
    {"category": "cluster-autoscaler", "enabled": true},
    {"category": "guard", "enabled": true}
  ]' \
  --metrics '[{"category": "AllMetrics", "enabled": true}]'

Warning: The kube-audit category can generate a large volume of data on busy clusters, and Log Analytics charges per gigabyte ingested. On high-traffic clusters, kube-audit-admin is the sensible default because it drops noisy read-only get and list events while keeping every mutating action. Reserve full kube-audit for clusters where you need complete read visibility, and budget for the ingestion cost.
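Once the setting is created, it is worth confirming that the categories you expect are actually enabled. The sketch below compares a list of wanted categories against what the cluster's diagnostic settings report. The function name `missing_log_categories` is hypothetical, and the JMESPath query assumes a recent Azure CLI that returns a plain array from `az monitor diagnostic-settings list` (older releases nest it under `value`). Settings that use a category group rather than individual categories will not be recognized by this check.

```shell
# Hypothetical helper: print each wanted category that no diagnostic setting
# on the resource has enabled. Prints nothing when everything is covered.
missing_log_categories() {
  resource_id=$1
  shift
  # Flatten the enabled categories of all settings into one name per line.
  enabled=$(az monitor diagnostic-settings list \
    --resource "$resource_id" \
    --query "[].logs[?enabled].category[]" -o tsv)
  for wanted in "$@"; do
    # -x matches whole lines, so kube-audit does not match kube-audit-admin.
    printf '%s\n' "$enabled" | grep -qxF -- "$wanted" || echo "$wanted"
  done
}

# Usage, with CLUSTER_ID from Step 2:
#   missing_log_categories "$CLUSTER_ID" kube-audit-admin kube-apiserver guard
```

No output means all the categories you passed in are enabled somewhere on the cluster.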

Console steps

If you prefer the portal:

Open your AKS cluster in the Azure portal.
Under Monitoring, select Diagnostic settings.
Click Add diagnostic setting.
Tick the log categories you want, at minimum kube-audit-admin, kube-apiserver, and guard.
Choose a destination: Send to Log Analytics workspace, Archive to a storage account, or Stream to an event hub.
Save.

Terraform

If you manage AKS with Terraform, define the diagnostic setting in code so it is created alongside the cluster and any manual change shows up as drift in the next plan:

resource "azurerm_monitor_diagnostic_setting" "aks" {
  name                       = "aks-diagnostics"
  target_resource_id         = azurerm_kubernetes_cluster.this.id
  log_analytics_workspace_id = azurerm_log_analytics_workspace.this.id

  enabled_log {
    category = "kube-apiserver"
  }
  enabled_log {
    category = "kube-audit-admin"
  }
  enabled_log {
    category = "kube-controller-manager"
  }
  enabled_log {
    category = "kube-scheduler"
  }
  enabled_log {
    category = "cluster-autoscaler"
  }
  enabled_log {
    category = "guard"
  }

  metric {
    category = "AllMetrics"
    enabled  = true
  }
}

Tip: Define the diagnostic setting in the same module that creates the cluster and pass the workspace ID in as a variable. That way every cluster your team provisions ships its logs from day one, and there is no manual step anyone can forget.

How to prevent it from recurring

Fixing one cluster by hand is fine. Making sure the next twenty clusters are never created without diagnostic settings is what actually closes the gap.

Azure Policy

Azure ships built-in policies that audit or enforce diagnostic settings on AKS. Assign the policy "Configure Azure Kubernetes Service clusters to enable Diagnostic Settings to a Log Analytics workspace", which has a DeployIfNotExists effect. When a cluster is created without diagnostic settings, the policy automatically remediates it by attaching one.

az policy assignment create \
  --name "aks-diag-settings" \
  --display-name "Enable AKS diagnostic settings" \
  --policy "/providers/Microsoft.Authorization/policyDefinitions/6c66c325-74c8-42fd-a286-a74b0e2939d8" \
  --scope "/subscriptions/<subscription-id>" \
  --location eastus \
  --mi-system-assigned \
  --role Contributor \
  --identity-scope "/subscriptions/<subscription-id>"

Note: DeployIfNotExists policies need a managed identity with permission to create the diagnostic setting, which is why the command above assigns a system-assigned identity and the Contributor role. Without the identity and role, the policy will report non-compliance but will not be able to remediate.
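One thing to keep in mind is that a DeployIfNotExists policy acts on resources as they are created or updated. Clusters that already existed when you assigned the policy are only marked non-compliant until you start a remediation task. A minimal sketch of that step follows, assuming the assignment name `aks-diag-settings` used above. The wrapper function and the task name `remediate-aks-diag` are made up for illustration.

```shell
# Hypothetical wrapper: start a remediation task so clusters that existed
# before the policy assignment also get a diagnostic setting.
remediate_existing_clusters() {
  assignment_name=$1
  # The remediation task name is arbitrary; pick anything meaningful to you.
  az policy remediation create \
    --name "remediate-aks-diag" \
    --policy-assignment "$assignment_name"
}

# Usage: remediate_existing_clusters "aks-diag-settings"
```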

CI/CD gate

If you provision infrastructure through pipelines, add a check that fails the build when an AKS module is missing a diagnostic setting block. With Terraform you can enforce this with an OPA or Conftest policy:

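package main

# Fail when an AKS cluster has no diagnostic setting that targets it.
deny[msg] {
  resource := input.resource.azurerm_kubernetes_cluster[name]
  not has_diagnostic_setting(name)
  msg := sprintf("AKS cluster '%s' has no azurerm_monitor_diagnostic_setting", [name])
}

# True when some diagnostic setting references this exact cluster address.
# Matching on "azurerm_kubernetes_cluster.<name>." instead of the bare name
# avoids false matches when the name is a substring of another reference.
# Assumes a direct reference such as azurerm_kubernetes_cluster.<name>.id.
has_diagnostic_setting(cluster_name) {
  setting := input.resource.azurerm_monitor_diagnostic_setting[_]
  contains(setting.target_resource_id, sprintf("azurerm_kubernetes_cluster.%s.", [cluster_name]))
}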
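To wire the policy above into a pipeline, you could run Conftest against the Terraform file as a build step. The sketch below assumes the Rego file is saved under a `policy` directory and that the cluster and its diagnostic setting are declared in the same `.tf` file, since Conftest evaluates each file as a separate input by default. The function name `check_aks_diagnostics` is hypothetical.

```shell
# Hypothetical pipeline step: evaluate the Rego policy against one Terraform
# file. conftest exits non-zero when any deny rule fires.
check_aks_diagnostics() {
  tf_file=$1
  # --parser hcl2 makes the Terraform parsing explicit rather than inferred.
  conftest test "$tf_file" --parser hcl2 --policy policy
}

# Usage in a build script: check_aks_diagnostics main.tf || exit 1
```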

Continuous monitoring

Policy and pipeline gates catch new clusters, but someone can still disable a diagnostic setting by hand or delete a workspace out from under a cluster. A platform like Lensix continuously evaluates the aks_nodiagnosticlogs check across every subscription, so if a cluster loses its diagnostic settings you find out right away instead of discovering it during an incident.

Best practices

Always enable an audit category. Use kube-audit-admin as the default and full kube-audit only where read visibility is required. The audit log is the single most valuable category for security.
Send logs to Log Analytics for active clusters. KQL queries and Sentinel integration make Log Analytics the right home for clusters you actively monitor. Pair it with a storage account export if you need cheap long term archival.
Set retention deliberately. Match your Log Analytics retention to your compliance requirements. PCI DSS, for example, requires at least 12 months of audit log history, with the most recent three months immediately available for analysis.
Forward to your SIEM. If you run a central SIEM, stream the audit categories to an event hub so cluster activity is correlated alongside the rest of your estate.
Build detections on top of the logs. Collecting logs is only half the job. Write alerts for suspicious patterns like exec into a pod, creation of privileged containers, or access to secrets in sensitive namespaces.
Standardize across the fleet. Use one Terraform module and one Azure Policy assignment so every cluster has identical, predictable logging. Inconsistency is how the one unlogged cluster ends up being the one that gets breached.
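As a starting point for the detections mentioned above, here is a sketch of a query that lists writes to Kubernetes secrets over the last day. It assumes the diagnostic setting uses the default Azure diagnostics mode, where AKS control plane logs land in the `AzureDiagnostics` table with the raw event in the `log_s` column. If you export in resource-specific mode the table names differ. The function name `secret_write_events` is made up, and `--workspace` expects the workspace GUID (the `customerId` property), not the resource ID. Because kube-audit-admin drops get and list events, catching reads of secrets would require the full kube-audit category instead.

```shell
# Hypothetical helper: list create/update/patch/delete events on secrets
# recorded in the kube-audit-admin category during the last 24 hours.
secret_write_events() {
  workspace_guid=$1
  # KQL: parse the raw audit event, then filter on resource type and verb.
  query='AzureDiagnostics
| where Category == "kube-audit-admin"
| extend event = parse_json(log_s)
| where tostring(event.objectRef.resource) == "secrets"
| where tostring(event.verb) in ("create", "update", "patch", "delete")
| project TimeGenerated,
          username = tostring(event.user.username),
          verb = tostring(event.verb),
          secretNamespace = tostring(event.objectRef.namespace),
          secretName = tostring(event.objectRef.name)'
  az monitor log-analytics query \
    --workspace "$workspace_guid" \
    --analytics-query "$query" \
    --timespan P1D \
    -o table
}

# Usage:
#   WORKSPACE_GUID=$(az monitor log-analytics workspace show \
#     --resource-group myResourceGroup \
#     --workspace-name aks-logs-workspace \
#     --query customerId -o tsv)
#   secret_write_events "$WORKSPACE_GUID"
```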

The logs you wish you had collected are always the ones from the cluster you forgot to configure. Turn diagnostic settings on everywhere, enforce it with policy, and verify it continuously.
