AWS, Best Practices, Cloud Security, Kubernetes, Monitoring & Logging

EKS Control Plane Logging Incomplete: Why It Matters and How to Fix It

Learn why partial EKS control plane logging leaves you blind during incidents, and how to enable all five log types with CLI, Terraform, and policy-as-code.

TL;DR

This check flags EKS clusters that aren't shipping all five control plane log types to CloudWatch. Without audit and authenticator logs you're blind to who did what in your cluster. Enable all five log types with a single aws eks update-cluster-config call or a single Terraform argument.

When something goes wrong in a Kubernetes cluster, the first question is always the same: what happened, and who triggered it? On EKS, the answer lives in the control plane logs. The problem is that AWS ships every new cluster with control plane logging turned off by default, so unless someone explicitly enabled it, you have no record of API calls, authentication attempts, or scheduler decisions. This check exists to catch that gap before an incident forces you to discover it the hard way.


What this check detects

The eks_nologging check inspects each EKS cluster's logging.clusterLogging configuration and verifies that all five control plane log types are enabled:

  • api — logs from the Kubernetes API server, the front door to everything in the cluster
  • audit — a detailed record of every request to the API server, including who made it and what they touched
  • authenticator — EKS-specific logs showing IAM authentication and RBAC mapping decisions
  • controllerManager — logs from the controllers that reconcile cluster state
  • scheduler — logs explaining pod placement decisions

If any one of these is missing, the cluster fails the check. Partial logging is treated as a finding because the gaps are usually the ones that matter most during an investigation. A cluster with only api logging enabled, for example, still can't tell you who deleted a deployment or whether an unauthorized identity tried to authenticate.
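The pass/fail rule above can be sketched in a few lines. This is a minimal illustration, not the check's actual implementation: it assumes the input is the "cluster" object returned by aws eks describe-cluster, and the function name missing_log_types is hypothetical.

```python
REQUIRED_LOG_TYPES = frozenset(
    {"api", "audit", "authenticator", "controllerManager", "scheduler"}
)


def missing_log_types(cluster: dict) -> set:
    """Return the control plane log types that are not enabled.

    `cluster` is assumed to be the "cluster" object from
    `aws eks describe-cluster --name <cluster-name>`.
    """
    entries = cluster.get("logging", {}).get("clusterLogging", [])
    enabled = set()
    for entry in entries:
        # Each entry groups some log types under one enabled/disabled flag.
        if entry.get("enabled"):
            enabled.update(entry.get("types", []))
    return set(REQUIRED_LOG_TYPES - enabled)


if __name__ == "__main__":
    # Hypothetical cluster with only api and audit logging turned on.
    sample = {
        "logging": {
            "clusterLogging": [
                {"types": ["api", "audit"], "enabled": True},
                {
                    "types": ["authenticator", "controllerManager", "scheduler"],
                    "enabled": False,
                },
            ]
        }
    }
    print(sorted(missing_log_types(sample)))
```

An empty result means the cluster passes. Anything else is the list of gaps to report.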

Note: EKS control plane logs are separate from your application and node logs. The control plane is the managed Kubernetes layer AWS runs for you, so these logs cover the API server, authenticator, scheduler, and controllers, not your workloads. Node and pod logs require a separate agent like Fluent Bit.


Why it matters

Control plane logs are the difference between explaining an incident and guessing at it. Here is where the gap bites in practice.

You can't investigate what you didn't record

Suppose a service account token leaks and an attacker uses it to enumerate secrets, create a privileged pod, and exfiltrate data. With audit logging enabled, every one of those API calls is captured with the source identity, the verb, the resource, and a timestamp. Without it, your forensic timeline starts and ends with "we noticed unusual network traffic." CloudTrail records the EKS API calls AWS handles, but it does not capture in-cluster Kubernetes API activity. That visibility only comes from the audit log.
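To make the forensic value concrete, here is a rough sketch of pulling the who, what, and when out of a single audit record. It assumes each log message is one JSON-encoded Kubernetes audit event (audit.k8s.io/v1). The helper name and the sample event are hypothetical.

```python
import json


def summarize_audit_event(raw: str) -> dict:
    """Extract the identity, action, target, and time from one audit event."""
    event = json.loads(raw)
    ref = event.get("objectRef") or {}
    return {
        "when": event.get("requestReceivedTimestamp"),
        "who": (event.get("user") or {}).get("username"),
        "verb": event.get("verb"),
        "resource": ref.get("resource"),
        "namespace": ref.get("namespace"),
        "name": ref.get("name"),
        "source_ips": event.get("sourceIPs", []),
        "status": (event.get("responseStatus") or {}).get("code"),
    }


if __name__ == "__main__":
    # Hypothetical event: a service account listing secrets in one namespace.
    sample = json.dumps(
        {
            "kind": "Event",
            "apiVersion": "audit.k8s.io/v1",
            "verb": "list",
            "user": {"username": "system:serviceaccount:default:builder"},
            "sourceIPs": ["203.0.113.10"],
            "objectRef": {"resource": "secrets", "namespace": "payments"},
            "responseStatus": {"code": 200},
            "requestReceivedTimestamp": "2024-01-01T12:00:00.000000Z",
        }
    )
    print(summarize_audit_event(sample))
```

Run over an exported log stream, a summary like this is the raw material for the incident timeline described above.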

Authentication blind spots

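The authenticator log records how each IAM principal is authenticated and which Kubernetes username and groups it maps to. If someone repeatedly fails authentication while probing for valid credentials, or an unexpected role suddenly maps to a privileged group, the authenticator log is where you see it. The change that caused it, such as an edit to the aws-auth ConfigMap that grants cluster-admin, is a Kubernetes API call and shows up in the audit log, which is why you need both. Skip them and privilege escalation through RBAC becomes effectively invisible.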

Compliance and audit requirements

Frameworks like SOC 2, PCI DSS, HIPAA, and ISO 27001 all expect audit trails for systems that handle sensitive data. The CIS Amazon EKS Benchmark calls out control plane audit logging directly. An auditor who asks "show me the access logs for your production cluster" will not accept "we never turned them on" as an answer.

Warning: Control plane logs are delivered to CloudWatch Logs, which charges for ingestion and storage. A busy cluster can generate a lot of audit data. This is rarely a reason to leave logging off, but it is a reason to set log group retention and budget for it rather than getting surprised on the bill.


How to fix it

Enabling all five log types is a single configuration change. It does not restart the control plane or interrupt running workloads, so you can apply it to production safely.

Option 1: AWS CLI

Enable every control plane log type on an existing cluster:

aws eks update-cluster-config \
  --name my-production-cluster \
  --region us-east-1 \
  --logging '{
    "clusterLogging": [
      {
        "types": ["api","audit","authenticator","controllerManager","scheduler"],
        "enabled": true
      }
    ]
  }'

The command returns an update ID. Logging is fully active once the update reaches Successful status:

aws eks describe-update \
  --name my-production-cluster \
  --update-id <update-id> \
  --region us-east-1 \
  --query 'update.status'
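
If you script the rollout, the same status check can be polled until the update finishes. The sketch below is one way to do that, assuming the status values InProgress, Successful, Failed, and Cancelled. The wait_for_update helper is hypothetical and takes the status lookup as a callable so it runs without AWS access.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"Successful", "Failed", "Cancelled"}


def wait_for_update(
    fetch_status: Callable[[], str],
    poll_seconds: float = 10.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the update leaves InProgress, then return its final status."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"update still {status} after {timeout_seconds}s")
        sleep(poll_seconds)


# Example wiring, assuming boto3 is installed and credentials are configured:
#
#   import boto3
#   eks = boto3.client("eks", region_name="us-east-1")
#   final = wait_for_update(
#       lambda: eks.describe_update(
#           name="my-production-cluster", updateId="<update-id>"
#       )["update"]["status"]
#   )
```

Treat anything other than Successful as a failed rollout and inspect the update's errors before retrying.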

Logs land in a CloudWatch log group named /aws/eks/my-production-cluster/cluster. By default that group has no expiry, so set a retention period to control cost:

aws logs put-retention-policy \
  --log-group-name /aws/eks/my-production-cluster/cluster \
  --retention-in-days 90
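
CloudWatch Logs only accepts specific retention values, so a compliance requirement such as 100 days has to be rounded up to the next allowed setting. Here is a small sketch of that rounding. The list of allowed values reflects the API as best I know it and should be treated as an assumption to verify. The helper name is hypothetical.

```python
import bisect

# Retention periods accepted by put-retention-policy at the time of writing.
# Assumption: verify against the current CloudWatch Logs API reference.
ALLOWED_RETENTION_DAYS = [
    1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731,
    1096, 1827, 2192, 2557, 2922, 3288, 3653,
]


def pick_retention_days(required_days: int) -> int:
    """Return the smallest allowed retention that covers required_days."""
    if required_days < 1:
        raise ValueError("required_days must be at least 1")
    index = bisect.bisect_left(ALLOWED_RETENTION_DAYS, required_days)
    if index == len(ALLOWED_RETENTION_DAYS):
        raise ValueError("requirement exceeds the longest finite retention")
    return ALLOWED_RETENTION_DAYS[index]


if __name__ == "__main__":
    for days in (90, 100, 366):
        print(days, "->", pick_retention_days(days))
```

Rounding up rather than down keeps you on the safe side of the requirement at a small storage cost.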

Option 2: AWS Console

  1. Open the EKS console and select your cluster.
  2. Go to the Observability tab (older consoles label it Logging).
  3. Click Manage logging.
  4. Toggle all five log types to Enabled.
  5. Click Save changes.

Option 3: Terraform

If your cluster is managed with the AWS provider, set enabled_cluster_log_types on the aws_eks_cluster resource:

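resource "aws_eks_cluster" "this" {
  name     = "my-production-cluster"
  role_arn = aws_iam_role.eks.arn

  # Ship all five control plane log types to CloudWatch Logs.
  enabled_cluster_log_types = [
    "api",
    "audit",
    "authenticator",
    "controllerManager",
    "scheduler",
  ]

  vpc_config {
    subnet_ids = var.subnet_ids
  }

  # Create the log group first so Terraform, not EKS, owns it and its retention.
  depends_on = [aws_cloudwatch_log_group.eks]
}

resource "aws_cloudwatch_log_group" "eks" {
  # Must match the name EKS writes to: /aws/eks/<cluster-name>/cluster
  name              = "/aws/eks/my-production-cluster/cluster"
  retention_in_days = 90
}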

Using the popular terraform-aws-modules/eks module? The same setting is exposed as a single variable:

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0"

  cluster_name    = "my-production-cluster"
  cluster_version = "1.30"

  cluster_enabled_log_types = [
    "api", "audit", "authenticator", "controllerManager", "scheduler"
  ]
}

Tip: Define the log group with an explicit retention period before the cluster creates it implicitly. If EKS creates the group first, terraform apply fails because the log group already exists, and you have to import it into state before Terraform can manage it. Creating aws_cloudwatch_log_group yourself, and making the cluster depend on it, keeps retention under version control.


How to prevent it from happening again

Fixing one cluster is easy. Making sure the next one is born compliant is the part that actually moves the needle.

Bake it into your module

The most durable fix is to set all five log types in the shared Terraform module, or whatever infrastructure-as-code template every team uses to provision clusters. If nobody creates an EKS cluster by hand, nobody can forget the logging config.

Block bad config in CI with policy-as-code

Add a check to your pipeline that fails the plan if any log type is missing. Here is an OPA/Conftest policy that validates Terraform plan JSON:

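package eks.logging

# Rego v1 syntax; needs a Conftest release built on OPA 0.59 or later.
import rego.v1

required := {"api", "audit", "authenticator", "controllerManager", "scheduler"}

# Deny any created or updated EKS cluster that does not enable every required log type.
deny contains msg if {
  some resource in input.resource_changes
  resource.type == "aws_eks_cluster"
  resource.change.after != null # skip clusters that are being destroyed
  configured := {t | some t in resource.change.after.enabled_cluster_log_types}
  missing := required - configured
  count(missing) > 0
  msg := sprintf("EKS cluster '%s' is missing log types: %v", [resource.name, missing])
}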

Wire it into the pipeline so a non-compliant plan never reaches apply:

terraform plan -out=tfplan
terraform show -json tfplan > tfplan.json
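# Conftest only evaluates package main by default, so name the policy's package.
conftest test tfplan.json --policy ./policies --namespace eks.logging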

Catch drift continuously

Policy gates only cover infrastructure that goes through your pipeline. Clusters created out-of-band, or modified by hand during an incident, slip past them. Continuous scanning closes that loop. Lensix runs eks_nologging across every region and account on a schedule, so a cluster that drifts out of compliance shows up as a finding without anyone having to remember to look.


Best practices

  • Enable all five log types, not a subset. Audit and authenticator carry the highest security value, but the others are cheap and round out your picture during an incident.
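  • Set log retention deliberately. Match it to your compliance requirements. Ninety days is a reasonable default, but PCI DSS requires at least twelve months of audit log history, with the most recent three months immediately available, and some other regulatory regimes expect a year or more.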
  • Ship audit logs somewhere durable. CloudWatch is the delivery target, but route logs onward to S3 or a SIEM for long-term retention and cross-account analysis. A CloudWatch subscription filter to a Firehose stream handles this cleanly.
  • Alert on the signals that matter. Logs you never look at help only after the fact. Build CloudWatch metric filters for repeated authentication failures, aws-auth ConfigMap edits, and privileged pod creation.
  • Protect the logs themselves. Lock down who can call update-cluster-config (the eks:UpdateClusterConfig IAM action) and who can delete CloudWatch log groups (logs:DeleteLogGroup). An attacker who can disable logging mid-attack erases their own trail.

Danger: Disabling control plane logging on a production cluster is a step attackers take to cover their tracks, and you should treat it as a high-severity alert. Never grant eks:UpdateClusterConfig, the IAM action behind update-cluster-config, in broad developer roles, and monitor for any change that turns log types off.
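
One way to monitor for that is to inspect CloudTrail records for UpdateClusterConfig calls that switch log types off. The sketch below assumes the record's requestParameters mirror the API request shape (logging.clusterLogging entries with types and enabled). Confirm that against a real event from your own trail before alerting on it. The function name and sample event are hypothetical.

```python
def disabled_log_types(event: dict) -> list:
    """Return the log types a CloudTrail event turns off, or [] if none."""
    if event.get("eventSource") != "eks.amazonaws.com":
        return []
    if event.get("eventName") != "UpdateClusterConfig":
        return []
    # Assumed shape: requestParameters.logging.clusterLogging[].{types, enabled}
    params = event.get("requestParameters") or {}
    entries = (params.get("logging") or {}).get("clusterLogging") or []
    disabled = set()
    for entry in entries:
        if entry.get("enabled") is False:
            disabled.update(entry.get("types", []))
    return sorted(disabled)


if __name__ == "__main__":
    # Hypothetical CloudTrail record that disables audit and authenticator logs.
    sample = {
        "eventSource": "eks.amazonaws.com",
        "eventName": "UpdateClusterConfig",
        "requestParameters": {
            "name": "my-production-cluster",
            "logging": {
                "clusterLogging": [
                    {"types": ["audit", "authenticator"], "enabled": False}
                ]
            },
        },
    }
    hits = disabled_log_types(sample)
    if hits:
        print("ALERT: control plane log types disabled:", hits)
```

A non-empty result is the signal to page someone. The same predicate could sit behind an EventBridge rule target or a scheduled trail scan.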

Control plane logging is one of those settings that costs almost nothing to enable and is enormously expensive to live without. Turn on all five log types, gate it in CI, and scan for drift. The next time you need to answer "what happened in the cluster," you'll have the receipts.