GKE Cluster Logging Disabled: Detect & Fix | Lensix

TL;DR

This check flags GKE clusters running without Cloud Logging enabled, which leaves you blind to control plane and workload activity during incidents. Re-enable logging with gcloud container clusters update CLUSTER_NAME --location=REGION_OR_ZONE --logging=SYSTEM,WORKLOAD.

When a Kubernetes cluster misbehaves, gets compromised, or quietly drops traffic, the first thing anyone reaches for is the logs. If your GKE cluster has Cloud Logging turned off, those logs were never captured in the first place. There is no rewind button. The gke_nologging check exists to catch this gap before it turns a routine investigation into a dead end.

This post walks through what the check looks at, the concrete risks of running without logging, how to fix it across CLI, console, and Terraform, and how to make sure it never regresses.

What this check detects

The gke_nologging check inspects each Google Kubernetes Engine cluster and reports any cluster where Cloud Logging is disabled. In GKE terms, this means the cluster's loggingService is set to none (or, on newer clusters, the loggingConfig has no enabled components).

GKE logging is delivered through Cloud Logging and is configured per cluster. Modern clusters use a component-based model where you choose which streams to collect:

SYSTEM_COMPONENTS — logs from kube-system workloads, kubelet, and core cluster components
WORKLOADS — stdout and stderr from your application containers
APISERVER, CONTROLLER_MANAGER, SCHEDULER — control plane logs

When all of these are off, nothing from the cluster reaches Cloud Logging. The check fails when no logging components are enabled.
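The pass/fail rule described above can be sketched in shell. This is a minimal illustration, not the check's actual implementation. The function names logging_state and check_cluster are hypothetical, and treating a cluster that has a loggingService other than none but no component list as enabled is an assumption about older clusters.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of gcloud or the gke_nologging check itself.
# Classifies a cluster from two fields of `gcloud container clusters describe`:
#   $1 = loggingService (for example "none")
#   $2 = loggingConfig.componentConfig.enableComponents, as printed by gcloud
logging_state() {
  local service="$1" components="$2"
  if [ -n "$components" ]; then
    echo "enabled"     # at least one component is being collected
  elif [ -z "$service" ] || [ "$service" = "none" ]; then
    echo "disabled"    # no components and no logging service
  else
    echo "enabled"     # assumption: older cluster with only loggingService set
  fi
}

# Assumed wrapper: fetch both fields for one cluster and print its state.
check_cluster() {
  local name="$1" location="$2" service components
  service="$(gcloud container clusters describe "$name" \
    --location="$location" --format='value(loggingService)')"
  components="$(gcloud container clusters describe "$name" \
    --location="$location" \
    --format='value(loggingConfig.componentConfig.enableComponents)')"
  echo "$name: $(logging_state "$service" "$components")"
}
```

Running check_cluster CLUSTER_NAME REGION_OR_ZONE prints the cluster name followed by enabled or disabled, which is enough to spot-check a handful of clusters by hand.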

Note: GKE Autopilot clusters enable SYSTEM and WORKLOAD logging by default and do not let you fully disable it. This check most commonly fires on Standard clusters where logging was explicitly turned off, often to save money or because an old cluster was provisioned before logging defaults improved.

Why it matters

Logging is not a nice-to-have for a cluster running production workloads. It is the foundation of nearly every operational and security workflow you depend on.

You cannot investigate what you did not record

Cloud Logging is append-only from the perspective of an attacker who lands on a node. If logs are streaming to Cloud Logging in real time, even a compromised cluster keeps producing an audit trail outside the blast radius. With logging disabled, an attacker who deploys a malicious pod, exfiltrates secrets, or pivots through your cluster leaves no trace you can pull after the fact.

Warning: Disabling cluster logging does not stop containers from writing to stdout. It just means those logs live only on the node and rotate away. Once a node is recycled, drained, or autoscaled down, the evidence is gone for good.

Incident response grinds to a halt

Picture a 2 a.m. page: a service is throwing 500s and customers are affected. Normally you query Cloud Logging, filter on the failing workload, and read the stack trace. Without logging, your only option is to exec into a live pod and hope the error is still reproducible, or wait for it to happen again with logging hastily turned on. That is minutes or hours of avoidable downtime.

Compliance and audit gaps

Frameworks like SOC 2, PCI DSS, ISO 27001, and HIPAA all expect you to retain logs of activity in systems that process sensitive data. A cluster with logging disabled is a finding waiting to be written up by an auditor, and it undermines your ability to prove what happened during an incident.

No data for metrics, alerts, or dashboards

Many alerting pipelines build log-based metrics on top of Cloud Logging. Error rate alerts, suspicious-activity detection, and dashboards all dry up when the source goes quiet. Turning off logging silently breaks downstream tooling that nobody remembers depended on it.

How to fix it

Re-enabling logging on an existing cluster is a quick, non-destructive update. Pick the method that matches how you manage infrastructure.

Option 1: gcloud CLI

First confirm the current state of the cluster:

gcloud container clusters describe CLUSTER_NAME \
  --location=REGION_OR_ZONE \
  --format="value(loggingConfig.componentConfig.enableComponents)"

If the output is empty, no logging components are enabled. Turn on system and workload logging:

gcloud container clusters update CLUSTER_NAME \
  --location=REGION_OR_ZONE \
  --logging=SYSTEM,WORKLOAD

To also capture control plane logs, which are valuable for security investigations, include the API server, scheduler, and controller manager:

gcloud container clusters update CLUSTER_NAME \
  --location=REGION_OR_ZONE \
  --logging=SYSTEM,WORKLOAD,API_SERVER,SCHEDULER,CONTROLLER_MANAGER

Note: Updating the logging config is an online operation. Your workloads keep running and there is no downtime. The change applies to existing and new nodes automatically.
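After the update, it can be worth confirming that entries are actually arriving in Cloud Logging. The sketch below assumes your container logs carry the standard k8s_container resource type and cluster_name label. The function names and the ten-minute window are illustrative choices, not anything the check requires.

```shell
#!/usr/bin/env bash
# Build a Cloud Logging filter matching container logs from one cluster.
# Hypothetical helper name; the filter fields are standard GKE log labels.
cluster_log_filter() {
  local cluster="$1"
  printf 'resource.type="k8s_container" AND resource.labels.cluster_name="%s"' "$cluster"
}

# Read up to five recent entries and report whether any were found.
# Returns non-zero when nothing has arrived in the last ten minutes.
verify_logs_flowing() {
  local cluster="$1" count
  count="$(gcloud logging read "$(cluster_log_filter "$cluster")" \
    --freshness=10m --limit=5 --format='value(insertId)' | wc -l | tr -d ' ')"
  if [ "$count" -gt 0 ]; then
    echo "ok: found $count recent entries for $cluster"
  else
    echo "no entries yet for $cluster"
    return 1
  fi
}
```

Call verify_logs_flowing CLUSTER_NAME a few minutes after the update. A quiet cluster may legitimately have no workload output in that window, so an empty result is a prompt to look closer rather than proof of failure.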

Option 2: Google Cloud Console

Open Kubernetes Engine and select your cluster.
Click Edit on the cluster details page.
Under Features, find Cloud Logging and set it to Enabled.
Choose the components to collect (at minimum System and Workloads).
Save the changes.

Option 3: Terraform

If your clusters are managed with the google_container_cluster resource, set the logging_config block explicitly so the desired state is enforced on every apply:

resource "google_container_cluster" "primary" {
  name     = "production-cluster"
  location = "us-central1"

  logging_config {
    enable_components = [
      "SYSTEM_COMPONENTS",
      "WORKLOADS",
      "APISERVER",
      "SCHEDULER",
      "CONTROLLER_MANAGER",
    ]
  }

  # ... rest of cluster config
}

Run a plan to confirm Terraform only updates the logging config, then apply:

terraform plan -target=google_container_cluster.primary
terraform apply -target=google_container_cluster.primary

Warning: Cloud Logging ingestion is billed per GiB after a monthly free allotment. Enabling WORKLOADS logging on a chatty cluster can generate meaningful volume. This is rarely a reason to leave logging off, but it is a reason to set up exclusion filters and retention policies rather than ingesting everything blindly.

Trimming noise without losing coverage

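If cost is the concern that led someone to disable logging, the right answer is targeted exclusion, not a blackout. Use a Cloud Logging exclusion filter to drop high-volume, low-value entries while keeping everything else. Treat the filter below as illustrative: container logs often carry the request path in textPayload or jsonPayload rather than httpRequest, so match it to the fields your workloads actually emit: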

gcloud logging sinks update _Default \
  --add-exclusion=name=drop-healthchecks,filter='resource.type="k8s_container" AND httpRequest.requestUrl=~"/healthz"'

How to prevent it from happening again

Fixing one cluster is easy. Keeping every cluster compliant across teams and projects is where the real work is. Bake the requirement into the places where clusters get created and changed.

Enforce with Organization Policy

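GCP does not ship a built-in constraint that forces GKE logging on, but you can enforce it with a custom organization policy on the GKE cluster resource. If you provision clusters declaratively through Config Connector, Policy Controller (the managed Gatekeeper offering in GKE) can also reject any cluster manifest that lacks logging components. Policy Controller evaluates Kubernetes resources, so it does not see clusters created directly through gcloud, the console, or Terraform.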

Gate it in CI/CD with policy-as-code

If you provision clusters through Terraform, scan the plan before it reaches production. A Conftest / OPA Rego policy makes the intent explicit and fails the pipeline on violations:

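package gke.logging

# Deny any created or updated cluster whose planned state does not enable
# SYSTEM_COMPONENTS. A cluster with no logging_config block is also denied,
# so the block has to be set explicitly.
deny[msg] {
  resource := input.resource_changes[_]
  resource.type == "google_container_cluster"
  resource.change.after != null  # skip clusters that are being destroyed
  not system_logging_enabled(resource.change.after)
  msg := sprintf("Cluster '%s' must enable SYSTEM_COMPONENTS logging", [resource.change.after.name])
}

# True when any logging_config block lists SYSTEM_COMPONENTS.
# Named to avoid clashing with Rego's own contains.
system_logging_enabled(cluster) {
  cluster.logging_config[_].enable_components[_] == "SYSTEM_COMPONENTS"
}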

Wire it into the pipeline so the check runs on every plan:

terraform plan -out=tfplan.binary
terraform show -json tfplan.binary > tfplan.json
conftest test tfplan.json --policy policy/

Tip: Pair the CI gate with continuous detection in Lensix. Policy-as-code catches issues at deploy time, but it will not catch a cluster someone changed by hand in the console or one created in a project that bypasses your pipeline. Lensix scans live state and flags drift the moment gke_nologging starts failing again.

Standardize on a golden cluster module

Wrap your cluster definition in a shared Terraform module with logging hard-coded and not overridable. Teams consume the module, get logging for free, and cannot accidentally provision a blind cluster. This removes the decision from individual engineers entirely.

Best practices

Always enable SYSTEM_COMPONENTS and WORKLOADS at a minimum. These cover cluster health and your application output, which are the two things you reach for most.
Enable control plane logs for production. APISERVER, SCHEDULER, and CONTROLLER_MANAGER logs are critical for security forensics and for understanding scheduling and admission decisions.
Pair logging with Cloud Audit Logs. GKE logging captures cluster activity, but Admin Activity and Data Access audit logs at the project level record who did what to the cluster itself. Both matter for a complete picture.
Set retention deliberately. The _Default bucket retains logs for 30 days. For compliance you often need longer, so create a dedicated log bucket with the required retention or route logs to a sink for cold storage.
Control cost with exclusions, not disablement. Drop health checks, readiness probes, and verbose debug noise rather than turning logging off entirely.
Build log-based metrics and alerts. Once logs flow, turn them into signal: error-rate alerts, denied-request counts, and dashboards that surface problems before customers do.
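The retention advice above can be sketched as a dedicated log bucket plus a sink that routes GKE logs into it. This is one possible layout under stated assumptions: the bucket name, sink name, 365-day retention, and the set of resource types in the filter are all placeholders to adapt, not values taken from any framework.

```shell
#!/usr/bin/env bash
# Assumed values for illustration; replace with your own.
BUCKET_ID="gke-long-retention"   # hypothetical bucket name
RETENTION_DAYS=365               # pick the period your framework requires

# Build the sink destination string for a log bucket in the given project.
bucket_destination() {
  local project="$1" bucket="$2"
  echo "logging.googleapis.com/projects/${project}/locations/global/buckets/${bucket}"
}

# Create the bucket, then route GKE resource logs into it.
create_gke_retention_bucket() {
  local project="$1"
  gcloud logging buckets create "$BUCKET_ID" \
    --project="$project" --location=global \
    --retention-days="$RETENTION_DAYS"
  gcloud logging sinks create gke-long-retention-sink \
    "$(bucket_destination "$project" "$BUCKET_ID")" \
    --project="$project" \
    --log-filter='resource.type=("k8s_container" OR "k8s_node" OR "k8s_pod" OR "k8s_cluster")'
}
```

Calling create_gke_retention_bucket PROJECT_ID runs both steps. Entries matching the filter are still written to the _Default bucket as well unless you exclude them there, so check for double ingestion if cost matters.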

Tip: Treat logging configuration as part of your cluster's definition of done. A cluster is not production-ready until logs are flowing, retention is set, and at least one alert is built on them. Checking that box once, in your module, beats remediating it cluster by cluster forever.

Cloud Logging on GKE is cheap insurance against expensive blind spots. Enabling it takes one command, costs little, and means that when something goes wrong you have the answers waiting for you instead of a cluster that already forgot what it did.
