Fix PostgreSQL Lock-Wait Logging on GCP Cloud SQL

TL;DR

This check flags Cloud SQL PostgreSQL instances where the log_lock_waits flag is off, meaning queries blocked on locks leave no trace in your logs. Turn it on by setting the log_lock_waits database flag to on so you can diagnose deadlocks and slow transactions before they snowball.

When a PostgreSQL query stalls, the cause is often invisible. The query is not slow because of a bad join or a missing index. It is slow because it is sitting in a queue, waiting for another transaction to release a lock. Without lock-wait logging, that wait is silent. You see elevated latency, frustrated users, and timeouts, but the logs give you nothing to work with.

The PostgreSQL Lock-Wait Logging Disabled check looks at your GCP Cloud SQL PostgreSQL instances and reports any that have the log_lock_waits flag turned off. It is a small flag with an outsized impact on your ability to troubleshoot.

What this check detects

The check inspects the database flags configured on each Cloud SQL PostgreSQL instance and verifies that log_lock_waits is set to on. By default, this flag is off on new instances.

When log_lock_waits is enabled, PostgreSQL writes a log entry whenever a session waits longer than deadlock_timeout (default one second) to acquire a lock. The log entry records which process is waiting, what it is waiting for, and which transaction holds the conflicting lock.

Note: The log_lock_waits flag piggybacks on the deadlock detector. PostgreSQL already checks for deadlocks after deadlock_timeout elapses, so logging a lock wait at the same moment adds almost no overhead. You are not paying for a new monitoring loop, you are just asking Postgres to write down what it already noticed.

Why it matters

Lock contention is one of the most common and least understood causes of database slowdowns. A single long-running transaction holding a row lock can back up dozens of other queries behind it. To the application, every one of those queries looks slow. To your monitoring, you see a latency spike with no obvious culprit.

Here is what you lose when lock-wait logging is off:

Blind incident response. During an outage, you cannot tell whether a query is slow because of resource exhaustion, a bad plan, or lock contention. You burn time guessing.
No record of who blocked whom. Lock waits are often transient. By the time you start investigating, the blocking transaction has committed and the evidence is gone. Logging captures it as it happens.
Harder deadlock diagnosis. PostgreSQL automatically aborts one transaction in a deadlock, but understanding why the deadlock formed requires knowing the lock-acquisition order. Lock-wait logs give you that context.
Slow root-cause analysis. Application teams often blame the database for slowness that is actually caused by their own transaction patterns, such as holding locks across an external API call. The logs make the real cause visible.

A concrete example: a billing service runs a nightly batch that updates account balances inside a long transaction. A user-facing endpoint tries to update the same accounts and hangs for thirty seconds before timing out. Without lock-wait logging, the on-call engineer sees timeouts on the API and elevated CPU on the database, and spends an hour chasing the wrong lead. With logging on, the first log line points straight at the batch job.

Warning: Enabling log_lock_waits can add noise to your logs if your workload has frequent short waits. Pair it with a sensible deadlock_timeout (the default of one second is usually fine) so you only capture waits that actually matter, and budget for the small increase in Cloud Logging ingestion volume.

How to fix it

You set log_lock_waits as a database flag on the Cloud SQL instance. The change applies without a restart, so there is no downtime.

Using the gcloud CLI

First, check the current flags on your instance so you do not accidentally drop existing ones:

gcloud sql instances describe INSTANCE_NAME \
  --project=PROJECT_ID \
  --format="value(settings.databaseFlags)"

Danger: The --database-flags argument replaces the entire flag set, it does not merge. If you pass only log_lock_waits=on and the instance already had other flags set, those other flags will be reset to their defaults. Always include every flag you want to keep in a single command.

Apply the flag, keeping any existing flags in the same command:

gcloud sql instances patch INSTANCE_NAME \
  --project=PROJECT_ID \
  --database-flags=log_lock_waits=on,existing_flag_1=value,existing_flag_2=value

If the instance has no other flags, this is all you need:

gcloud sql instances patch INSTANCE_NAME \
  --project=PROJECT_ID \
  --database-flags=log_lock_waits=on

Using the Google Cloud Console

Open SQL in the Cloud Console and select your PostgreSQL instance.
Click Edit.
Expand the Flags section.
Click Add a database flag, choose log_lock_waits, and set the value to on.
Click Save. The change applies in place.

Using Terraform

If you manage Cloud SQL with Terraform, add the flag to the settings block of your google_sql_database_instance resource:

resource "google_sql_database_instance" "postgres" {
  name             = "my-postgres-instance"
  database_version = "POSTGRES_15"
  region           = "us-central1"

  settings {
    tier = "db-custom-2-7680"

    database_flags {
      name  = "log_lock_waits"
      value = "on"
    }

    # Keep any other flags you already manage
    database_flags {
      name  = "log_min_duration_statement"
      value = "1000"
    }
  }
}

Run terraform plan to confirm the change touches only the flags you expect, then terraform apply.

Verify the change

Connect to the database and confirm the setting took effect:

SHOW log_lock_waits;

You should see on. From this point, any wait longer than deadlock_timeout will appear in your PostgreSQL logs, which you can read in Cloud Logging.

Tip: While you are in there, consider setting log_min_duration_statement to log slow queries too. Combining slow-query logging with lock-wait logging gives you a complete picture: you see both the queries that are slow on their own and the ones that are slow because they are blocked.

How to prevent it from happening again

Fixing one instance by hand is fine. Stopping the misconfiguration from creeping back into every new instance is the real win. The goal is to make log_lock_waits=on the default for everything you provision.

Bake it into your module

If teams provision Cloud SQL through a shared Terraform module, set the flag in the module itself so every consumer inherits it. Use a variable with a safe default so it is on unless someone explicitly opts out:

variable "enable_lock_wait_logging" {
  type    = bool
  default = true
}

dynamic "database_flags" {
  for_each = var.enable_lock_wait_logging ? [1] : []
  content {
    name  = "log_lock_waits"
    value = "on"
  }
}

Gate it in CI/CD with policy-as-code

Catch missing flags before they ship using Open Policy Agent against your Terraform plan. A Rego rule like this denies any Cloud SQL instance that lacks the flag:

package cloudsql

deny[msg] {
  resource := input.resource_changes[_]
  resource.type == "google_sql_database_instance"
  flags := resource.change.after.settings[_].database_flags
  not has_lock_wait_logging(flags)
  msg := sprintf("Cloud SQL instance %q must set log_lock_waits=on", [resource.address])
}

has_lock_wait_logging(flags) {
  flags[_].name == "log_lock_waits"
  flags[_].value == "on"
}

Wire this into your pipeline so a pull request that creates a noncompliant instance fails the check and never reaches production.

Run continuous scans

Policy gates only catch resources that go through your pipeline. Instances created by hand, by another team, or before you adopted the policy will slip past. Lensix scans your live GCP environment for this check (sql_pglockwaits) and reports drift continuously, which covers the gap between what your IaC says and what actually exists.

Best practices

Enable lock-wait logging on every PostgreSQL instance, not just production. Staging is where you want to catch lock-contention bugs before they reach customers.
Tune deadlock_timeout deliberately. The default of one second is a reasonable balance. Lowering it logs more waits but adds CPU for more frequent deadlock checks. Raising it reduces noise but may hide brief contention.
Route logs somewhere useful. Lock-wait entries are only valuable if someone can find them. Set up Cloud Logging sinks and, ideally, alerts on repeated lock waits so contention surfaces proactively rather than during an incident.
Address the application patterns the logs reveal. Logging tells you contention exists. The fix is usually upstream: shorter transactions, consistent lock-acquisition order across code paths, and never holding a database transaction open across a slow external call.
Combine with related logging flags. Pair log_lock_waits with log_min_duration_statement, log_checkpoints, and log_temp_files for a well-rounded view of database health.

Lock-wait logging costs almost nothing and answers a question you will eventually need answered at the worst possible time. Turn it on now, make it the default for everything you provision, and the next time a query mysteriously hangs you will have a log line pointing at the cause instead of a guessing game.

PostgreSQL Lock-Wait Logging Disabled on Cloud SQL