This check flags Cloud SQL PostgreSQL instances where the log_checkpoints flag is off, leaving you blind to checkpoint activity that signals I/O pressure, write spikes, and recovery behavior. Turn it on by setting the database flag log_checkpoints=on via gcloud, Terraform, or the console.
Checkpoints are one of the most important background operations in PostgreSQL, and also one of the easiest to ignore until they start hurting you. When checkpoint logging is disabled on a Cloud SQL PostgreSQL instance, you lose visibility into how often checkpoints run, how long they take, and how much data they flush. That blind spot makes performance problems harder to diagnose and slows down post-incident analysis.
This Lensix check, sql_pgcheckpoints in the sql_checks module, looks at your GCP Cloud SQL PostgreSQL instances and reports any that do not have the log_checkpoints database flag enabled.
What this check detects
PostgreSQL exposes a configuration parameter called log_checkpoints. When it is on, the database writes a log line every time a checkpoint completes, including statistics like the number of buffers written, the time spent on the sync phase, and the write throughput.
The check inspects the settings.databaseFlags array on each Cloud SQL PostgreSQL instance. If log_checkpoints is missing or set to off, the instance is flagged.
Note: A checkpoint is the point at which PostgreSQL flushes all dirty pages from shared buffers to disk and records that position in the write-ahead log (WAL). After a crash, recovery only has to replay WAL from the last checkpoint forward, so checkpoint frequency directly affects both write amplification and recovery time.
It is worth noting that in PostgreSQL 15 and later, log_checkpoints defaults to on in upstream Postgres. On Cloud SQL the default still depends on the instance configuration and version, so this check is the reliable way to confirm the flag is actually set rather than assumed.
Why it matters
Checkpoint logging is not a direct security control, but it sits squarely in the operational visibility category that incident response and performance tuning both depend on. Here is where the missing data bites you.
You cannot see write pressure building
If checkpoints start firing more frequently than expected, that usually means your workload is generating WAL faster than max_wal_size can absorb between timed checkpoints. Without the log lines, the first symptom you notice is often degraded write latency or I/O saturation, with no breadcrumb trail explaining why.
A healthy log line looks like this:
LOG: checkpoint complete: wrote 4523 buffers (3.4%);
0 WAL file(s) added, 0 removed, 12 recycled;
write=89.123 s, sync=0.034 s, total=89.421 s;
sync files=58, longest=0.012 s, average=0.001 s;
distance=196608 kB, estimate=196608 kB
The write and sync timings tell you whether checkpoints are spread out smoothly or hammering the disk in bursts. The distance value tells you how much WAL is being produced. None of this is recoverable after the fact if logging was off.
Slower incident response
When a database has an availability or latency incident, checkpoint logs are part of the standard timeline you reconstruct. Were checkpoints overlapping? Did a checkpoint coincide with the latency spike? Was the instance under sustained write load before the failover? Without these logs you are guessing.
Compliance and audit gaps
Many database logging benchmarks, including the CIS Google Cloud Platform Foundations Benchmark, call out checkpoint and connection logging as expected configuration for production database instances. A disabled flag is an easy finding for an auditor and an easy fix for you.
Warning: Checkpoint log lines are written to PostgreSQL logs, which flow into Cloud Logging. On a very busy instance this adds a steady trickle of log volume. It is small compared to query logging, but if you have aggressive log retention or export to BigQuery, factor in the marginal cost before enabling it fleet-wide.
How to fix it
Enabling checkpoint logging means setting the log_checkpoints database flag to on. This flag does not require a restart on Cloud SQL PostgreSQL, so the change applies cleanly with no downtime.
Option 1: gcloud CLI
First, check the current flags so you do not accidentally overwrite existing ones:
gcloud sql instances describe INSTANCE_NAME \
--project=PROJECT_ID \
--format="value(settings.databaseFlags)"
Warning: The --database-flags argument replaces the entire flag set. If your instance already has flags like log_connections or max_connections, list them all in a single command or you will silently remove them.
Now apply log_checkpoints alongside any existing flags:
gcloud sql instances patch INSTANCE_NAME \
--project=PROJECT_ID \
--database-flags=log_checkpoints=on,log_connections=on
Confirm the change took effect:
gcloud sql instances describe INSTANCE_NAME \
--project=PROJECT_ID \
--format="json(settings.databaseFlags)"
Option 2: Google Cloud Console
- Open SQL in the Google Cloud Console and select your PostgreSQL instance.
- Click Edit.
- Expand the Flags section.
- Click Add a database flag.
- Select
log_checkpointsand set the value toon. - Click Save. The flag applies without a restart.
Option 3: Terraform
If you manage Cloud SQL with Terraform, add the flag to the settings block. Define every flag you want, since Terraform also manages the full set:
resource "google_sql_database_instance" "postgres" {
name = "my-postgres-instance"
database_version = "POSTGRES_15"
region = "us-central1"
settings {
tier = "db-custom-2-7680"
database_flags {
name = "log_checkpoints"
value = "on"
}
database_flags {
name = "log_connections"
value = "on"
}
}
}
terraform plan
terraform apply
Tip: If you run many instances, define your baseline database flags once in a shared Terraform module or a locals block and reuse it. That way log_checkpoints and friends ship with every new instance by default, and the fix becomes a one-time change instead of a recurring chore.
How to prevent it from happening again
One-off remediation closes the finding for a single instance. Stopping it from reappearing on the next instance someone spins up requires guardrails in your provisioning pipeline.
Gate it in CI with policy-as-code
If you use Terraform, an OPA/Conftest policy can reject any Cloud SQL PostgreSQL instance plan that does not include log_checkpoints=on. Here is a Rego rule that fails the plan when the flag is missing or off:
package cloudsql.logging
deny[msg] {
resource := input.resource_changes[_]
resource.type == "google_sql_database_instance"
flags := resource.change.after.settings[_].database_flags
not has_checkpoint_logging(flags)
msg := sprintf("Cloud SQL instance '%s' must set log_checkpoints=on", [resource.name])
}
has_checkpoint_logging(flags) {
flag := flags[_]
flag.name == "log_checkpoints"
flag.value == "on"
}
Wire it into your pipeline against a saved plan:
terraform plan -out=tfplan.binary
terraform show -json tfplan.binary > tfplan.json
conftest test tfplan.json --policy ./policies
Enforce with Organization Policy and continuous scanning
Policy-as-code in CI catches Terraform changes, but it does not catch a flag flipped by hand in the console. Pair the CI gate with continuous scanning so drift gets caught wherever it originates. Lensix runs the sql_pgcheckpoints check on a schedule across all your projects, so an instance created outside your IaC pipeline still gets flagged.
Tip: Treat database flags as a single managed baseline. Bundle log_checkpoints, log_connections, log_disconnections, and log_lock_waits together. They are all cheap to enable, they all improve observability, and managing them as one block keeps your fleet consistent.
Best practices
- Enable checkpoint logging everywhere, not just production. Staging instances are where you reproduce performance issues, and they need the same visibility.
- Tune checkpoints, do not just log them. Once you can see frequency, adjust
max_wal_sizeandcheckpoint_completion_targetso checkpoints spread their writes smoothly instead of spiking I/O. - Watch for "checkpoints occurring too frequently" warnings. If your logs show checkpoints triggered by WAL volume rather than the timed interval, your
max_wal_sizeis too small for the workload. - Pair logging flags as a set.
log_checkpoints,log_connections, andlog_lock_waitstogether give you a solid baseline of operational visibility with negligible overhead. - Route logs somewhere you will actually read them. Checkpoint lines flow into Cloud Logging. Build a log-based metric or alert on checkpoint frequency so the data drives action instead of sitting unread.
Checkpoint logging is a small flag with an outsized payoff. It costs almost nothing, it applies without downtime, and it gives you the timeline you will desperately want the next time write latency climbs at 3 a.m. Flip it on, bake it into your IaC baseline, and let continuous scanning keep it that way.

