Fix Azure PostgreSQL Checkpoint Logging Disabled

TL;DR

This check flags Azure Database for PostgreSQL servers where the log_checkpoints parameter is off, which leaves you with no record of the checkpoint activity you need to diagnose I/O spikes and tune performance. Turn it on by setting log_checkpoints to on via the Azure CLI or portal.

Checkpoints are one of those background database operations that nobody thinks about until they cause a problem. When PostgreSQL flushes dirty buffers to disk, it can generate sudden bursts of I/O that show up as latency spikes, slow queries, or connection timeouts. If log_checkpoints is disabled, you have no record of when those checkpoints ran or how heavy they were, which makes diagnosing the spike a guessing game.

The Lensix check postgresql_nologcheckpoints looks at each Azure Database for PostgreSQL server and reports any that have checkpoint logging turned off.

What this check detects

Azure Database for PostgreSQL exposes a server parameter called log_checkpoints. When set to on, PostgreSQL writes a log line every time a checkpoint completes, including how many buffers were written, how long the write and sync phases took, and whether the checkpoint was triggered by time or by WAL volume.
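To make those fields concrete, here is a minimal sketch that pulls the buffer count and the write and sync durations out of a checkpoint log line. The sample line follows the format PostgreSQL uses for checkpoint completion, but the numbers are made up for illustration, and parse_checkpoint is a hypothetical helper name, not part of any tool.

```shell
# Sample line in the format PostgreSQL emits when log_checkpoints is on.
# The numbers are illustrative, not taken from a real server.
sample='2024-05-01 12:00:05 UTC LOG:  checkpoint complete: wrote 3412 buffers (20.8%); 0 WAL file(s) added, 0 removed, 2 recycled; write=269.551 s, sync=0.012 s, total=269.588 s; sync files=52, longest=0.004 s, average=0.001 s; distance=35211 kB, estimate=35211 kB'

# parse_checkpoint (hypothetical helper): read log lines on stdin and print
# "buffers write_seconds sync_seconds" for each completed checkpoint.
# Lines that are not checkpoint completions are silently skipped.
parse_checkpoint() {
  sed -n -E 's/.*checkpoint complete: wrote ([0-9]+) buffers.*write=([0-9.]+) s, sync=([0-9.]+) s.*/\1 \2 \3/p'
}

printf '%s\n' "$sample" | parse_checkpoint
# prints: 3412 269.551 0.012
```

Exact field layout can vary slightly between PostgreSQL major versions, so treat the pattern as a starting point and check it against a real line from your own server log.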

This check fails when log_checkpoints is set to off on a Flexible Server or Single Server instance. The check covers both deployment models, though Single Server is on a retirement path and you should plan a move to Flexible Server regardless.

Note: Upstream PostgreSQL has defaulted log_checkpoints to on since version 15, so self-managed installs on 15 or later usually have it enabled. Azure's managed offerings have historically shipped it as off on older versions, which is exactly why this check exists.

Why it matters

Checkpoint logging is not a security control in the firewall sense, but it is a core part of the observability and incident response story for any production database. Here is what you lose without it.

You cannot correlate I/O spikes with checkpoints

A common production pattern: every few minutes your application sees a wave of slow queries, then everything recovers. Without checkpoint logs you might chase the application, the network, or the query planner for hours. With them, a single log line tells you a checkpoint flushed several hundred thousand buffers in that exact window, and now you know to tune max_wal_size or checkpoint_completion_target instead.

Frequent checkpoints signal misconfiguration

If checkpoints are firing far more often than your checkpoint_timeout would suggest, it usually means WAL volume is hitting max_wal_size too quickly. That is a sign of write-heavy load that needs tuning. The log line that says checkpoint starting: wal versus checkpoint starting: time is how you tell the difference, and you only get that line when logging is on.
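A quick way to see that split is to tally the trigger reasons from an exported server log. The sketch below assumes you have the log as plain text on stdin; count_checkpoint_triggers is a hypothetical helper, and the input lines are trimmed-down samples rather than real output.

```shell
# count_checkpoint_triggers (hypothetical helper): tally checkpoint start
# reasons from PostgreSQL log lines on stdin.
count_checkpoint_triggers() {
  awk '
    /checkpoint starting:/ {
      # Everything after "checkpoint starting: " is the list of reason flags.
      reason = $0
      sub(/.*checkpoint starting: /, "", reason)
      if (reason ~ /(^| )wal( |$)/) wal++
      else if (reason ~ /(^| )time( |$)/) timed++
      else other++
    }
    END { printf "wal=%d time=%d other=%d\n", wal, timed, other }
  '
}

# Illustrative input: one time-based and two WAL-based checkpoints.
printf '%s\n' \
  'LOG:  checkpoint starting: time' \
  'LOG:  checkpoint starting: wal' \
  'LOG:  checkpoint starting: wal' \
  'LOG:  checkpoint complete: wrote 10 buffers (0.1%)' \
  | count_checkpoint_triggers
# prints: wal=2 time=1 other=0
```

If the wal count dominates, checkpoints are being forced by WAL volume rather than by the timer, which points at max_wal_size as the first thing to look at.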

Compliance and audit gaps

Frameworks like the CIS Microsoft Azure Foundations Benchmark, along with internal audit requirements, often expect database servers to have a defined logging posture. An empty checkpoint log is a finding waiting to be written up, and worse, it leaves you unable to reconstruct what happened during a past incident.

Warning: Checkpoint logging is low volume, typically a handful of lines per hour, so it has negligible impact on log storage cost. This is not a verbose setting like log_statement = all. There is no good reason to leave it off.

How to fix it

The fix is a single server parameter change. It does not require a restart on Flexible Server, since log_checkpoints is a dynamic parameter, so you can apply it with no downtime.

Option 1: Azure CLI (Flexible Server)

az postgres flexible-server parameter set \
  --resource-group my-resource-group \
  --server-name my-pg-server \
  --name log_checkpoints \
  --value on

Verify the change took effect:

az postgres flexible-server parameter show \
  --resource-group my-resource-group \
  --server-name my-pg-server \
  --name log_checkpoints \
  --query "value" -o tsv
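If you have more than a handful of servers, the same two commands can be looped over the whole subscription. This is a minimal sketch, assuming you are already logged in with az and have the right subscription selected; list_servers_without_checkpoint_logging is a hypothetical helper name.

```shell
# list_servers_without_checkpoint_logging (hypothetical helper): print
# "resource-group/server" for each Flexible Server in the current
# subscription whose log_checkpoints value is not "on".
list_servers_without_checkpoint_logging() {
  az postgres flexible-server list \
    --query "[].[resourceGroup, name]" -o tsv |
  while IFS="$(printf '\t')" read -r rg name; do
    # Redirect stdin so the inner call cannot consume the server list.
    value=$(az postgres flexible-server parameter show \
      --resource-group "$rg" --server-name "$name" \
      --name log_checkpoints --query "value" -o tsv </dev/null)
    # Compare case-insensitively, since the value may be reported as ON or on.
    case $(printf '%s' "$value" | tr '[:upper:]' '[:lower:]') in
      on) ;;
      *) printf '%s/%s\n' "$rg" "$name" ;;
    esac
  done
}
```

Run list_servers_without_checkpoint_logging after defining it. An empty result means every Flexible Server in the subscription already has checkpoint logging on.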

Option 2: Azure CLI (Single Server)

az postgres server configuration set \
  --resource-group my-resource-group \
  --server-name my-pg-server \
  --name log_checkpoints \
  --value on

Option 3: Azure Portal

Open your PostgreSQL server in the Azure Portal.
Under Settings, select Server parameters.
Search for log_checkpoints.
Set the value to ON.
Click Save.

Note: On Flexible Server the change applies immediately. On older Single Server instances some static parameters require a restart, but log_checkpoints is dynamic and does not.
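You can confirm both the value and the no-restart claim from inside the database itself. The sketch below assumes psql is installed and that the standard PGHOST, PGUSER, PGDATABASE, and PGPASSWORD variables (or a .pgpass entry) already point at your server; show_checkpoint_setting is a hypothetical helper name.

```shell
# show_checkpoint_setting (hypothetical helper): print the current value and
# change context of log_checkpoints as "value|context".
# -X skips any local psqlrc, -A and -t produce bare unaligned output.
show_checkpoint_setting() {
  psql -X -At -c "SELECT setting, context FROM pg_settings WHERE name = 'log_checkpoints';"
}
```

After the fix you should see a setting of on. The context column for log_checkpoints is sighup, which is PostgreSQL's way of saying the parameter is picked up on a configuration reload rather than a restart.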

Option 4: Terraform

If you manage your database with the AzureRM provider, define the parameter as code so it stays enforced:

resource "azurerm_postgresql_flexible_server_configuration" "log_checkpoints" {
  name      = "log_checkpoints"
  server_id = azurerm_postgresql_flexible_server.main.id
  value     = "on"
}

For Single Server, use the equivalent configuration resource:

resource "azurerm_postgresql_configuration" "log_checkpoints" {
  name                = "log_checkpoints"
  resource_group_name = azurerm_resource_group.main.name
  server_name         = azurerm_postgresql_server.main.name
  value               = "on"
}

Tip: While you are editing server parameters, consider enabling log_connections, log_disconnections, and log_lock_waits at the same time. They round out your observability and are also flagged by other Lensix PostgreSQL checks.

How to prevent it from happening again

Fixing one server by hand is fine. Stopping the next one from drifting back to off is where the real value is.

Define it in your IaC modules

If you provision PostgreSQL through a shared Terraform or Bicep module, bake the parameter into the module itself so every new server inherits it. Here is a Bicep example:

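// postgresServer is the symbolic name of the Flexible Server resource
// declared elsewhere in the same module.
resource logCheckpoints 'Microsoft.DBforPostgreSQL/flexibleServers/configurations@2023-03-01-preview' = {
  parent: postgresServer
  name: 'log_checkpoints'
  properties: {
    value: 'on'
    source: 'user-override'
  }
}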

Gate it with policy-as-code

Azure Policy can audit or deny servers that do not have the parameter set. Use a built-in or custom policy that checks the configuration value and reports non-compliant servers in your governance dashboard. A deny effect blocks creation outright, while audit gives you a softer rollout.
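Before writing a custom definition, it is worth checking what built-ins your tenant already offers. The sketch below searches built-in definitions by display name and assigns one at a scope you choose. It assumes a built-in whose display name mentions checkpoints is available in your cloud; both helper names and the assignment name require-log-checkpoints are hypothetical.

```shell
# find_checkpoint_policies (hypothetical helper): list built-in Azure Policy
# definitions whose display name mentions checkpoints, as "name<TAB>title".
find_checkpoint_policies() {
  az policy definition list \
    --query "[?policyType=='BuiltIn' && contains(displayName, 'checkpoints')].[name, displayName]" \
    -o tsv
}

# assign_checkpoint_policy (hypothetical helper): assign a definition.
#   $1 = policy definition name (first column from find_checkpoint_policies)
#   $2 = scope, for example /subscriptions/<subscription-id>
assign_checkpoint_policy() {
  az policy assignment create \
    --name require-log-checkpoints \
    --policy "$1" \
    --scope "$2"
}
```

Check which resource type each definition targets before assigning it, since a policy written for Single Server will not evaluate Flexible Server resources.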

You can also enforce this in CI with a tool like Checkov or tfsec (whose maintainers now steer users toward Trivy) scanning your Terraform before it merges:

checkov -d ./infra --framework terraform
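If you want a guard that does not depend on any scanner's rule set, a crude text check can sit alongside it. This is a minimal sketch, not a real HCL parser: it only looks for a resource block that contains both name = "log_checkpoints" and value = "on", and has_log_checkpoints_on is a hypothetical helper name.

```shell
# has_log_checkpoints_on (hypothetical helper): succeed if any .tf file under
# the given directory has a resource block that names log_checkpoints and
# sets its value to "on". Matching is line-based and case-insensitive.
has_log_checkpoints_on() {
  dir=$1
  find "$dir" -name '*.tf' -exec cat {} + 2>/dev/null | awk '
    # A new resource block resets what we have seen so far.
    /^[[:space:]]*resource[[:space:]]/ { name_ok = 0; value_ok = 0 }
    { line = tolower($0) }
    line ~ /name[[:space:]]*=[[:space:]]*"log_checkpoints"/ { name_ok = 1 }
    line ~ /value[[:space:]]*=[[:space:]]*"on"/ { value_ok = 1 }
    name_ok && value_ok { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}
```

In a pipeline step, something like has_log_checkpoints_on ./infra || exit 1 fails the build when the parameter is missing. It will not understand variables, modules, or dynamic blocks, so treat it as a backstop rather than a replacement for a proper scanner.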

Continuous scanning with Lensix

Manual reviews and pre-merge gates catch a lot, but parameters get changed through the portal, through support tickets, and during incidents. A continuous scan against your live environment is what catches that drift. Lensix runs postgresql_nologcheckpoints on every scan and surfaces any server that slipped back to off, so you find out before an auditor or an outage does.

Best practices

Treat logging parameters as a baseline, not an afterthought. Enable log_checkpoints, log_connections, and log_lock_waits as part of your standard server build.
Ship logs somewhere durable. Enabling the parameter only writes to the server log. Configure diagnostic settings to forward PostgreSQL logs to a Log Analytics workspace or a storage account so the data outlives the server.
Alert on checkpoint frequency. Once logs land in Log Analytics, build a query that counts checkpoint starting: wal events and alerts when they exceed your tuned baseline. That turns a passive log into an early warning.
Tune alongside logging. If logs reveal frequent WAL-triggered checkpoints, raise max_wal_size and keep checkpoint_completion_target at or near 0.9 (the upstream default since PostgreSQL 14) to spread the write load.
Migrate off Single Server. Azure Database for PostgreSQL Single Server is retiring. Flexible Server gives you better parameter control, dynamic changes without restarts, and a longer support runway.
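For the "ship logs somewhere durable" practice above, a diagnostic setting is the piece that connects the server log to a workspace. Here is a minimal sketch using the Azure CLI. The setting name pg-logs-to-workspace and the helper name forward_postgres_logs are hypothetical, and both arguments are full resource IDs you supply.

```shell
# forward_postgres_logs (hypothetical helper): send the PostgreSQLLogs
# category of one server to a Log Analytics workspace.
#   $1 = full resource ID of the PostgreSQL server
#   $2 = full resource ID of the Log Analytics workspace
forward_postgres_logs() {
  az monitor diagnostic-settings create \
    --name pg-logs-to-workspace \
    --resource "$1" \
    --workspace "$2" \
    --logs '[{"category": "PostgreSQLLogs", "enabled": true}]'
}
```

You can look up the server's resource ID with az postgres flexible-server show and --query id -o tsv, then pass it as the first argument.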

Tip: Build a Kusto query in Log Analytics that graphs checkpoint counts per hour against your application latency metrics. Seeing the two lines overlap makes the case for tuning far more convincing than a stack of raw log lines.
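As a starting point for that query, the sketch below counts checkpoint starts per hour from the command line. It assumes your diagnostic setting writes to the AzureDiagnostics table and that the log text lands in a column called Message; if you use resource-specific tables, the table and column names will differ. checkpoints_per_hour is a hypothetical helper, and depending on your CLI version the log-analytics command group may need an extension installed.

```shell
# checkpoints_per_hour (hypothetical helper): count checkpoint starts per
# hour in a Log Analytics workspace.
#   $1 = workspace ID (the GUID, not the full resource ID)
# Assumes PostgreSQL logs arrive in AzureDiagnostics with a Message column.
checkpoints_per_hour() {
  az monitor log-analytics query \
    --workspace "$1" \
    --analytics-query '
      AzureDiagnostics
      | where Category == "PostgreSQLLogs"
      | where Message has "checkpoint starting"
      | summarize checkpoints = count() by bin(TimeGenerated, 1h)
      | order by TimeGenerated asc' \
    -o table
}
```

The same query body can be pasted into the Log Analytics query editor and rendered as a time chart, which is the easier place to overlay it with latency metrics.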

Checkpoint logging costs you almost nothing and buys you a clear window into one of the most common sources of mysterious database latency. Flip it on, enforce it in code, and let your scanner keep it that way.
