Fix Azure Load Balancer Missing Diagnostic Settings

TL;DR

This check flags Azure Load Balancers that have no diagnostic settings, which means health and traffic metrics are not being streamed anywhere. Without them you lose visibility into failed probes, dropped packets, and SNAT exhaustion. Fix it by sending diagnostic logs to a Log Analytics workspace, storage account, or event hub.

An Azure Load Balancer sits in front of your most important traffic, distributing connections across backend pools and quietly making decisions about which instances are healthy. When something goes wrong, the load balancer is often the first place to look. But if it has no diagnostic settings configured, there is nothing to look at. The data simply never leaves the resource.

This Lensix check, monitor_lbnodiagnostics, catches exactly that situation: a Load Balancer with no diagnostic settings sending logs or metrics to any destination.

What this check detects

The check inspects each Azure Load Balancer in your subscriptions and verifies whether it has at least one diagnostic setting attached. A diagnostic setting tells Azure where to route the resource's platform logs and metrics. Without one, the load balancer still works, but its resource logs are never collected, and its platform metrics are discarded once Azure Monitor's default retention window (93 days) passes.

Specifically, a Standard SKU Load Balancer can emit:

Metrics such as data path availability, health probe status, SNAT connection count, and byte/packet counts.
Resource logs like LoadBalancerHealthEvent, which records availability and configuration issues affecting the load balancer.

When no diagnostic setting exists, none of this is archived, queryable, or alertable beyond the in-platform metric retention.
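
To see what the check sees for a single resource, you can count the diagnostic settings attached to it. This is a minimal sketch, not the Lensix implementation: the function names are hypothetical, and it assumes an authenticated Azure CLI session on a version where `az monitor diagnostic-settings list` returns a JSON array.

```shell
# Hypothetical helper: prints how many diagnostic settings a resource has.
# $1 = full resource ID of the load balancer.
count_diag_settings() {
  az monitor diagnostic-settings list \
    --resource "$1" \
    --query "length(@)" \
    -o tsv
}

# Hypothetical helper: exit 0 only when at least one setting exists.
has_diag_settings() {
  count=$(count_diag_settings "$1") || return 2
  [ "${count:-0}" -gt 0 ]
}
```

For example, `has_diag_settings "$LB_ID" || echo "no diagnostic settings"` prints a message for a load balancer this check would flag.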

Note: Diagnostic logs for Load Balancers require the Standard SKU. Basic SKU load balancers expose a more limited metric set and do not support the same resource log categories. Basic SKU is also retiring, so if you are still on it, plan a migration to Standard.
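
If you are not sure which load balancers are still on Basic, a quick inventory helps. The sketch below assumes the current CLI subscription is the one you want to inspect; the function name is hypothetical.

```shell
# Hypothetical helper: lists Basic SKU load balancers in the current
# subscription as tab-separated "name <TAB> resource group" rows.
list_basic_lbs() {
  az network lb list \
    --query "[?sku.name=='Basic'].[name, resourceGroup]" \
    -o tsv
}
```

An empty result means every load balancer in the subscription is already on Standard (or Gateway) SKU.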

Why it matters

A load balancer with no telemetry is a blind spot in front of your application. The risk is rarely a direct breach. It is the slow, expensive kind of failure where an incident drags on because nobody can see what happened.

You cannot diagnose what you cannot see

Picture a backend pool where instances start failing health probes during a deployment. Users see intermittent timeouts and dropped connections. Without health probe metrics flowing to Log Analytics, your on-call engineer is reduced to guessing whether the issue is the app, the network, or the load balancer config. With diagnostics in place, a single query shows the probe failures and the exact moment they began.
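
While an incident is still inside Azure Monitor's built-in retention window, the same signal can be pulled straight from the platform. This sketch assumes the metric ID `DipAvailability` (shown in the portal as health probe status); the function name and the default one-hour lookback are illustrative choices.

```shell
# Hypothetical helper: prints a per-minute average of one load balancer metric.
# $1 = load balancer resource ID
# $2 = metric ID, e.g. DipAvailability
# $3 = lookback window in ##d##h format (defaults to 1h)
lb_metric() {
  az monitor metrics list \
    --resource "$1" \
    --metric "$2" \
    --aggregation Average \
    --interval PT1M \
    --offset "${3:-1h}" \
    -o table
}
```

Running `lb_metric "$LB_ID" DipAvailability 2h` lists the last two hours of probe health, which is usually enough to spot the minute a deployment started failing probes.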

SNAT port exhaustion goes unnoticed

One of the most common and most painful Azure networking failures is SNAT port exhaustion. When backend instances make many outbound connections through the load balancer, they can run out of available source ports, and new outbound connections start timing out. The SNAT Connection Count and Used SNAT Ports metrics are the only reliable early-warning signals. No diagnostics means you find out when production is already on fire.

Warning: SNAT exhaustion often surfaces as sporadic outbound connection failures that look like upstream API flakiness. Teams routinely spend days chasing the wrong cause before checking SNAT metrics, which would have shown the problem immediately.
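
A side-by-side view of used versus allocated ports is often the fastest way to confirm or rule out exhaustion. The sketch below assumes the metric IDs `UsedSnatPorts` and `AllocatedSnatPorts`; the function name, the five-minute grain, and the six-hour default lookback are illustrative.

```shell
# Hypothetical helper: compares used and allocated SNAT ports over time.
# $1 = load balancer resource ID
# $2 = lookback window in ##d##h format (defaults to 6h)
snat_usage() {
  az monitor metrics list \
    --resource "$1" \
    --metric UsedSnatPorts AllocatedSnatPorts \
    --aggregation Average \
    --interval PT5M \
    --offset "${2:-6h}" \
    -o table
}
```

If the used value climbs toward the allocated value at the same times the outbound failures occur, SNAT is the likely cause rather than the upstream API.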

Compliance and audit gaps

Frameworks like CIS Azure, SOC 2, and ISO 27001 expect that network infrastructure produces auditable logs. A load balancer with no diagnostic settings is a finding waiting to happen during an audit, and it undermines incident forensics if you ever need to reconstruct a timeline.

How to fix it

Fixing this means attaching a diagnostic setting that routes logs and metrics to a destination. You have three destination options, and they are not mutually exclusive:

Log Analytics workspace for querying with KQL and building alerts. This is the recommended primary destination.
Storage account for cheap long-term archival.
Event Hub for streaming to a SIEM or third-party tool.

Option 1: Azure Portal

Open the Load Balancer resource in the Azure Portal.
Under Monitoring, select Diagnostic settings.
Click Add diagnostic setting.
Give it a name, then check the log categories (LoadBalancerHealthEvent) and AllMetrics.
Choose a destination, for example Send to Log Analytics workspace, and pick your workspace.
Click Save.

Option 2: Azure CLI

First grab the resource IDs of the load balancer and your Log Analytics workspace, then create the diagnostic setting.

LB_ID=$(az network lb show \
  --resource-group my-rg \
  --name my-loadbalancer \
  --query id -o tsv)

WORKSPACE_ID=$(az monitor log-analytics workspace show \
  --resource-group my-rg \
  --workspace-name my-law \
  --query id -o tsv)

az monitor diagnostic-settings create \
  --name lb-diagnostics \
  --resource "$LB_ID" \
  --workspace "$WORKSPACE_ID" \
  --logs '[{"category":"LoadBalancerHealthEvent","enabled":true}]' \
  --metrics '[{"category":"AllMetrics","enabled":true}]'
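
After creating the setting, it is worth confirming what was actually enabled. This sketch reads the setting back and prints its destination workspace and enabled categories; the function name is hypothetical, and the projected field names assume the CLI's current JSON output shape.

```shell
# Hypothetical helper: summarizes one diagnostic setting.
# $1 = load balancer resource ID
# $2 = diagnostic setting name (defaults to lb-diagnostics)
verify_diag_setting() {
  az monitor diagnostic-settings show \
    --resource "$1" \
    --name "${2:-lb-diagnostics}" \
    --query "{workspace: workspaceId, logs: logs[?enabled].category, metrics: metrics[?enabled].category}" \
    -o json
}
```

Calling `verify_diag_setting "$LB_ID"` should list LoadBalancerHealthEvent under logs and AllMetrics under metrics if the command above succeeded.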

Tip: If you manage many load balancers, list them all and loop over the results so you never miss one:

az network lb list --query "[].id" -o tsv | while read -r LB_ID; do
  az monitor diagnostic-settings create \
    --name lb-diagnostics \
    --resource "$LB_ID" \
    --workspace "$WORKSPACE_ID" \
    --logs '[{"category":"LoadBalancerHealthEvent","enabled":true}]' \
    --metrics '[{"category":"AllMetrics","enabled":true}]'
done
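
Because the destinations are not mutually exclusive, a single setting can send data to Log Analytics and archive it to a storage account at the same time. The following is a sketch under that assumption; the function name and the default setting name are hypothetical, and the storage account must already exist.

```shell
# Hypothetical helper: one diagnostic setting, two destinations.
# $1 = load balancer resource ID
# $2 = Log Analytics workspace resource ID
# $3 = storage account resource ID
# $4 = setting name (defaults to lb-diagnostics-dual)
create_lb_diag_dual() {
  az monitor diagnostic-settings create \
    --name "${4:-lb-diagnostics-dual}" \
    --resource "$1" \
    --workspace "$2" \
    --storage-account "$3" \
    --logs '[{"category":"LoadBalancerHealthEvent","enabled":true}]' \
    --metrics '[{"category":"AllMetrics","enabled":true}]'
}
```

Streaming to a SIEM works the same way with the `--event-hub` and `--event-hub-rule` options in place of, or alongside, the storage account.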

Option 3: Terraform

If your infrastructure is defined in code, add the diagnostic setting alongside the load balancer so it can never drift away.

resource "azurerm_monitor_diagnostic_setting" "lb" {
  name                       = "lb-diagnostics"
  target_resource_id         = azurerm_lb.example.id
  log_analytics_workspace_id = azurerm_log_analytics_workspace.example.id

  enabled_log {
    category = "LoadBalancerHealthEvent"
  }

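  # Assumption to verify against your provider version: newer azurerm
  # releases deprecate this block in favor of `enabled_metric`.
  metric {
    category = "AllMetrics"
    enabled  = true
  }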
}

Warning: Diagnostic logs and metrics incur ingestion and retention costs in Log Analytics and storage. Even for high-traffic load balancers the metric volume per resource is modest, but enabling diagnostics across hundreds of resources adds up. Set retention deliberately and consider sending verbose data to cheaper storage rather than Log Analytics.
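
Setting retention deliberately can be as simple as pinning the workspace default. This is a sketch only: the function name is hypothetical, and the right number of days depends on your compliance requirements, not on anything in this article.

```shell
# Hypothetical helper: sets the default data retention of a workspace.
# $1 = resource group, $2 = workspace name, $3 = retention in days
set_workspace_retention() {
  az monitor log-analytics workspace update \
    --resource-group "$1" \
    --workspace-name "$2" \
    --retention-time "$3"
}
```

For instance, `set_workspace_retention my-rg my-law 30` keeps workspace data for 30 days, leaving long-term copies to the storage account.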

How to prevent it from happening again

Remediating one load balancer is easy. Keeping every load balancer covered as your estate grows is the real challenge. Two approaches handle this well.

Azure Policy with DeployIfNotExists

Azure ships a built-in policy that automatically configures diagnostic settings for Load Balancers to a Log Analytics workspace. Assigning it means any new load balancer gets diagnostics deployed without manual action.

# Find the built-in policy definition
az policy definition list \
  --query "[?contains(displayName, 'Configure diagnostic settings') && contains(displayName, 'Load Balancer')].{name:name, displayName:displayName}" \
  -o table

Assign the matching DeployIfNotExists definition at the subscription or management group scope, supplying your workspace as a parameter. Then run a remediation task to bring existing resources into compliance.
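
The assignment and remediation steps might look roughly like this. Treat it as a sketch with several assumptions: the assignment and remediation names are made up, and the workspace parameter is assumed to be called `logAnalytics`, which varies between definitions. Check the definition's parameters with `az policy definition show` before relying on it.

```shell
# Hypothetical helper: assigns a DeployIfNotExists definition with a
# system-assigned identity.
# $1 = policy definition name or ID   $2 = scope (e.g. /subscriptions/<id>)
# $3 = workspace resource ID          $4 = region for the managed identity
assign_lb_diag_policy() {
  az policy assignment create \
    --name lb-diagnostics-dine \
    --policy "$1" \
    --scope "$2" \
    --mi-system-assigned \
    --location "$4" \
    --params "{\"logAnalytics\": {\"value\": \"$3\"}}"
}

# Hypothetical helper: remediates existing resources for that assignment
# at the current subscription scope.
remediate_lb_diag_policy() {
  az policy remediation create \
    --name lb-diagnostics-remediation \
    --policy-assignment lb-diagnostics-dine
}
```

Remediation only covers resources that policy has already evaluated as non-compliant, so expect a delay between assigning the policy and seeing the task pick up existing load balancers.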

Note: DeployIfNotExists policies need a managed identity with permission to write diagnostic settings and read the target workspace. Azure Policy can create this identity for you during assignment, but the identity must have the right role assignments or remediation will silently fail.
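
If remediation tasks fail on permissions, granting roles to the assignment's identity by hand is one way out. The sketch below assumes the two roles commonly required by diagnostic-settings policies, Monitoring Contributor and Log Analytics Contributor; the `roleDefinitionIds` in your chosen policy definition are the authoritative list. The function name is hypothetical.

```shell
# Hypothetical helper: grants roles to a policy assignment's managed identity.
# $1 = policy assignment name, $2 = scope of the assignment
grant_policy_identity_roles() {
  principal_id=$(az policy assignment show \
    --name "$1" \
    --scope "$2" \
    --query identity.principalId \
    -o tsv) || return 1

  # Assumed role set; confirm against the policy definition.
  for role in "Monitoring Contributor" "Log Analytics Contributor"; do
    az role assignment create \
      --assignee-object-id "$principal_id" \
      --assignee-principal-type ServicePrincipal \
      --role "$role" \
      --scope "$2" \
      -o none
  done
}
```

If the workspace lives in a different subscription than the load balancers, the identity also needs its workspace role granted at that other scope.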

Catch it in CI/CD

If you provision infrastructure with Terraform or Bicep, fail the pipeline when a load balancer is defined without a corresponding diagnostic setting. A policy-as-code tool like Checkov, OPA/Conftest, or tfsec can enforce this before anything reaches the cloud.

# Example: run Checkov against the Terraform source directory in CI
checkov -d ./infra --framework terraform

Pair that with a scheduled Lensix scan so anything created outside your pipeline, for example a one-off load balancer spun up in the portal during an incident, still gets flagged.
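
A scheduled pipeline job can also run a simple gate of its own between scans. This is a rough sketch, not a replacement for the Lensix check: the function name is hypothetical, it only covers the current CLI subscription, and it assumes `az monitor diagnostic-settings list` returns a JSON array.

```shell
# Hypothetical helper: prints every load balancer without diagnostic
# settings and returns non-zero if any were found.
audit_lb_diagnostics() {
  missing=0
  # Resource IDs contain no whitespace, so word splitting is safe here.
  for lb_id in $(az network lb list --query "[].id" -o tsv); do
    count=$(az monitor diagnostic-settings list \
      --resource "$lb_id" \
      --query "length(@)" \
      -o tsv)
    if [ "${count:-0}" -eq 0 ]; then
      echo "MISSING: $lb_id"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}
```

Calling `audit_lb_diagnostics` as the last step of a job makes the job fail whenever an uncovered load balancer exists, with the offending resource IDs in the log.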

Best practices

Standardize on a single Log Analytics workspace per environment. Centralizing telemetry makes cross-resource queries and correlation far easier during an incident.
Always include AllMetrics. The health probe and SNAT metrics are where most operational problems show up, so do not enable logs while skipping metrics.
Alert on the signals that matter. Configure metric alerts for data path availability dropping below 100 percent and for SNAT port usage approaching its limit.
Split destinations by purpose. Send queryable data to Log Analytics for active monitoring and archive raw data to a storage account for cheap long-term retention.
Migrate off Basic SKU. Basic Load Balancers do not support the full diagnostic feature set and are being retired. Move to Standard SKU to unlock proper observability.
Treat diagnostics as part of the resource, not an afterthought. Define them in the same Terraform module or Bicep template as the load balancer itself so they are never forgotten or removed.
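
The two alerts recommended above could be created along these lines. This is a sketch with assumptions: the alert names and function name are hypothetical, the metric IDs `VipAvailability` (data path availability) and `UsedSnatPorts` are assumed, and the SNAT threshold is a placeholder you should derive from your own port allocation.

```shell
# Hypothetical helper: creates two metric alerts on one load balancer.
# $1 = resource group for the alert rules
# $2 = load balancer resource ID
# $3 = SNAT port threshold (placeholder default: 800)
create_lb_alerts() {
  az monitor metrics alert create \
    --name lb-datapath-availability \
    --resource-group "$1" \
    --scopes "$2" \
    --condition "avg VipAvailability < 100" \
    --window-size 5m \
    --evaluation-frequency 1m

  az monitor metrics alert create \
    --name lb-snat-port-usage \
    --resource-group "$1" \
    --scopes "$2" \
    --condition "avg UsedSnatPorts > ${3:-800}" \
    --window-size 5m \
    --evaluation-frequency 1m
}
```

Neither rule notifies anyone until you attach an action group with the `--action` option, so wire that in before counting on the alerts.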

A load balancer without diagnostics is not broken; it is just invisible. The day you need to understand a production incident is the wrong day to discover the data was never being collected.

Configure diagnostic settings once, enforce them with policy, and the next time a backend pool starts misbehaving you will have answers instead of guesses.
