Fix Azure VM Guest Diagnostics Disabled | Lensix

TL;DR

This check flags Azure VMs running without guest-level diagnostics, which means you lose visibility into in-guest metrics like memory usage, disk activity, and process-level data. Enable the Azure Monitor Agent or the legacy diagnostics extension on each VM to start collecting that telemetry.

Azure gives you a lot of telemetry for free. The hypervisor sees CPU usage, network throughput, and disk IOPS from outside the VM without any agent. That is host-level monitoring, and it works whether or not anything is installed inside the operating system. The problem is that the host cannot see inside the guest. It does not know how much memory the OS is actually using, which processes are eating CPU, or whether a disk is about to fill up.

The VM Guest Diagnostics Disabled check catches exactly this gap: virtual machines that have no guest-level diagnostics configured, leaving you blind to everything happening inside the operating system.

What this check detects

Lensix flags any Azure VM where guest-level diagnostics have not been enabled. In practice this means the VM has neither the Azure Monitor Agent (AMA) collecting guest performance counters nor the older Azure Diagnostics extension (often referred to as WAD on Windows or LAD on Linux) configured to ship in-guest data.

Without a guest agent, your monitoring is limited to whatever the Azure platform can observe from the host. That covers a useful but shallow set of metrics:

Host-visible: CPU percentage, network in/out, disk read/write bytes, disk operations per second.
Hidden without a guest agent: available memory, used memory, per-process CPU, logical disk free space, OS-level event logs, syslog, and custom application metrics.

Note: Memory is the classic example. The host has very limited insight into guest memory: beyond the coarse Available Memory Bytes platform metric, counters such as committed bytes or per-process memory exist only inside the OS, so detailed memory alerting and memory-based scaling decisions depend on a guest agent. Many teams discover this the hard way after an out-of-memory crash that left little trace in platform metrics.
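As a rough illustration of what the check looks for, the sketch below tests whether a VM's extension list contains one of the guest diagnostics extension types. The function name has_guest_agent is hypothetical, and the sketch assumes the relevant extension types are AzureMonitorLinuxAgent, AzureMonitorWindowsAgent, IaaSDiagnostics (WAD), and LinuxDiagnostic (LAD). It only detects that an extension is present, not that AMA has a Data Collection Rule attached, so treat it as an approximation rather than the exact logic Lensix uses.

```shell
# Hypothetical helper: reads extension type names on stdin (one per line)
# and succeeds if any of them is a guest diagnostics extension.
has_guest_agent() {
  grep -qxE 'AzureMonitor(Linux|Windows)Agent|IaaSDiagnostics|LinuxDiagnostic'
}

# Example use (needs a logged-in Azure CLI; rg-app and vm-app-01 are the
# example names used elsewhere in this article):
#   az vm extension list -g rg-app --vm-name vm-app-01 \
#     --query "[].typePropertiesType" -o tsv | has_guest_agent \
#     && echo "guest agent present" || echo "no guest agent"
```

Matching on the extension type rather than its name is a deliberate choice here, because the legacy extension is sometimes installed under a name that differs from its type.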

Why it matters

Guest diagnostics is not just a nice-to-have dashboard. The missing telemetry creates real operational and security blind spots.

You cannot troubleshoot what you cannot see

When a VM starts misbehaving, the first questions are usually about memory pressure, disk space, and which process is responsible. None of those are answerable from host metrics alone. Teams without guest diagnostics end up SSHing or RDPing into a struggling machine to run top or Task Manager, which is exactly when the box is least responsive.

Capacity planning breaks down

Right-sizing a fleet requires knowing how much memory and disk each workload actually consumes. If you only have CPU and network from the host, you are guessing at half the picture. That leads to over-provisioned VMs that waste money or under-provisioned VMs that fall over under load.

Security and forensic gaps

Guest diagnostics is also a collection path for Windows Event Logs and Linux syslog. Without it, security-relevant events such as failed logins, privilege escalations, and service crashes never leave the box. If the VM is compromised and an attacker wipes local logs, you have no centralized copy to investigate. Incident response gets dramatically harder when the only evidence lived on the machine that got owned.

Warning: Several Azure security baselines and compliance frameworks (CIS Azure Foundations, and the Azure Security Benchmark, now published as the Microsoft cloud security benchmark) expect guest-level logging to be in place. A fleet of VMs with diagnostics disabled will show up as findings in those audits.

How to fix it

Microsoft is consolidating monitoring on the Azure Monitor Agent (AMA), which replaces the legacy Log Analytics agent and the classic diagnostics extension. For new work, prefer AMA with a Data Collection Rule (DCR). This guide covers both the modern AMA path and the legacy extension path, since plenty of estates still run the latter.

Option 1: Azure Monitor Agent with a Data Collection Rule (recommended)

AMA does not collect anything until you associate it with a Data Collection Rule that says what to gather and where to send it. The flow followed here is: create a Log Analytics workspace, install AMA on the VM, create a DCR, then associate the VM with the DCR. The agent and the DCR can be created in either order; only the association has to come last.

First, make sure you have a workspace to receive the data:

az monitor log-analytics workspace create \
  --resource-group rg-monitoring \
  --workspace-name law-prod \
  --location eastus

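Install the Azure Monitor Agent extension on the VM. AMA authenticates with a managed identity, so the VM needs a system-assigned or user-assigned identity enabled first (for example with az vm identity assign):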

# Linux VM
az vm extension set \
  --resource-group rg-app \
  --vm-name vm-app-01 \
  --name AzureMonitorLinuxAgent \
  --publisher Microsoft.Azure.Monitor \
  --enable-auto-upgrade true

# Windows VM
az vm extension set \
  --resource-group rg-app \
  --vm-name vm-win-01 \
  --name AzureMonitorWindowsAgent \
  --publisher Microsoft.Azure.Monitor \
  --enable-auto-upgrade true

Then create a DCR that collects guest performance counters. The easiest way is through the portal (Monitor → Data Collection Rules → Create), but you can also do it declaratively. Here is the shape of a DCR that gathers common performance counters:

{
  "location": "eastus",
  "properties": {
    "dataSources": {
      "performanceCounters": [
        {
          "name": "perfCounters",
          "streams": ["Microsoft-Perf"],
          "samplingFrequencyInSeconds": 60,
          "counterSpecifiers": [
            "\\Processor(_Total)\\% Processor Time",
            "\\Memory\\Available Bytes",
            "\\Memory\\% Committed Bytes In Use",
            "\\LogicalDisk(_Total)\\% Free Space",
            "\\LogicalDisk(_Total)\\Disk Reads/sec",
            "\\LogicalDisk(_Total)\\Disk Writes/sec"
          ]
        }
      ]
    },
    "destinations": {
      "logAnalytics": [
        {
          "name": "laDest",
          "workspaceResourceId": "/subscriptions/<subscription-id>/resourceGroups/rg-monitoring/providers/Microsoft.OperationalInsights/workspaces/law-prod"
        }
      ]
    },
    "dataFlows": [
      {
        "streams": ["Microsoft-Perf"],
        "destinations": ["laDest"]
      }
    ]
  }
}
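If you prefer the CLI to the portal, a definition like the one above can be saved to a file and submitted with az monitor data-collection rule create. The sketch below is a minimal wrapper under a few assumptions: the file name dcr-vm-perf.json and the function name create_perf_dcr are made up for illustration, the rule name dcr-vm-perf is chosen to match the rule ID used in the association step, and the data-collection commands come from a CLI extension that may need to be installed first.

```shell
# Sketch: create the DCR from a JSON definition file.
# dcr-vm-perf.json and create_perf_dcr are illustrative names, not part of
# the original walkthrough.
create_perf_dcr() {
  rule_file="${1:-dcr-vm-perf.json}"
  if [ ! -f "$rule_file" ]; then
    echo "rule file not found: $rule_file" >&2
    return 1
  fi
  az monitor data-collection rule create \
    --resource-group rg-monitoring \
    --name dcr-vm-perf \
    --location eastus \
    --rule-file "$rule_file"
}

# Example use: create_perf_dcr ./dcr-vm-perf.json
```

Keeping the definition in a file means the same JSON can be reviewed in version control and reapplied when counters change.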

Finally, associate the VM with the DCR so the agent knows what rule to apply:

az monitor data-collection rule association create \
  --name "vm-app-01-dcr-assoc" \
  --rule-id "/subscriptions/<subscription-id>/resourceGroups/rg-monitoring/providers/Microsoft.Insights/dataCollectionRules/dcr-vm-perf" \
  --resource "/subscriptions/<subscription-id>/resourceGroups/rg-app/providers/Microsoft.Compute/virtualMachines/vm-app-01"

Tip: Associate a single DCR with many VMs instead of creating one per machine. Define the rule once, then attach every VM in a resource group or subscription to it. New VMs only need the agent installed and the association created, which is trivial to automate.
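One way to apply that tip is to loop over every VM in a resource group and create the association for each. This is a sketch, not a hardened script: the function name associate_rg_vms and the "<vm-name>-dcr-assoc" naming pattern are assumptions, and it expects the agent to be installed on each VM already.

```shell
# Sketch: associate every VM in a resource group with one shared DCR.
# Arguments: $1 = resource group, $2 = full resource ID of the DCR.
associate_rg_vms() {
  rg="$1"
  dcr_id="$2"
  az vm list --resource-group "$rg" --query "[].id" --output tsv |
  while IFS= read -r vm_id; do
    [ -n "$vm_id" ] || continue          # skip blank lines
    vm_name="${vm_id##*/}"               # last path segment is the VM name
    az monitor data-collection rule association create \
      --name "${vm_name}-dcr-assoc" \
      --rule-id "$dcr_id" \
      --resource "$vm_id"
  done
}

# Example use (fill in your own subscription ID):
#   associate_rg_vms rg-app "/subscriptions/<subscription-id>/resourceGroups/rg-monitoring/providers/Microsoft.Insights/dataCollectionRules/dcr-vm-perf"
```

For a fleet that changes often, the policy-based approach later in this article is usually a better fit than rerunning a loop like this.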

Option 2: Legacy Azure Diagnostics extension

If you are still on the classic diagnostics extension, you can enable guest metrics through the portal under the VM's Diagnostic settings blade, then toggle Enable guest-level monitoring. Azure will provision a storage account and install the extension. Via CLI, the legacy extension is configured with a diagnostics config file:

az vm diagnostics set \
  --resource-group rg-app \
  --vm-name vm-win-01 \
  --settings @diagnostics-config.json

Note: The legacy Log Analytics agent (also called the MMA/OMS agent) is retired as of August 2024. If your remediation plan still leans on it, switch to AMA. The classic diagnostics extension still works, but Microsoft has announced its retirement as well (currently scheduled for March 31, 2026), so treat it as a stopgap rather than a long-term path.

Verify the agent is reporting

After installation, confirm the extension provisioned successfully:

az vm extension list \
  --resource-group rg-app \
  --vm-name vm-app-01 \
  --query "[].{Name:name, State:provisioningState}" \
  --output table

Then run a quick KQL query in your Log Analytics workspace to confirm data is landing:

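// Limit to recent records so old data does not mask a broken agent
Perf
| where TimeGenerated > ago(30m)
| where Computer == "vm-app-01"
| summarize count() by CounterName
| order by CounterName asc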
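If you want to script the first verification step instead of reading a table, something like the following could work. It is a sketch under the assumption that the agent extension was installed with a name starting with AzureMonitor, as in the install commands earlier; the function name ama_provisioned is hypothetical.

```shell
# Sketch: succeed only if the AMA extension on a VM reports "Succeeded".
# Arguments: $1 = resource group, $2 = VM name.
ama_provisioned() {
  rg="$1"
  vm="$2"
  state=$(az vm extension list \
    --resource-group "$rg" \
    --vm-name "$vm" \
    --query "[?starts_with(name, 'AzureMonitor')].provisioningState | [0]" \
    --output tsv) || return 2
  [ "$state" = "Succeeded" ]
}

# Example use:
#   ama_provisioned rg-app vm-app-01 && echo "agent OK" || echo "agent missing or failed"
```

A successful provisioning state only tells you the extension installed; the query against the Perf table is still what confirms that data is arriving.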

How to prevent it from happening again

Fixing VMs one at a time does not scale, and new VMs will keep arriving without diagnostics unless you enforce it. There are two durable approaches: policy and infrastructure as code.

Azure Policy with auto-remediation

Azure ships built-in policy definitions and initiatives that deploy the Azure Monitor Agent and DCR associations automatically. Assign the relevant definition or initiative with a DeployIfNotExists effect, and any VM that lacks the agent gets one provisioned: at deployment time for new or updated VMs, and through a remediation task for VMs that already exist.

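# Replace the <...> placeholders with your own values.
# --identity-scope sets where --role is granted to the assignment's managed identity.
az policy assignment create \
  --name "deploy-ama-linux" \
  --display-name "Deploy Azure Monitor Agent to Linux VMs" \
  --policy "<policy-definition-name-or-id>" \
  --scope "/subscriptions/<subscription-id>" \
  --location eastus \
  --mi-system-assigned \
  --identity-scope "/subscriptions/<subscription-id>" \
  --role Contributor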

Warning: DeployIfNotExists policies need a managed identity with permission to install extensions and create associations. If the assignment lacks the right role, remediation tasks silently fail. Grant the identity Contributor (or a tighter custom role) on the target scope.
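VMs that existed before the assignment are not fixed until a remediation task runs against them. A minimal sketch of starting one from the CLI is shown below; the function name remediate_assignment and the "remediate-<assignment>" task name are assumptions, and the command runs at the current subscription scope. When the assignment is an initiative rather than a single policy, the CLI also expects a --definition-reference-id identifying which policy inside it to remediate.

```shell
# Sketch: start a remediation task for a DeployIfNotExists assignment.
# Argument: $1 = policy assignment name (for example deploy-ama-linux).
remediate_assignment() {
  assignment="$1"
  if [ -z "$assignment" ]; then
    echo "usage: remediate_assignment <assignment-name>" >&2
    return 1
  fi
  az policy remediation create \
    --name "remediate-${assignment}" \
    --policy-assignment "$assignment"
}

# Example use: remediate_assignment deploy-ama-linux
```

Checking the task afterwards with az policy remediation list is a quick way to spot deployments that failed because of missing permissions.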

Bake it into Terraform

If you provision VMs with Terraform, attach the agent and DCR association in the same module that creates the VM. That way no machine ever ships without diagnostics:

resource "azurerm_virtual_machine_extension" "ama" {
  name                       = "AzureMonitorLinuxAgent"
  virtual_machine_id         = azurerm_linux_virtual_machine.app.id
  publisher                  = "Microsoft.Azure.Monitor"
  type                       = "AzureMonitorLinuxAgent"
  type_handler_version       = "1.0"
  auto_upgrade_minor_version = true
  # Mirrors --enable-auto-upgrade in the CLI example
  automatic_upgrade_enabled  = true
}

resource "azurerm_monitor_data_collection_rule_association" "app" {
  name                    = "vm-app-dcr-assoc"
  target_resource_id      = azurerm_linux_virtual_machine.app.id
  data_collection_rule_id = azurerm_monitor_data_collection_rule.perf.id
}

Gate it in CI/CD

Run a policy compliance check or a tfsec/checkov scan in your pipeline so a pull request that creates a VM without a diagnostics association fails review before it ever reaches production. Combined with Lensix continuous scanning, you catch drift both before deploy and after.
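As one possible shape for the policy compliance gate, the sketch below counts non-compliant resources for an assignment and returns a non-zero status when there are any, which most CI systems treat as a failed step. The function name gate_on_compliance is hypothetical, and the sketch assumes the pipeline is already logged in to the Azure CLI with read access to policy states.

```shell
# Sketch: fail the pipeline step when a policy assignment has
# non-compliant resources.
# Argument: $1 = policy assignment name.
gate_on_compliance() {
  assignment="$1"
  count=$(az policy state list \
    --policy-assignment "$assignment" \
    --filter "complianceState eq 'NonCompliant'" \
    --query "length(@)" \
    --output tsv) || return 2
  if [ "${count:-0}" -gt 0 ]; then
    echo "$count resource(s) non-compliant with $assignment" >&2
    return 1
  fi
  echo "all resources compliant with $assignment"
}

# Example use in a pipeline step: gate_on_compliance deploy-ama-linux
```

Policy states reflect what is already deployed, so this kind of gate complements a static scan of the Terraform code rather than replacing it.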

Best practices

Standardize on the Azure Monitor Agent. The legacy agents are retired or on their way out. New deployments should use AMA with DCRs so you are not migrating again in a year.
Use a small number of shared DCRs. Group rules by workload type (web tier, database tier, domain controllers) rather than per VM. Fewer rules means less to maintain and easier auditing.
Collect logs, not just metrics. Performance counters tell you the VM is unhealthy; event logs and syslog tell you why. Configure your DCR to gather security and system event logs too.
Set actionable alerts. Guest metrics are only useful if someone acts on them. Create alert rules for low memory, low disk space, and high CPU so problems surface before users notice.
Mind the cost. Log Analytics bills on ingestion volume. Sample performance counters at sensible intervals (60 seconds is usually fine) and avoid collecting noisy verbose logs you will never query.

Tip: Set a daily ingestion cap on your Log Analytics workspace as a safety net. It protects you from a runaway log source blowing up your monitoring bill while you tune your collection rules.
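The cap can be set from the CLI with the workspace's --quota option, which takes a value in gigabytes per day. The sketch below is illustrative only: the function name set_daily_cap is hypothetical, the workspace names are the ones used earlier in this article, and any figure you pass (5 GB in the usage line) is an arbitrary example rather than a recommendation.

```shell
# Sketch: set a daily ingestion cap (in GB) on the example workspace.
# Argument: $1 = cap in gigabytes per day.
set_daily_cap() {
  cap_gb="$1"
  case "$cap_gb" in
    ''|*[!0-9.]*)
      echo "cap must be a number of gigabytes" >&2
      return 1
      ;;
  esac
  az monitor log-analytics workspace update \
    --resource-group rg-monitoring \
    --workspace-name law-prod \
    --quota "$cap_gb"
}

# Example use: set_daily_cap 5
```

Keep in mind that once the cap is reached, the workspace stops ingesting most data until the daily reset, so a cap that is too tight can create its own blind spot during an incident.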

Guest diagnostics is one of those settings that costs nothing in effort once it is automated but pays off every time you have an incident. Turn it on, enforce it with policy, and you will never again be left guessing about what is happening inside a VM you cannot reach.
