Fix API Gateway Detailed Metrics Disabled on AWS

TL;DR

This check flags API Gateway stages that don't have detailed CloudWatch metrics enabled, which leaves you blind to per-method latency, error rates, and traffic patterns. Turn on detailed metrics with a single update-stage call or a one-line setting in your IaC.

When an API Gateway stage misbehaves at 2am, the difference between a five-minute fix and a two-hour incident usually comes down to one thing: whether you can see which method is failing. Without detailed metrics, all you get is a coarse, stage-wide view that hides the method actually causing the pain. This check catches that gap before it bites you during an outage.

What this check detects

The apigw_nometrics check looks at each deployed API Gateway stage and verifies whether detailed CloudWatch metrics are enabled. In AWS terms, this is the metricsEnabled setting on a stage's method settings.

API Gateway emits two tiers of metrics:

Basic metrics (always on): aggregated at the stage level. You see total Count, 4XXError, 5XXError, Latency, and IntegrationLatency across the entire stage.
Detailed metrics (off by default): the same metrics broken down per method and resource path, so you can isolate GET /orders from POST /payments.

When detailed metrics are disabled, you can confirm an API is throwing 500s but you cannot tell from CloudWatch which endpoint is responsible. The check fails for any stage where this finer-grained visibility is missing.

Note: This setting can be applied to every method on a stage with the */* wildcard, or to an individual method with a specific {resource}/{httpMethod} path. The check evaluates the effective configuration applied to your methods.
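To make "effective configuration" concrete, here is a minimal Python sketch of how a stage's get-stage output could be evaluated. It assumes the methodSettings map shape returned by aws apigateway get-stage, treats per-method keys as opaque strings, and assumes a per-method entry takes precedence over the */* entry. The function name and sample data are hypothetical, and this is not the check's actual implementation.

```python
from typing import Any, Dict

WILDCARD_KEY = "*/*"  # stage-wide entry in the methodSettings map


def effective_metrics_enabled(stage: Dict[str, Any], method_key: str) -> bool:
    """Return the metricsEnabled value that applies to one method.

    Assumption: a per-method entry, when present, wins over the "*/*" entry.
    """
    settings = stage.get("methodSettings") or {}
    override = settings.get(method_key)
    if override is not None and "metricsEnabled" in override:
        return bool(override["metricsEnabled"])
    # No override for this method: fall back to the stage-wide setting.
    return bool(settings.get(WILDCARD_KEY, {}).get("metricsEnabled", False))


# Example input shaped like get-stage output (values are made up).
stage = {
    "stageName": "prod",
    "methodSettings": {
        "*/*": {"metricsEnabled": True},
        "orders/GET": {"metricsEnabled": False},  # hypothetical per-method key
    },
}

print(effective_metrics_enabled(stage, "payments/POST"))  # falls back to */*
print(effective_metrics_enabled(stage, "orders/GET"))     # per-method override
```

A stage with no methodSettings entry at all evaluates to disabled, which matches the "off by default" behavior described above.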

Why it matters

API Gateway sits at the front door of your services. It's frequently the first AWS component a request touches and the last one a response leaves through. Losing observability here means losing it everywhere downstream.

Slow incident response

Stage-level metrics tell you something is broken but not where. Imagine a stage serving forty methods. A spike in 5XXError shows up, but the aggregate metric blends every route together. Your on-call engineer ends up grepping logs or guessing instead of pulling up a per-method graph and immediately seeing that POST /checkout is the only failing endpoint.

Missed performance regressions

A single slow method can drag up the aggregate latency just enough to look like noise. With detailed metrics, a regression on one endpoint is obvious. Without them, gradual degradation hides inside the average until customers complain.

Weaker security signal

Per-method error rates are a useful early signal for abuse. A sudden burst of 4XXError concentrated on an auth endpoint can indicate credential stuffing or enumeration attempts. Aggregated metrics smear that signal across all traffic and make it far easier to miss.

Warning: Detailed metrics are not free. Each method generates additional CloudWatch custom metrics, and CloudWatch bills per metric per month. On an API with hundreds of methods across multiple stages, this adds up. Enable it where visibility matters most, and read the cost note further down before flipping it on everywhere.
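A back-of-the-envelope estimate helps decide where the wildcard is affordable. The sketch below multiplies methods by stages by metric names by a unit price. The default of five metric names comes from the list of basic metrics above. The price of 0.30 USD per metric per month is only a placeholder assumption, so substitute the current CloudWatch rate for your region and tier. It also assumes every method receives traffic, which makes it an upper bound.

```python
def estimated_monthly_metric_cost(
    methods: int,
    stages: int,
    metrics_per_method: int = 5,         # Count, 4XXError, 5XXError, Latency, IntegrationLatency
    price_per_metric_usd: float = 0.30,  # placeholder assumption, not a quoted AWS price
) -> float:
    """Rough monthly cost: one billed metric per (stage, method, metric name)."""
    return methods * stages * metrics_per_method * price_per_metric_usd


# Hypothetical API: 200 methods deployed to 3 stages.
metric_count = 200 * 3 * 5
cost = estimated_monthly_metric_cost(methods=200, stages=3)
print(f"{metric_count} metrics, about ${cost:,.2f} per month")
```

Under these assumptions the example works out to 3,000 metrics and roughly 900 USD per month, which is why wide APIs deserve a deliberate choice rather than a blanket toggle.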

How to fix it

You can enable detailed metrics through the console, the CLI, or infrastructure as code. The IaC route is the only one that sticks, so treat the console and CLI options as quick fixes during an incident.

Option 1: AWS Console

Open the API Gateway console and select your API.
In the left nav, choose Stages and select the target stage.
Open the Logs and tracing section and choose Edit.
Toggle Detailed metrics on.
Save changes. CloudWatch begins emitting per-method metrics within a few minutes.

Option 2: AWS CLI

For a REST API, patch the stage's method settings. The /*/* path applies the setting to every method on the stage.

aws apigateway update-stage \
  --rest-api-id abc123def4 \
  --stage-name prod \
  --patch-operations \
    'op=replace,path=/*/*/metrics/enabled,value=true'

Verify it took effect:

aws apigateway get-stage \
  --rest-api-id abc123def4 \
  --stage-name prod \
  --query 'methodSettings."*/*".metricsEnabled'
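If you want a narrower scope than /*/*, the same patch operation can target individual methods. The Python sketch below builds the patch paths, assuming the API Gateway convention that slashes inside the resource path are escaped as ~1. The helper names are hypothetical, and the boto3 call appears only in a comment so the example runs without AWS credentials.

```python
from typing import Dict, Iterable, List, Tuple


def metrics_patch_path(resource_path: str = "*", http_method: str = "*") -> str:
    """Build the update-stage patch path for one method, or /*/* by default."""
    if resource_path == "*":
        encoded = "*"
    else:
        # Slashes inside the resource path are escaped as ~1 (JSON Pointer style).
        encoded = resource_path.replace("/", "~1")
    return f"/{encoded}/{http_method}/metrics/enabled"


def metrics_patch_operations(methods: Iterable[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Return one 'replace' operation per (resource_path, http_method) pair."""
    return [
        {"op": "replace", "path": metrics_patch_path(path, verb), "value": "true"}
        for path, verb in methods
    ]


# Hypothetical critical endpoints.
ops = metrics_patch_operations([("/checkout", "POST"), ("/orders/{id}", "GET")])
for op in ops:
    print(op["path"])

# With boto3 installed, the operations could be applied like this:
#   boto3.client("apigateway").update_stage(
#       restApiId="abc123def4", stageName="prod", patchOperations=ops)
```

Calling metrics_patch_path() with no arguments yields the same /*/*/metrics/enabled path used in the CLI command above.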

For HTTP APIs (API Gateway v2), detailed metrics work differently. v2 emits metrics at the route level through the DetailedMetricsEnabled route setting:

aws apigatewayv2 update-stage \
  --api-id abc123def4 \
  --stage-name prod \
  --default-route-settings DetailedMetricsEnabled=true

Note: The update-stage call only changes configuration, not behavior, so it is safe to run against production. There is no redeploy and no request disruption. Metrics simply start flowing.

Option 3: Terraform

For a REST API, set metrics_enabled on the method settings resource:

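resource "aws_api_gateway_method_settings" "prod" {
  rest_api_id = aws_api_gateway_rest_api.example.id
  stage_name  = aws_api_gateway_stage.prod.stage_name
  method_path = "*/*" # every method on the stage

  settings {
    metrics_enabled = true # detailed (per-method) CloudWatch metrics
    # Execution logging requires the account-level CloudWatch Logs role
    # (aws_api_gateway_account); remove logging_level if that role is not set.
    logging_level      = "INFO"
    data_trace_enabled = false # keep request and response bodies out of the logs
  }
}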

For an HTTP API stage:

resource "aws_apigatewayv2_stage" "prod" {
  api_id      = aws_apigatewayv2_api.example.id
  name        = "prod"
  auto_deploy = true

  default_route_settings {
    detailed_metrics_enabled = true
  }
}

Option 4: CloudFormation

ProdStage:
  Type: AWS::ApiGateway::Stage
  Properties:
    RestApiId: !Ref MyRestApi
    StageName: prod
    DeploymentId: !Ref MyDeployment
    MethodSettings:
      - ResourcePath: "/*"
        HttpMethod: "*"
        MetricsEnabled: true

Tip: Pair detailed metrics with execution logging at the INFO level on non-production stages and ERROR on production. Metrics tell you which method is failing, and logs tell you why. Together they cut mean time to resolution far more than either does alone.

How to prevent it from happening again

Manually toggling a setting fixes one stage today. To stop it drifting back tomorrow, push the requirement into the pipeline.

Enforce it in module defaults

If teams provision API Gateways through a shared Terraform module, bake metrics_enabled = true into the module so every new API inherits it. Make the value a variable that defaults to true so opting out is a deliberate, reviewable choice rather than the silent default.

Gate it with policy as code

Add an OPA or Conftest policy to your CI that rejects any plan whose REST API method settings leave detailed metrics disabled. A rough Rego check against a Terraform plan looks like this:

package apigateway

deny[msg] {
  resource := input.resource_changes[_]
  resource.type == "aws_api_gateway_method_settings"
  # Bind the settings block first: OPA's safety check rejects a
  # wildcard used inside a negated expression.
  settings := resource.change.after.settings[_]
  not settings.metrics_enabled
  msg := sprintf("API Gateway method settings %s must enable detailed metrics", [resource.address])
}

Run it as a required step before terraform apply:

terraform plan -out=tfplan
terraform show -json tfplan > plan.json
conftest test plan.json --policy ./policies
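For teams that do not run OPA, the same gate can be approximated with a short script over plan.json. The Python sketch below is an illustration rather than a drop-in replacement. It assumes the standard resource_changes layout of terraform show -json, where nested blocks appear as lists, and it also covers the HTTP API stage resource shown earlier. All function names and the sample plan are hypothetical.

```python
from typing import Any, Callable, Dict, List


def _rest_metrics_on(after: Dict[str, Any]) -> bool:
    # aws_api_gateway_method_settings: settings { metrics_enabled = true }
    return any(b.get("metrics_enabled") is True for b in after.get("settings") or [])


def _http_metrics_on(after: Dict[str, Any]) -> bool:
    # aws_apigatewayv2_stage: default_route_settings { detailed_metrics_enabled = true }
    blocks = after.get("default_route_settings") or []
    return any(b.get("detailed_metrics_enabled") is True for b in blocks)


CHECKS: Dict[str, Callable[[Dict[str, Any]], bool]] = {
    "aws_api_gateway_method_settings": _rest_metrics_on,
    "aws_apigatewayv2_stage": _http_metrics_on,
}


def find_violations(plan: Dict[str, Any]) -> List[str]:
    """Return addresses of planned resources that leave detailed metrics off."""
    violations = []
    for rc in plan.get("resource_changes", []):
        check = CHECKS.get(rc.get("type"))
        after = (rc.get("change") or {}).get("after")
        if check is None or after is None:  # unrelated type, or a delete
            continue
        if not check(after):
            violations.append(rc["address"])
    return violations


# Made-up plan fragment for illustration.
sample_plan = {
    "resource_changes": [
        {"address": "aws_api_gateway_method_settings.prod",
         "type": "aws_api_gateway_method_settings",
         "change": {"after": {"settings": [{"metrics_enabled": True}]}}},
        {"address": "aws_apigatewayv2_stage.dev",
         "type": "aws_apigatewayv2_stage",
         "change": {"after": {"default_route_settings": []}}},
        {"address": "aws_api_gateway_method_settings.old",
         "type": "aws_api_gateway_method_settings",
         "change": {"after": None}},
    ]
}

print(find_violations(sample_plan))
```

In a pipeline you would load the real file with json.load and exit non-zero when the returned list is not empty. Note that neither this script nor the Rego rule catches a REST API stage that has no method settings resource at all.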

Catch drift continuously

Policy gates only cover changes that go through your pipeline. Someone with console access can still disable metrics by hand. Continuous scanning closes that gap. Lensix runs the apigw_nometrics check on a schedule across every account and region, so a manual change surfaces as a finding instead of waiting for the next incident to reveal it.
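To illustrate what a scheduled scan involves, here is a minimal Python sketch that walks every REST API and stage in one account and region and reports stages whose */* setting has metrics off. It is not how Lensix implements the check. It assumes the boto3 API Gateway client methods get_rest_apis and get_stages, ignores per-method overrides, and uses a stub client with made-up data so the example runs offline.

```python
from typing import Any, Dict, Iterator, List, Tuple


def iter_rest_api_ids(client: Any) -> Iterator[str]:
    """Yield every REST API id, following the `position` pagination token."""
    kwargs: Dict[str, Any] = {"limit": 500}
    while True:
        page = client.get_rest_apis(**kwargs)
        for api in page.get("items", []):
            yield api["id"]
        position = page.get("position")
        if not position:
            return
        kwargs["position"] = position


def stages_without_detailed_metrics(client: Any) -> List[Tuple[str, str]]:
    """Return (api_id, stage_name) pairs where the */* metrics setting is off."""
    findings = []
    for api_id in iter_rest_api_ids(client):
        for stage in client.get_stages(restApiId=api_id).get("item", []):
            wildcard = (stage.get("methodSettings") or {}).get("*/*", {})
            if not wildcard.get("metricsEnabled", False):
                findings.append((api_id, stage["stageName"]))
    return findings


class StubApiGatewayClient:
    """Offline stand-in for boto3.client("apigateway"); all data is made up."""

    def get_rest_apis(self, **kwargs):
        if "position" not in kwargs:
            return {"items": [{"id": "api-one"}], "position": "page-2"}
        return {"items": [{"id": "api-two"}]}

    def get_stages(self, restApiId):
        stages = {
            "api-one": [{"stageName": "prod",
                         "methodSettings": {"*/*": {"metricsEnabled": True}}}],
            "api-two": [{"stageName": "prod", "methodSettings": {}},
                        {"stageName": "dev"}],
        }
        return {"item": stages[restApiId]}


findings = stages_without_detailed_metrics(StubApiGatewayClient())
print(findings)
```

Against a real account you would pass boto3.client("apigateway") instead of the stub and repeat the call per region and per account.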

Tip: Wire your scan findings into the same alerting channel your on-call team already watches. A config drift finding that lands in Slack next to your other alerts gets fixed in hours. One buried in a weekly report gets fixed never.

Best practices

Enable detailed metrics on every production stage. The visibility is worth the cost where uptime matters. Treat it as a baseline, not an upgrade.
Build dashboards per method, not per stage. Once detailed metrics flow, create CloudWatch dashboards that graph latency and error rate for your highest-traffic and highest-risk endpoints.
Alarm on per-method 5XX rates. Set CloudWatch alarms on the 5XXError metric scoped to critical methods so you are paged before the aggregate budget is blown.
Be deliberate about cost on wide APIs. If an API has hundreds of methods, consider enabling detailed metrics per critical method instead of the */* wildcard to control the number of custom metrics CloudWatch bills you for.
Combine metrics, access logs, and X-Ray. Metrics show the trend, access logs show individual requests, and X-Ray traces show the path through your integrations. The three together give you a complete picture.
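As a sketch of the per-method alarm described above, the Python helper below builds the arguments for a CloudWatch alarm on one REST API method. It assumes the AWS/ApiGateway namespace with the ApiName, Stage, Resource, and Method dimensions that detailed metrics add, and it uses the Average statistic of 5XXError as an error rate. The helper name, alarm naming scheme, threshold, and evaluation window are hypothetical choices to tune for your own traffic.

```python
from typing import Any, Dict


def per_method_5xx_alarm(
    api_name: str,
    stage: str,
    resource: str,
    method: str,
    threshold: float = 0.05,  # assumed: alarm when more than 5% of requests fail
) -> Dict[str, Any]:
    """Build put_metric_alarm arguments scoped to a single method."""
    return {
        "AlarmName": f"{api_name}-{stage}-{method}-{resource}-5xx-rate",
        "Namespace": "AWS/ApiGateway",
        "MetricName": "5XXError",
        "Dimensions": [
            {"Name": "ApiName", "Value": api_name},
            {"Name": "Stage", "Value": stage},
            {"Name": "Resource", "Value": resource},
            {"Name": "Method", "Value": method},
        ],
        "Statistic": "Average",         # average of 5XXError reads as an error rate
        "Period": 60,                   # seconds per datapoint
        "EvaluationPeriods": 5,         # five consecutive minutes over threshold
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # no traffic is not treated as an outage
    }


# Hypothetical API name and endpoint.
alarm = per_method_5xx_alarm("shop-api", "prod", "/checkout", "POST")
print(alarm["AlarmName"])

# With boto3 installed, the alarm could be created like this:
#   boto3.client("cloudwatch").put_metric_alarm(**alarm)
```

These dimensions only receive data once detailed metrics are enabled on the stage, so an alarm created beforehand will sit without datapoints.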

Note: Detailed metrics are a monitoring control, not a security boundary. They make problems visible faster, but they do not prevent abuse. Keep them alongside throttling, WAF rules, and authorizers as part of a layered API defense.

Observability is cheapest to add before you need it and most expensive to be missing during an outage. Enabling detailed metrics is a small, low-risk change that pays off the first time something breaks at the worst possible hour.
