Fix API Gateway X-Ray Tracing Disabled on AWS

TL;DR

This check flags API Gateway REST API stages that have X-Ray tracing turned off, which leaves you blind to where latency and errors originate across your request path. Enable active tracing on the stage with one CLI call or a single Terraform attribute.

When a request hits your API Gateway endpoint and something goes wrong, the question is always the same: where did it break? Was it the gateway, the Lambda authorizer, the backend integration, or a downstream service three hops away? Without distributed tracing, you are left correlating timestamps across CloudWatch log groups and guessing. AWS X-Ray exists to remove that guesswork, and this Lensix check catches the stages where it has not been switched on.

What this check detects

The apigw_tracing check inspects every API Gateway REST API stage in your account and reports any stage where X-Ray active tracing is disabled. Tracing is a per-stage setting, so a single API can have it enabled on prod but missing on staging or a forgotten v1-legacy stage.

In API Gateway terms, the relevant property is tracingEnabled on the stage configuration. When it is false (the default), API Gateway does not emit trace segments to X-Ray, and you lose the gateway portion of any end-to-end trace.
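The rule behind this check can be illustrated with a short sketch. It is not Lensix's implementation. It assumes you have already fetched each API's stages in the shape of the item list that API Gateway's GetStages call returns. It treats a missing tracingEnabled key the same as false, as a defensive choice. The function name untraced_stages is hypothetical.

```python
from typing import Iterable, Mapping


def untraced_stages(stages_by_api: Mapping[str, Iterable[Mapping]]) -> list[tuple[str, str]]:
    """Return (rest_api_id, stage_name) pairs whose stage has X-Ray tracing off.

    stages_by_api maps a REST API ID to that API's stage dicts, shaped like the
    "item" list from API Gateway GetStages. A missing tracingEnabled key is
    treated the same as False, because disabled is the default.
    """
    findings = []
    for api_id, stages in stages_by_api.items():
        for stage in stages:
            if not stage.get("tracingEnabled", False):
                findings.append((api_id, stage["stageName"]))
    return findings


# Example: tracing is on for prod but off (or unset) on the other stages.
example = {
    "abc123def4": [
        {"stageName": "prod", "tracingEnabled": True},
        {"stageName": "staging", "tracingEnabled": False},
        {"stageName": "v1-legacy"},
    ]
}
print(untraced_stages(example))
```

Because tracing is evaluated per stage, the sketch reports one finding per stage rather than one per API. This matches the prod/staging/v1-legacy scenario described above.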

Note: This check applies to REST APIs (the apigateway v1 service). HTTP APIs and WebSocket APIs are managed through the separate apigatewayv2 service, and their stages have no tracingEnabled attribute. The setting described here therefore applies only to REST API stages.

Why it matters

X-Ray tracing is not a security control in the firewall sense, but missing observability has real operational and security consequences.

You cannot see the full request path

A typical REST API request fans out through several components: the gateway itself, a Lambda authorizer, the integration backend, and whatever that backend calls. CloudWatch logs show you each piece in isolation. X-Ray stitches them into one trace with a shared trace ID, so you can see that a 2-second response spent 1.8 seconds waiting on a DynamoDB call rather than in the gateway.
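Components share a trace ID by passing the X-Amzn-Trace-Id header along the request path. The sketch below parses a header value in the Root=...;Parent=...;Sampled=... form shown in the X-Ray documentation. It also decodes the start time that is embedded in the root trace ID. The helper names are hypothetical, and you only need something like this if you want to correlate trace IDs in your own logs.

```python
from datetime import datetime, timezone


def parse_trace_header(header: str) -> dict[str, str]:
    """Split an X-Amzn-Trace-Id value such as 'Root=...;Parent=...;Sampled=1' into fields."""
    fields = {}
    for part in header.split(";"):
        key, sep, value = part.strip().partition("=")
        if sep:  # skip empty or malformed segments
            fields[key] = value
    return fields


def trace_start_time(root: str) -> datetime:
    """Decode the start time in a root trace ID of the form '1-<8 hex epoch seconds>-<24 hex>'."""
    version, epoch_hex, _unique = root.split("-")
    if version != "1":
        raise ValueError(f"unexpected trace ID version: {version}")
    return datetime.fromtimestamp(int(epoch_hex, 16), tz=timezone.utc)


header = "Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1"
fields = parse_trace_header(header)
# Root stays the same at every hop; Parent changes as each component adds its segment.
print(fields["Root"], fields["Sampled"])
print(trace_start_time(fields["Root"]).isoformat())
```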

Slower incident response

During an outage, time spent figuring out which component failed is time the API stays down. Without tracing, much of the early part of an incident can go to locating the broken link in the chain. With tracing, the service map highlights the failing node along with its error and fault rates.

Harder to investigate abuse and anomalies

Tracing helps you spot unusual patterns: a sudden spike in calls hitting an expensive backend path, a single client driving error rates, or authorizer latency that suggests a misbehaving token validation flow. These signals matter when you are triaging whether elevated traffic is a legitimate spike or an attack.

Compliance and audit gaps

Frameworks that require traceability of system behavior (and many internal audit programs) expect end-to-end visibility for production-facing APIs. A stage with no tracing is a stage you cannot fully account for after the fact.

Warning: X-Ray is not free. You are billed per trace recorded and per trace retrieved or scanned. For high-traffic APIs this adds up, so use sampling rules rather than tracing 100 percent of requests in very high volume environments. The default sampling rule (1 request per second plus 5 percent of additional requests) is a sensible starting point.
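You can estimate how many traces a sampling rule records with some simple arithmetic. The sketch below makes a few simplifying assumptions. It assumes steady traffic, and it assumes the reservoir applies once per second across the whole stage. In practice X-Ray distributes reservoir quota across hosts, so real counts can differ. It deliberately contains no prices, so multiply its output by the current per-trace rate from the X-Ray pricing page.

```python
SECONDS_PER_30_DAYS = 30 * 24 * 60 * 60  # 2,592,000


def traces_per_second(rps: float, reservoir: float = 1.0, fixed_rate: float = 0.05) -> float:
    """Approximate traces recorded per second under a reservoir + fixed-rate rule.

    The defaults mirror the X-Ray default rule: the first request each second is
    traced, then 5 percent of the remaining requests.
    """
    if rps <= 0:
        return 0.0
    reserved = min(rps, reservoir)
    return reserved + fixed_rate * (rps - reserved)


def traces_per_month(rps: float, **rule) -> float:
    """Scale the per-second estimate to a 30-day month of steady traffic."""
    return traces_per_second(rps, **rule) * SECONDS_PER_30_DAYS


for rps in (0.5, 10, 1000):
    default_rule = traces_per_month(rps)
    trace_all = traces_per_month(rps, fixed_rate=1.0)  # tracing every request
    print(f"{rps:>6} req/s: ~{default_rule:,.0f} sampled vs ~{trace_all:,.0f} traced-all per 30 days")
```

At 1,000 requests per second, the default rule records roughly 5 percent of what full tracing would. That gap is why sampling matters in high-volume stages.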

How to fix it

Enabling tracing is a stage-level change. Pick whichever path matches how you manage infrastructure.

Option 1: AWS CLI

Update the stage with a JSON patch that sets tracingEnabled to true:

aws apigateway update-stage \
  --rest-api-id abc123def4 \
  --stage-name prod \
  --patch-operations op=replace,path=/tracingEnabled,value=true

Confirm the change took effect:

aws apigateway get-stage \
  --rest-api-id abc123def4 \
  --stage-name prod \
  --query 'tracingEnabled'

A return value of true means the gateway will now emit segments to X-Ray.

Note: Updating tracingEnabled does not redeploy your API or change routing. It only flips the observability setting on the existing stage, so there is no downtime from this change itself.
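If you have many stages to fix, you can apply the same patch programmatically. The sketch below is a hypothetical helper, not part of any AWS SDK. It expects a client that behaves like boto3.client("apigateway") and uses that client's get_stages and update_stage calls. It skips stages that are already traced. It also includes a DryRunClient stand-in, also hypothetical, so you can preview which stages would change without touching AWS.

```python
def enable_tracing_on_all_stages(client, rest_api_id: str) -> list[str]:
    """Turn on X-Ray active tracing for every stage of one REST API that lacks it.

    `client` is assumed to behave like boto3.client("apigateway"): get_stages()
    returns {"item": [...]} and update_stage() accepts patchOperations.
    Returns the names of the stages that were updated.
    """
    updated = []
    for stage in client.get_stages(restApiId=rest_api_id).get("item", []):
        if stage.get("tracingEnabled", False):
            continue  # already traced; leave it alone
        client.update_stage(
            restApiId=rest_api_id,
            stageName=stage["stageName"],
            # Same patch the CLI sends; the API takes the value as a string.
            patchOperations=[{"op": "replace", "path": "/tracingEnabled", "value": "true"}],
        )
        updated.append(stage["stageName"])
    return updated


class DryRunClient:
    """Hypothetical offline stand-in that records patches instead of sending them."""

    def __init__(self, stages):
        self.stages = stages
        self.patches = []

    def get_stages(self, restApiId):
        return {"item": self.stages}

    def update_stage(self, restApiId, stageName, patchOperations):
        self.patches.append((stageName, patchOperations))


preview = DryRunClient([
    {"stageName": "prod", "tracingEnabled": True},
    {"stageName": "staging", "tracingEnabled": False},
])
print(enable_tracing_on_all_stages(preview, "abc123def4"))
print(preview.patches)
```

To run this against a real account, pass boto3.client("apigateway") instead of the stand-in. Each update behaves like the CLI call above and does not redeploy the API.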

Option 2: AWS Console

1. Open the API Gateway console and select your REST API.
2. In the left navigation pane, choose Stages, then select the stage (for example, prod).
3. Open the Logs and tracing tab and choose Edit.
4. Turn on X-Ray tracing.
5. Choose Save changes.

Option 3: Terraform

Set xray_tracing_enabled on the stage resource:

resource "aws_api_gateway_stage" "prod" {
  rest_api_id   = aws_api_gateway_rest_api.this.id
  deployment_id = aws_api_gateway_deployment.this.id
  stage_name    = "prod"

  xray_tracing_enabled = true
}

Then apply:

terraform plan -out tfplan
terraform apply tfplan

Option 4: CloudFormation / SAM

ProdStage:
  Type: AWS::ApiGateway::Stage
  Properties:
    RestApiId: !Ref MyRestApi
    DeploymentId: !Ref MyDeployment
    StageName: prod
    TracingEnabled: true

Don't forget IAM permissions for integrations

Enabling tracing on the gateway covers the gateway segment. To get a complete trace, the components behind it (such as Lambda functions) also need to send segments. For Lambda, enable active tracing on the function and attach the managed policy that allows it to write to X-Ray:

aws lambda update-function-configuration \
  --function-name my-api-backend \
  --tracing-config Mode=Active

aws iam attach-role-policy \
  --role-name my-lambda-execution-role \
  --policy-arn arn:aws:iam::aws:policy/AWSXRayDaemonWriteAccess

Tip: Tracing the gateway alone gives you a thin trace. The real payoff comes when the gateway, authorizer, and backend all report into the same trace ID. Enable active tracing across the whole request path so the X-Ray service map shows the full topology rather than a single disconnected node.

How to prevent it from happening again

Flipping the setting once is easy. Keeping it on across every new stage and every new API is the harder part. Bake it into your delivery pipeline so a stage without tracing never reaches production.

Policy-as-code with Terraform

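If you use Sentinel or OPA/Conftest, write a rule that rejects any aws_api_gateway_stage that is planned with tracing disabled. The rule must also catch stages where xray_tracing_enabled is omitted. Tracing is off by default, and an omitted attribute can appear in the plan JSON as null rather than false, so a plain "== false" comparison would miss it. A Conftest policy in Rego v1 syntax looks like this:

package main

import rego.v1

# Deny any API Gateway stage being created or updated without X-Ray tracing.
# object.get treats a missing or null attribute as disabled.
deny contains msg if {
  some resource in input.resource_changes
  resource.type == "aws_api_gateway_stage"
  some action in resource.change.actions
  action in {"create", "update"}
  object.get(resource.change.after, "xray_tracing_enabled", false) != true
  msg := sprintf("API Gateway stage '%s' must have X-Ray tracing enabled", [resource.address])
}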

Run it against your plan output in CI:

terraform plan -out tfplan
terraform show -json tfplan > tfplan.json
conftest test tfplan.json
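Teams that do not run OPA can apply the same rule with a small script over the same tfplan.json. The sketch below is a hypothetical equivalent of the Rego policy, not a Conftest feature. It checks only create and update changes and treats a missing or null xray_tracing_enabled as disabled. The helper names stages_without_tracing and check_plan_file are illustrative.

```python
import json


def stages_without_tracing(plan: dict) -> list[str]:
    """Return addresses of aws_api_gateway_stage changes planned without X-Ray tracing.

    Only create/update changes are checked, and a missing or null
    xray_tracing_enabled counts as disabled, mirroring the Rego rule.
    """
    offenders = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "aws_api_gateway_stage":
            continue
        change = rc.get("change", {})
        if not {"create", "update"} & set(change.get("actions", [])):
            continue  # deletes and no-ops have nothing to enforce
        after = change.get("after") or {}
        if after.get("xray_tracing_enabled") is not True:
            offenders.append(rc["address"])
    return offenders


def check_plan_file(path: str) -> int:
    """Print one message per offending stage and return a CI-style exit code."""
    with open(path) as f:
        offenders = stages_without_tracing(json.load(f))
    for address in offenders:
        print(f"API Gateway stage '{address}' must have X-Ray tracing enabled")
    return 1 if offenders else 0


sample_plan = {
    "resource_changes": [
        {"address": "aws_api_gateway_stage.prod", "type": "aws_api_gateway_stage",
         "change": {"actions": ["create"], "after": {"xray_tracing_enabled": True}}},
        {"address": "aws_api_gateway_stage.staging", "type": "aws_api_gateway_stage",
         "change": {"actions": ["update"], "after": {"xray_tracing_enabled": None}}},
        {"address": "aws_api_gateway_stage.old", "type": "aws_api_gateway_stage",
         "change": {"actions": ["delete"], "after": None}},
    ]
}
print(stages_without_tracing(sample_plan))
```

In CI, call check_plan_file("tfplan.json") after terraform show and fail the job when it returns 1.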

AWS Config rules and automatic remediation

For organization-wide enforcement, use AWS Config to continuously evaluate stages and mark non-compliant ones. You can use the managed rule api-gw-xray-enabled or a custom rule. Pair it with an automatic remediation action, such as a Systems Manager Automation document you maintain, that performs the same update-stage patch operation shown earlier. Drift then gets corrected without human intervention.

Wire the Lensix check into your gates

Keep apigw_tracing running on a schedule and treat new findings as a signal that a stage shipped without tracing. Catching it in a scan a few minutes after deploy is good. Catching it in the plan stage before deploy is better.

Tip: Set tracing as a default in a shared Terraform module rather than per stage. If every team consumes a module that hardcodes xray_tracing_enabled = true, the secure setting becomes the path of least resistance and no one has to remember it.

Best practices

Trace the whole path, not just the gateway. Enable active tracing on Lambda functions, authorizers, and supported backends so traces connect end to end.
Use sampling rules deliberately. Trace everything in low-traffic environments where the cost is negligible, and apply sampling in high-volume production stages to control spend while keeping representative coverage.
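Add annotations and metadata. In your application code, annotate traces with values like customer ID or route name so you can filter traces and the service map and group traces meaningfully. Annotations are indexed for filter expressions. Metadata is not indexed, so use it for extra context you only need when inspecting a single trace.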
Combine tracing with access logging. X-Ray tells you where time went; access logs tell you who called and what they got back. Enable both on production stages for a complete picture.
Standardize across environments. Keep tracing on in staging too. The bugs you find with tracing in pre-production are the incidents you avoid in production.
Review the X-Ray service map regularly. Do not wait for an outage. Periodic review surfaces creeping latency and rising error rates before they become incidents.
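To apply sampling rules deliberately, you can create a custom rule through the X-Ray CreateSamplingRule API. The sketch below only builds the SamplingRule dictionary, so it runs offline. Several values are assumptions rather than requirements: the rule name, the priority, and the "<api-name>/<stage>" service-name pattern with service type AWS::ApiGateway::Stage. Check the service name your API actually reports in the X-Ray console before relying on it. The API name orders-api is hypothetical.

```python
def build_stage_sampling_rule(api_name: str, stage: str, fixed_rate: float,
                              reservoir: int = 1, priority: int = 100) -> dict:
    """Build a SamplingRule dict scoped to one API Gateway stage.

    Assumes the stage reports to X-Ray under the service name "<api-name>/<stage>"
    with service type AWS::ApiGateway::Stage.
    """
    if not 0.0 <= fixed_rate <= 1.0:
        raise ValueError("fixed_rate must be between 0 and 1")
    return {
        "RuleName": f"{api_name}-{stage}"[:32],  # rule names are limited to 32 characters
        "ResourceARN": "*",
        "Priority": priority,        # rules are evaluated in ascending priority order
        "FixedRate": fixed_rate,     # fraction of requests sampled beyond the reservoir
        "ReservoirSize": reservoir,  # requests per second traced before FixedRate applies
        "ServiceName": f"{api_name}/{stage}",
        "ServiceType": "AWS::ApiGateway::Stage",
        "Host": "*",
        "HTTPMethod": "*",
        "URLPath": "*",
        "Version": 1,
    }


# A 1 percent rule for a busy production stage; use fixed_rate=1.0 in a quiet
# pre-production stage to trace every request.
rule = build_stage_sampling_rule("orders-api", "prod", fixed_rate=0.01)
print(rule["RuleName"], rule["ServiceName"], rule["FixedRate"])
```

With credentials configured, you would create the rule by calling boto3.client("xray").create_sampling_rule(SamplingRule=rule).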

X-Ray tracing on an API Gateway stage is a small toggle with outsized payoff. It takes a single setting to enable and stays inexpensive with sensible sampling. It also does not change how your API behaves. The first time you use it to pinpoint a slow downstream call in 30 seconds rather than 30 minutes, it pays for itself.
