
Lambda Function X-Ray Tracing Disabled: Why It Matters and How to Fix It

Learn why AWS Lambda functions need X-Ray active tracing, the risks of leaving it off, and how to fix and enforce it with CLI, Terraform, and policy-as-code.

TL;DR

This check flags Lambda functions running without AWS X-Ray active tracing, which leaves you blind to where time is spent and where errors occur across your serverless calls. Turn on tracing with one --tracing-config flag (or one small Terraform block) and give the execution role X-Ray write access so you can actually debug latency and failures in production.

When a Lambda function misbehaves in production, the first question is always the same: where did the time go, and where did the request break? Without distributed tracing, you are left squinting at CloudWatch Logs, manually correlating timestamps across functions, queues, and downstream APIs. AWS X-Ray solves this by recording the path a request takes through your function and the services it touches. The lambda_notracing check catches functions where this visibility is simply switched off.

Note: X-Ray has two modes for Lambda. In PassThrough mode, Lambda traces an invocation only when the request arrives with an upstream trace header that is already marked as sampled. In Active mode, Lambda honors an upstream header when there is one and otherwise makes its own sampling decision, so invocations are traced even when no upstream trace exists. This check expects Active tracing.
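Both modes revolve around the X-Ray trace header, so it helps to know what it looks like. The following is a minimal sketch, not part of the check itself: parseTraceHeader is a hypothetical helper, and it assumes the Node.js runtime exposes the header through the _X_AMZN_TRACE_ID environment variable.

```javascript
// Parse an X-Ray trace header such as
// "Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1".
// parseTraceHeader is a hypothetical helper name used for illustration.
function parseTraceHeader(header) {
  const fields = {};
  for (const part of String(header || '').split(';')) {
    const idx = part.indexOf('=');
    if (idx === -1) continue; // skip empty or malformed pieces
    fields[part.slice(0, idx).trim()] = part.slice(idx + 1).trim();
  }
  return {
    root: fields.Root ?? null,
    parent: fields.Parent ?? null,
    // Sampled=1 means the request is being recorded, Sampled=0 means it is not,
    // and a missing field means no decision has been made yet.
    sampled: fields.Sampled === '1' ? true : fields.Sampled === '0' ? false : null,
  };
}

// Assumption: inside a Lambda handler the header is available in this variable.
// Outside Lambda it is unset, so every field comes back as null.
console.log(parseTraceHeader(process.env._X_AMZN_TRACE_ID));
```

Logging the parsed header from inside a handler is a quick way to confirm whether a given invocation was sampled.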


What this check detects

The check inspects each Lambda function's TracingConfig and reports a finding when the mode is set to PassThrough (the default) instead of Active. In other words, the function is not generating its own X-Ray segments.

You can see the current setting for any function with the AWS CLI:

aws lambda get-function-configuration \
  --function-name my-function \
  --query 'TracingConfig.Mode' \
  --output text

A return value of PassThrough means the function will be flagged. Active means it passes.
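To apply the same pass/fail logic to many functions at once, you could filter the JSON returned by aws lambda list-functions. This is a rough sketch: findUntracedFunctions is a hypothetical helper, the function names are made up, and a missing TracingConfig is treated as PassThrough on the assumption that it is the default.

```javascript
// `functions` mirrors the objects under "Functions" in the output of
// `aws lambda list-functions` (FunctionName and TracingConfig.Mode are real fields;
// the helper name and the sample data are illustrative).
function findUntracedFunctions(functions) {
  return functions
    // Treat a missing TracingConfig as PassThrough, the default mode.
    .filter((fn) => (fn.TracingConfig?.Mode ?? 'PassThrough') !== 'Active')
    .map((fn) => fn.FunctionName);
}

const sample = [
  { FunctionName: 'orders-api', TracingConfig: { Mode: 'Active' } },
  { FunctionName: 'image-resizer', TracingConfig: { Mode: 'PassThrough' } },
  { FunctionName: 'legacy-cron' },
];

console.log(findUntracedFunctions(sample)); // [ 'image-resizer', 'legacy-cron' ]
```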


Why it matters

Serverless architectures are rarely a single function. A typical request fans out through API Gateway, one or more Lambda functions, DynamoDB, SQS, and a handful of third-party calls. When something is slow or failing, the cause is usually buried in one of those hops. Without active tracing, you cannot tell whether the 4-second latency came from a cold start, a slow DynamoDB query, or a downstream API that is rate-limiting you.

Here is what disabled tracing costs you in practice:

  • Slower incident response. During an outage, engineers burn time grepping logs and guessing instead of reading a service map that shows the broken edge in red.
  • Hidden performance regressions. A deploy that adds 200ms per call to a downstream service can sit unnoticed for weeks without per-segment timing data.
  • No visibility into cold starts. X-Ray records initialization time as a distinct subsegment, so you can see when cold starts, often a major contributor to p99 latency, are the real cause.
  • Weak audit and compliance posture. Frameworks like SOC 2 and the AWS Well-Architected Framework expect observability over critical workloads. A blanket "tracing off" stance is hard to defend in a review.

Warning: X-Ray is not free. You get 100,000 traces recorded per month and 1,000,000 traces retrieved or scanned per month at no cost, then you pay per trace. High-throughput functions can generate real spend, so use sampling rules rather than tracing every single invocation at full volume.
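A back-of-the-envelope estimate makes the free tier concrete. The sketch below only covers recorded traces, and the per-million price is an assumption to replace with the current figure from the X-Ray pricing page for your region; estimateMonthlyRecordingCost is a hypothetical helper.

```javascript
// Estimate the monthly charge for recorded traces after the free tier.
// freeTraces defaults to the 100,000 recorded traces per month mentioned above.
function estimateMonthlyRecordingCost(tracesRecorded, pricePerMillionUsd, freeTraces = 100000) {
  const billable = Math.max(0, tracesRecorded - freeTraces);
  return (billable / 1000000) * pricePerMillionUsd;
}

// ASSUMPTION: illustrative price only; check the X-Ray pricing page before relying on it.
const ASSUMED_PRICE_PER_MILLION_USD = 5;

console.log(estimateMonthlyRecordingCost(50000, ASSUMED_PRICE_PER_MILLION_USD));   // 0
console.log(estimateMonthlyRecordingCost(2100000, ASSUMED_PRICE_PER_MILLION_USD)); // 10
```

Feed it the number of traces you expect to be recorded after sampling, not the raw invocation count.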


How to fix it

Enabling active tracing takes two things: the tracing config on the function, and an IAM permission so Lambda can write to X-Ray.

Step 1: Grant the X-Ray permissions

The function's execution role needs to be able to push trace data. AWS provides a managed policy for exactly this:

aws iam attach-role-policy \
  --role-name my-function-execution-role \
  --policy-arn arn:aws:iam::aws:policy/AWSXRayDaemonWriteAccess

If you prefer a least-privilege inline policy instead of the managed one, attach this:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "xray:PutTraceSegments",
        "xray:PutTelemetryRecords"
      ],
      "Resource": "*"
    }
  ]
}
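If you maintain inline policies by hand, a small check can confirm that both actions are present before you deploy. This is a deliberately simple sketch: missingXRayActions is a hypothetical helper, it only looks at exact action names in Allow statements, and it does not expand wildcards or evaluate Deny statements, conditions, or resources.

```javascript
// The two actions listed in the inline policy above.
const REQUIRED_ACTIONS = ['xray:PutTraceSegments', 'xray:PutTelemetryRecords'];

// Returns the required actions that no Allow statement names explicitly.
// Limitation (by design): wildcards such as "xray:*" are NOT expanded.
function missingXRayActions(policyDocument) {
  // Statement and Action may each be a single value or an array.
  const statements = [].concat(policyDocument.Statement ?? []);
  const allowed = new Set();
  for (const stmt of statements) {
    if (stmt.Effect !== 'Allow') continue;
    for (const action of [].concat(stmt.Action ?? [])) {
      allowed.add(String(action).toLowerCase()); // IAM action names are case-insensitive
    }
  }
  return REQUIRED_ACTIONS.filter((a) => !allowed.has(a.toLowerCase()));
}

const policy = {
  Version: '2012-10-17',
  Statement: [{ Effect: 'Allow', Action: ['xray:PutTraceSegments'], Resource: '*' }],
};

console.log(missingXRayActions(policy)); // [ 'xray:PutTelemetryRecords' ]
```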

Step 2: Enable active tracing on the function

aws lambda update-function-configuration \
  --function-name my-function \
  --tracing-config Mode=Active

In the console, the same change lives under Configuration → Monitoring and operations tools → Edit → Active tracing.

Step 3: Fix it as infrastructure-as-code

The console and CLI are fine for a one-off, but the durable fix is in your IaC so the setting survives the next deploy.

Terraform:

resource "aws_lambda_function" "my_function" {
  function_name = "my-function"
  role          = aws_iam_role.lambda_exec.arn
  handler       = "index.handler"
  runtime       = "nodejs20.x"

  tracing_config {
    mode = "Active"
  }
}

resource "aws_iam_role_policy_attachment" "xray" {
  role       = aws_iam_role.lambda_exec.name
  policy_arn = "arn:aws:iam::aws:policy/AWSXRayDaemonWriteAccess"
}

AWS SAM:

Resources:
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: index.handler
      Runtime: nodejs20.x
      Tracing: Active
      Policies:
        - AWSXRayDaemonWriteAccess

Tip: In SAM you can set Tracing: Active once under Globals and it applies to every function in the template, so you never have to remember it per function.

Step 4: Instrument your code (optional but worth it)

Active tracing alone gives you the Lambda service and function segments, including initialization and invocation timings. To see AWS SDK calls, custom logic, and downstream HTTP calls as subsegments, wrap your clients with the X-Ray SDK or, better, use the AWS Distro for OpenTelemetry (ADOT) Lambda layer.

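const AWSXRay = require('aws-xray-sdk-core');
// Node.js 18+ runtimes no longer bundle AWS SDK v2, so package 'aws-sdk' with the function.
// For AWS SDK v3 clients, wrap each client with AWSXRay.captureAWSv3Client(client) instead.
const AWS = AWSXRay.captureAWS(require('aws-sdk'));

// Every DynamoDB, S3, SQS call made through the wrapped SDK now shows up as a subsegment
const dynamo = new AWS.DynamoDB.DocumentClient();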

How to prevent it from happening again

Fixing one function is easy. Keeping every new function compliant is the real work. Push the control as far left as you can.

Catch it in CI with policy-as-code

If you deploy with Terraform, a Checkov scan in your pipeline will fail any build whose configuration defines a function without active tracing. Checkov ships this rule out of the box (CKV_AWS_50):

checkov -d . --check CKV_AWS_50

For a custom OPA/Conftest gate, you can write a rule against the Terraform plan JSON:

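package main

# True only when the planned function declares tracing_config with mode = "Active".
has_active_tracing(resource) {
  resource.change.after.tracing_config[_].mode == "Active"
}

deny[msg] {
  resource := input.resource_changes[_]
  resource.type == "aws_lambda_function"
  resource.change.actions[_] != "delete"
  # Negating the helper also catches functions with no tracing_config block at all.
  not has_active_tracing(resource)
  msg := sprintf("Lambda %s must enable Active X-Ray tracing", [resource.address])
}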
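For pipelines that do not run OPA, a similar gate can be sketched in Node.js against the output of terraform show -json. The helper name findNonCompliantLambdas and the sample addresses are hypothetical, and the sketch assumes the provider represents the tracing_config block as an array in the plan JSON. It treats a missing block as a failure, since the default mode is PassThrough.

```javascript
// Returns one message per aws_lambda_function in the plan that will not have
// Active tracing after apply. `plan` is the parsed output of `terraform show -json`.
function findNonCompliantLambdas(plan) {
  return (plan.resource_changes ?? [])
    .filter((rc) => rc.type === 'aws_lambda_function')
    .filter((rc) => rc.change?.after) // skip pure deletes, where `after` is null
    .filter((rc) => {
      // Assumption: nested blocks appear as arrays; absent block means default mode.
      const blocks = rc.change.after.tracing_config ?? [];
      return !blocks.some((b) => b.mode === 'Active');
    })
    .map((rc) => `Lambda ${rc.address} must enable Active X-Ray tracing`);
}

const plan = {
  resource_changes: [
    { address: 'aws_lambda_function.ok', type: 'aws_lambda_function',
      change: { actions: ['create'], after: { tracing_config: [{ mode: 'Active' }] } } },
    { address: 'aws_lambda_function.bare', type: 'aws_lambda_function',
      change: { actions: ['create'], after: { tracing_config: [] } } },
  ],
};

console.log(findNonCompliantLambdas(plan));
// [ 'Lambda aws_lambda_function.bare must enable Active X-Ray tracing' ]
```

In CI you would exit non-zero whenever the returned array is not empty.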

Enforce it organization-wide

For accounts where you cannot review every pipeline, a scheduled Lensix scan plus an EventBridge rule that auto-remediates new functions closes the gap. Use Lensix to continuously surface lambda_notracing findings across all regions and accounts so a function created by hand in the console does not slip past your IaC controls.
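The auto-remediation side could look roughly like the sketch below, which only builds the request for Lambda's UpdateFunctionConfiguration API and leaves the actual SDK call to your remediation function. Everything about the incoming event is an assumption: the field names (detail.eventSource, detail.requestParameters.functionName, detail.responseElements.tracingConfig.mode) reflect how a CloudTrail-sourced EventBridge event for function creation is expected to look, so verify them against a real event before relying on this. buildRemediationParams is a hypothetical helper.

```javascript
// Decide whether a newly created function needs remediation and, if so, return
// the parameters for UpdateFunctionConfiguration. Returns null when no action is needed.
// ASSUMPTION: the event field names below are illustrative and must be verified.
function buildRemediationParams(event) {
  const detail = event.detail ?? {};
  if (detail.eventSource !== 'lambda.amazonaws.com') return null;

  const functionName = detail.requestParameters?.functionName;
  if (!functionName) return null;

  // If the event already reports Active tracing, there is nothing to fix.
  if (detail.responseElements?.tracingConfig?.mode === 'Active') return null;

  // FunctionName and TracingConfig.Mode are the real request fields of the API.
  return { FunctionName: functionName, TracingConfig: { Mode: 'Active' } };
}

const exampleEvent = {
  detail: {
    eventSource: 'lambda.amazonaws.com',
    requestParameters: { functionName: 'hand-made-function' },
    responseElements: { tracingConfig: { mode: 'PassThrough' } },
  },
};

console.log(buildRemediationParams(exampleEvent));
// { FunctionName: 'hand-made-function', TracingConfig: { Mode: 'Active' } }
```

Keeping the decision logic separate from the API call makes it easy to unit test and to run in a dry-run mode before you let it change anything.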

Tip: Make active tracing part of your shared Lambda module or Serverless Framework template. When the default is correct, nobody has to remember to opt in, and the check stops appearing in your findings.


Best practices

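  • Trace your critical paths, sample the rest. Use X-Ray sampling rules to record a higher share of requests on critical routes and a small percentage of everything else. Sampling is decided when a request starts, so rules match on request attributes such as service name and URL path, not on whether the call later fails. This keeps cost predictable while still giving you a representative set of traces, failures included.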
  • Prefer ADOT over the raw X-Ray SDK for new work. OpenTelemetry gives you a vendor-neutral instrumentation layer, so you are not locked in if you later add Datadog, Grafana Tempo, or another backend.
  • Trace the full request, not just Lambda. Enable tracing on API Gateway and propagate trace headers through SQS and SNS so your service map is complete end to end.
  • Add annotations for the things you search on. Tag traces with customer IDs, request types, or feature flags so you can filter the X-Ray console to the exact slice of traffic during an incident.
  • Alert on metrics, investigate with traces. X-Ray keeps trace data for 30 days and that retention is not configurable, so pair it with CloudWatch alarms on latency and error rate. The alarms are your detection tool and tracing is your investigation tool.

Active tracing is one of the cheapest observability wins in a serverless stack. It is a single configuration flag plus one IAM permission, and the payoff is the difference between reading a service map during an incident and guessing in the dark. Enable it everywhere, enforce it in CI, and let Lensix flag the functions that slip through.