Fix ECS Container Insights Disabled on AWS | Lensix

TL;DR

This check flags ECS clusters running without Container Insights, which means you have no per-task CPU, memory, network, or storage metrics in CloudWatch. Turn it on with a single update-cluster-settings call, and set the account-level default so every new cluster inherits it.

When an ECS task starts behaving badly, the first question is always the same: which container, which task, and what resource is it starving for? Without Container Insights enabled, you are stuck stitching together fragments from the EC2 host metrics, application logs, and guesswork. Lensix raises ECS Container Insights Disabled when it finds a cluster that has not opted into this observability layer, leaving a blind spot that tends to surface at the worst possible moment.

What this check detects

The ecs_nocontainerinsight check inspects each ECS cluster in your account and reads its settings field. If the containerInsights setting is missing or set to disabled, the cluster is flagged.
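The detection logic can be approximated in a few lines. This is a minimal sketch, not Lensix's actual implementation: it assumes input shaped like the clusters list returned by aws ecs describe-clusters --include SETTINGS, and the function names and sample data are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def container_insights_value(cluster: Dict[str, Any]) -> str:
    """Return the containerInsights value for one describe-clusters entry.

    A cluster with no such setting is treated as "disabled".
    """
    for setting in cluster.get("settings", []):
        if setting.get("name") == "containerInsights":
            return setting.get("value", "disabled")
    return "disabled"


def flagged_clusters(clusters: Iterable[Dict[str, Any]]) -> List[str]:
    """Names of clusters whose setting is missing or set to disabled."""
    return [
        c.get("clusterName", "<unnamed>")
        for c in clusters
        if container_insights_value(c) == "disabled"
    ]


# Hypothetical sample data, shaped like the "clusters" list in the API response.
sample = [
    {"clusterName": "prod", "settings": [{"name": "containerInsights", "value": "enabled"}]},
    {"clusterName": "staging", "settings": [{"name": "containerInsights", "value": "disabled"}]},
    {"clusterName": "legacy", "settings": []},
]
print(flagged_clusters(sample))  # ['staging', 'legacy']
```

Under this reading of the rule, a cluster set to enhanced is not flagged, because only a missing or disabled value counts as a finding.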

Container Insights is an opt-in CloudWatch feature for ECS and EKS. When enabled on an ECS cluster, it collects and aggregates metrics at the cluster, service, and task level, then publishes them to a dedicated CloudWatch namespace and a set of automatic dashboards. The data includes:

CPU and memory utilization per service and per task
Network rx/tx bytes
Storage read/write metrics
Task and container counts, including pending and running states
Deployment and scaling events overlaid on the metric timelines

Note: Standard ECS metrics published to CloudWatch only give you CPU and memory at the cluster and service level. Container Insights adds the per-task granularity and the performance log events that make debugging individual containers possible.
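To make the performance log events concrete, here is a small sketch that builds a CloudWatch Logs Insights query ranking tasks by average CPU. It assumes the log group follows the /aws/ecs/containerinsights/<cluster>/performance naming and that task-level events carry Type, TaskId, CpuUtilized, and MemoryUtilized fields; the helper names are hypothetical, so check the field names against your own log events before relying on it.

```python
def performance_log_group(cluster_name: str) -> str:
    """Log group where Container Insights writes ECS performance events (assumed naming)."""
    return f"/aws/ecs/containerinsights/{cluster_name}/performance"


def top_tasks_by_cpu_query(limit: int = 10) -> str:
    """Build a Logs Insights query that ranks tasks by average CPU."""
    return "\n".join([
        'filter Type = "Task"',
        "| stats avg(CpuUtilized) as avg_cpu, avg(MemoryUtilized) as avg_mem by TaskId",
        "| sort avg_cpu desc",
        f"| limit {limit}",
    ])


print(performance_log_group("my-production-cluster"))
print(top_tasks_by_cpu_query(limit=5))
```

The query string can be pasted into the Logs Insights console or passed as queryString to the CloudWatch Logs start_query API along with the log group name and a time range.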

Why it matters

This is not a direct security vulnerability: no attacker exploits a disabled metric. The risk is operational, and it compounds quietly until an incident forces it into the open.

You cannot debug what you cannot see

Imagine a service that intermittently returns 502 errors. With Container Insights off, you see elevated CPU at the cluster level but no way to tell whether one task is pinned at 100 percent while the rest sit idle. You burn an hour SSHing into hosts and running docker stats by hand. With Container Insights on, you open the dashboard, sort tasks by CPU, and find the offender in under a minute.

Autoscaling decisions degrade

Service auto scaling and capacity provider scaling act on aggregate utilization data. Aggregates can mask a single task that is memory-bound, and without per-task metrics you have little evidence for tuning task sizes and scaling thresholds. The result is either OOM kills that look like random crashes or over-provisioned capacity that quietly inflates your bill.

Incident response slows down

During a live incident, the gap between "something is wrong" and "this specific task on this specific service is the cause" is where minutes turn into outages. Per-task metrics and the performance event log shorten that gap dramatically.

Compliance and audit posture

Frameworks like SOC 2 and ISO 27001 expect demonstrable monitoring of production workloads. A cluster with no task-level observability is hard to defend in an audit when the auditor asks how you detect performance degradation.

Warning: Container Insights is not free. It publishes custom metrics and ingests performance log events, both billed by CloudWatch. For a large cluster with hundreds of tasks this can add up. Estimate the cost before enabling it fleet-wide, and consider Container Insights with enhanced observability only where you need the deeper data.

How to fix it

There are three places to act: an existing cluster, the account-level default for new clusters, and your infrastructure as code. Fix all three so the gap does not reopen.

1. Enable on an existing cluster (CLI)

This is the fastest remediation. It applies to the named cluster only and takes effect on the next task launch.

aws ecs update-cluster-settings \
  --cluster my-production-cluster \
  --settings name=containerInsights,value=enabled \
  --region us-east-1

Confirm it stuck:

aws ecs describe-clusters \
  --clusters my-production-cluster \
  --include SETTINGS \
  --query "clusters[0].settings" \
  --output table

You should see containerInsights with a value of enabled.

Note: Newer AWS regions and CLI versions support an enhanced value (value=enhanced) that adds richer container-level metrics and is the successor to the original Container Insights experience. Use enabled for the classic behavior or enhanced where available and where the extra cost is justified.

2. Set the account-level default

So that every new cluster in a region inherits Container Insights without anyone remembering to flip the switch:

aws ecs put-account-setting-default \
  --name containerInsights \
  --value enabled \
  --region us-east-1

This only affects clusters created after the change. Existing clusters still need the per-cluster update above.

3. Console steps

Open the Amazon ECS console and select Clusters.
Choose the cluster you want to update.
Go to the Update cluster action (or the cluster settings tab).
Under Monitoring, toggle Use Container Insights on.
Save. New tasks launched in the cluster begin reporting metrics.

4. Fix it in Terraform

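If your clusters are managed with Terraform, a CLI-only fix leaves the code and the live cluster out of sync: a later apply can revert the setting, and any cluster recreated from the code comes up without it. Set it in code instead: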

resource "aws_ecs_cluster" "main" {
  name = "my-production-cluster"

  setting {
    name  = "containerInsights"
    value = "enabled"
  }
}

5. Fix it in CloudFormation

{
  "Resources": {
    "ProductionCluster": {
      "Type": "AWS::ECS::Cluster",
      "Properties": {
        "ClusterName": "my-production-cluster",
        "ClusterSettings": [
          {
            "Name": "containerInsights",
            "Value": "enabled"
          }
        ]
      }
    }
  }
}

Tip: Enabling Container Insights does not retroactively populate metrics. Data starts flowing from the next task placement. If you need clean numbers immediately for an investigation, force a new deployment of the affected service so fresh tasks start reporting right away.
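Forcing a new deployment can also be scripted. This is a minimal sketch that assumes an object behaving like boto3.client("ecs") and uses its list_services paginator and update_service with forceNewDeployment=True; the function name is hypothetical. It replaces running tasks, so treat it like any other deployment.

```python
from typing import Any, List


def force_redeploy_services(ecs: Any, cluster: str) -> List[str]:
    """Force a new deployment of every service in `cluster`.

    `ecs` is assumed to behave like boto3.client("ecs"). Returns the
    service ARNs for which a new deployment was requested.
    """
    redeployed: List[str] = []
    for page in ecs.get_paginator("list_services").paginate(cluster=cluster):
        for service_arn in page["serviceArns"]:
            ecs.update_service(
                cluster=cluster,
                service=service_arn,
                forceNewDeployment=True,  # start fresh tasks with no other changes
            )
            redeployed.append(service_arn)
    return redeployed


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   ecs = boto3.client("ecs", region_name="us-east-1")
#   force_redeploy_services(ecs, "my-production-cluster")
```

Standalone tasks that do not belong to a service are not covered by this loop and would need to be restarted separately.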

How to prevent it from happening again

A one-time fix solves today's finding. Preventing the next one means building the setting into how clusters get created.

Set the regional default in every account

Run put-account-setting-default in every region you deploy ECS to, across every account. Bake it into your account bootstrap or landing zone automation so new accounts inherit it from day one.
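A bootstrap script can loop over regions for you. The sketch below assumes a factory that returns a boto3-style ECS client for a given region and calls put_account_setting_default on each; the function name and the region list in the usage comment are placeholders. Covering multiple accounts would mean running it once per account with suitable credentials, which is left out here.

```python
from typing import Any, Callable, Dict, Iterable


def set_regional_defaults(
    client_for_region: Callable[[str], Any],
    regions: Iterable[str],
    value: str = "enabled",
) -> Dict[str, str]:
    """Set the containerInsights account default in each region.

    `client_for_region` is assumed to return an object that behaves like
    boto3.client("ecs", region_name=region). Returns the value each
    region reports back after the change.
    """
    results: Dict[str, str] = {}
    for region in regions:
        ecs = client_for_region(region)
        resp = ecs.put_account_setting_default(name="containerInsights", value=value)
        results[region] = resp["setting"]["value"]
    return results


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   set_regional_defaults(
#       lambda region: boto3.client("ecs", region_name=region),
#       ["us-east-1", "eu-west-1"],  # placeholder region list
#   )
```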

Gate it in CI with policy-as-code

Use Checkov, tfsec, or OPA Conftest to reject any Terraform plan that creates an ECS cluster without Container Insights. A simple Conftest policy for Terraform plan JSON:

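package main

# Deny any ECS cluster that is created or updated without Container Insights.
deny[msg] {
  resource := input.resource_changes[_]
  resource.type == "aws_ecs_cluster"
  # Skip deletions, where change.after is null.
  resource.change.after != null
  # Treat a missing setting block as an empty list so the rule still fires.
  settings := object.get(resource.change.after, "setting", [])
  not container_insights_enabled(settings)
  msg := sprintf("ECS cluster '%s' must enable containerInsights", [resource.name])
}

# True when a setting block turns Container Insights on (classic or enhanced).
container_insights_enabled(settings) {
  s := settings[_]
  s.name == "containerInsights"
  s.value == ["enabled", "enhanced"][_]
}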

Wire that into your pull request pipeline so the check runs before merge, not after deploy.
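For teams that do not run OPA, a similar gate can be sketched in plain Python against the JSON produced by terraform show -json. This is an illustration rather than a replacement for a policy engine: the function name and sample plan are hypothetical, and it treats both enabled and enhanced as compliant, which is an assumption you may want to tighten.

```python
from typing import Any, Dict, List

ACCEPTED_VALUES = {"enabled", "enhanced"}  # assumption: both count as compliant


def insights_violations(plan: Dict[str, Any]) -> List[str]:
    """Addresses of ECS clusters in a Terraform plan without Container Insights."""
    problems: List[str] = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "aws_ecs_cluster":
            continue
        after = rc.get("change", {}).get("after")
        if after is None:  # the resource is being deleted
            continue
        settings = after.get("setting") or []
        compliant = any(
            s.get("name") == "containerInsights" and s.get("value") in ACCEPTED_VALUES
            for s in settings
        )
        if not compliant:
            problems.append(rc.get("address", rc.get("name", "<unknown>")))
    return problems


# Hypothetical, heavily trimmed plan JSON for illustration.
sample_plan = {
    "resource_changes": [
        {"type": "aws_ecs_cluster", "address": "aws_ecs_cluster.ok",
         "change": {"after": {"setting": [{"name": "containerInsights", "value": "enabled"}]}}},
        {"type": "aws_ecs_cluster", "address": "aws_ecs_cluster.bad",
         "change": {"after": {"setting": []}}},
        {"type": "aws_ecs_cluster", "address": "aws_ecs_cluster.gone",
         "change": {"after": None}},
    ]
}
print(insights_violations(sample_plan))  # ['aws_ecs_cluster.bad']
```

In a pipeline, you would load the real plan with json.load and fail the job when the returned list is not empty.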

Continuous detection with Lensix

Policy gates catch what flows through your IaC pipeline. They do not catch clusters created by hand in the console, by a one-off script, or by a different team. Lensix scans your live accounts continuously and re-raises ecs_nocontainerinsight whenever a non-compliant cluster appears, regardless of how it was created. Pair the preventive gate with continuous detection and the gap stays closed.

Best practices

Enable it everywhere by default, then opt out deliberately. It is easier to disable Container Insights on a handful of cost-sensitive non-prod clusters than to remember to enable it on every prod cluster.
Pair metrics with alarms. Container Insights gives you the data, but data without alarms is a dashboard nobody watches. Create CloudWatch alarms on task-level CPU and memory utilization so degradation pages you before customers notice.
Set log retention. The performance log events Container Insights writes live in a CloudWatch log group. Apply a retention policy so you are not paying to store year-old performance data indefinitely.
Watch the cost in non-prod. Dev and CI clusters that spin up and tear down hundreds of short-lived tasks can generate surprising metric volume. Disable Container Insights there if the observability is not worth the spend.
Use enhanced observability where it earns its keep. For your most critical services, the enhanced mode gives container-level detail that justifies the extra cost. Reserve classic mode or no insights for everything else.
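Two of the practices above, alarms and log retention, lend themselves to a short sketch. It builds keyword arguments for CloudWatch put_metric_alarm using metric math over the CpuUtilized and CpuReserved metrics in the ECS/ContainerInsights namespace, and sets retention on the performance log group. The service-level dimensions, thresholds, alarm name, log group naming, and function names are all assumptions to adapt, and the alarm has no actions attached.

```python
from typing import Any, Dict


def cpu_alarm_params(cluster: str, service: str, threshold_pct: float = 80.0) -> Dict[str, Any]:
    """Keyword arguments for cloudwatch.put_metric_alarm (no alarm actions set)."""
    dims = [
        {"Name": "ClusterName", "Value": cluster},
        {"Name": "ServiceName", "Value": service},
    ]

    def stat(metric_id: str, metric_name: str) -> Dict[str, Any]:
        return {
            "Id": metric_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "ECS/ContainerInsights",
                    "MetricName": metric_name,
                    "Dimensions": dims,
                },
                "Period": 60,
                "Stat": "Average",
            },
            "ReturnData": False,  # inputs to the expression only
        }

    return {
        "AlarmName": f"{cluster}-{service}-cpu-high",  # hypothetical naming scheme
        "ComparisonOperator": "GreaterThanThreshold",
        "EvaluationPeriods": 5,
        "Threshold": threshold_pct,
        "Metrics": [
            stat("cpu_used", "CpuUtilized"),
            stat("cpu_reserved", "CpuReserved"),
            {
                "Id": "cpu_pct",
                "Expression": "100 * cpu_used / cpu_reserved",
                "Label": "CPU utilization (%)",
                "ReturnData": True,  # the alarm evaluates this series
            },
        ],
    }


def set_performance_log_retention(logs: Any, cluster: str, days: int = 30) -> str:
    """Apply a retention policy to the cluster's performance log group.

    `logs` is assumed to behave like boto3.client("logs").
    """
    group = f"/aws/ecs/containerinsights/{cluster}/performance"  # assumed naming
    logs.put_retention_policy(logGroupName=group, retentionInDays=days)
    return group


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**cpu_alarm_params("my-production-cluster", "web"))
#   set_performance_log_retention(boto3.client("logs"), "my-production-cluster")
```

Add AlarmActions pointing at an SNS topic or similar before relying on the alarm to page anyone, and build a matching memory alarm from MemoryUtilized and MemoryReserved in the same way.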

Tip: If you also run EKS, Container Insights works there too, and the same logic applies. Standardizing observability settings across both ECS and EKS keeps your monitoring story consistent and your runbooks short.

Container Insights is one of the lowest-effort, highest-leverage settings in ECS. A single setting flips you from "what is happening inside this cluster?" to a live dashboard of every task. Turn it on, gate it in CI, and let continuous scanning keep it that way.
