Fix ElastiCache In-Transit Encryption Disabled (AWS)

TL;DR

This check flags ElastiCache for Redis replication groups that send data unencrypted between clients and nodes. Without TLS, cached session tokens, credentials, and PII travel in plaintext across your VPC. The fix is to enable TransitEncryptionEnabled on the replication group, which on existing clusters requires a one-time migration or recreate.

ElastiCache is where a lot of sensitive data ends up by accident. Teams reach for Redis to cache database query results, store session state, hold rate-limiting counters, and queue jobs. All of that traffic flows between your application servers and the cache nodes, and if in-transit encryption is off, every byte moves in plaintext. This check looks for exactly that gap.

Below is what the check detects, why an unencrypted cache is a bigger deal than people assume, and how to close the gap on both new and existing clusters without taking down production.

What this check detects

The elasticache_notls check inspects each ElastiCache for Redis replication group and reports any group where TransitEncryptionEnabled is set to false (or is unset, which defaults to disabled on older clusters). When that flag is off, connections between Redis clients and the cluster nodes, and between primary and replica nodes, use plain TCP with no TLS.

You can confirm the current state of a replication group directly:

aws elasticache describe-replication-groups \
  --replication-group-id my-redis-group \
  --query 'ReplicationGroups[0].TransitEncryptionEnabled'

A return value of false or null means the check will flag it. A value of true means TLS is enforced for the group.
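The same test can be run over every group in an account at once. Here is a minimal Python sketch, assuming the JSON shape printed by describe-replication-groups (a ReplicationGroups list whose items carry ReplicationGroupId and TransitEncryptionEnabled). The function name find_plaintext_groups and the sample data are illustrative and not part of the check itself.

```python
def find_plaintext_groups(response: dict) -> list:
    """Return IDs of replication groups without in-transit encryption.

    `response` is assumed to have the shape of the JSON printed by
    `aws elasticache describe-replication-groups`.
    """
    flagged = []
    for group in response.get("ReplicationGroups", []):
        # A missing or null flag is treated the same as an explicit false.
        if group.get("TransitEncryptionEnabled") is not True:
            flagged.append(group["ReplicationGroupId"])
    return flagged


# Hypothetical sample data, for illustration only.
sample = {
    "ReplicationGroups": [
        {"ReplicationGroupId": "sessions", "TransitEncryptionEnabled": True},
        {"ReplicationGroupId": "rate-limits", "TransitEncryptionEnabled": False},
        {"ReplicationGroupId": "legacy-cache"},
    ]
}
print(find_plaintext_groups(sample))
```

In practice the dictionary could come from parsing the CLI output with json.load, or from an SDK call that returns the same structure.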

Note: In-transit encryption is a separate setting from at-rest encryption (AtRestEncryptionEnabled). A cluster can have encrypted storage on disk but still send everything over the wire in plaintext. You need both flags for end-to-end protection.

Why it matters

The common defense is "it's inside my VPC, so it's fine." That assumption falls apart quickly. Network isolation is a control, not a guarantee, and it does nothing once an attacker is already on the inside.

What actually flows through Redis

Most Redis deployments hold more than throwaway cache data. Look at a typical session store and you'll find:

Session identifiers and auth tokens that can be replayed to impersonate users
Cached database rows containing PII, email addresses, and account details
API keys and short-lived credentials used for rate limiting or service-to-service calls
Password reset tokens and one-time codes with real exploitation value

Without TLS, any party that can observe the network path between your app and the cache can read all of it.

Realistic attack paths

A flat VPC with plaintext Redis exposes you to several well-trodden scenarios:

Lateral movement. An attacker who compromises a single EC2 instance or container in the same subnet can sniff cache traffic and harvest session tokens for accounts they never touched directly.
Misconfigured security groups. An overly broad security group rule (for example 0.0.0.0/0 on port 6379 during a debugging session that never got reverted) turns plaintext traffic into a remote read.
Compromised sidecar or agent. Logging agents, service meshes, and monitoring sidecars run with network access. A supply-chain compromise of one of those can passively collect cache traffic.

Warning: Plaintext Redis traffic is also a compliance problem. PCI DSS, HIPAA, and SOC 2 all expect sensitive data to be encrypted in transit. An auditor who finds session tokens or cardholder data flowing unencrypted between application and cache will write it up, regardless of your VPC boundaries.

How to fix it

Enabling in-transit encryption is straightforward for new clusters and a bit more involved for existing ones. The key constraint: historically you could not flip TransitEncryptionEnabled on a running cluster; you had to recreate it. AWS later added online migration for engine versions 7.0 and above, so check your version before choosing a path.
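The decision can be written down as a small helper. This is a sketch only, assuming the 7.0 cutoff described above and version strings such as "7.1", "6.2.6", or "6.x". The name choose_tls_path and its return labels are hypothetical.

```python
def choose_tls_path(engine_version: str, group_exists: bool) -> str:
    """Pick a remediation path from the engine version (illustrative helper)."""
    if not group_exists:
        return "create-with-tls"
    # Compare only major.minor: "7.1" -> (7, 1), "6.2.6" -> (6, 2), "6.x" -> (6, 0).
    parts = engine_version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 and parts[1].isdigit() else 0
    if (major, minor) >= (7, 0):
        return "in-place-migration"
    return "recreate"


for version in ("7.1", "6.2.6", "6.x"):
    print(version, choose_tls_path(version, group_exists=True))
```

The three labels correspond to the three options that follow.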

Option 1: New replication group with TLS (CLI)

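# AUTH tokens allow only limited punctuation, so generate hex rather than base64
# (base64 output can contain +, / and =). Store the value somewhere safe,
# for example Secrets Manager; ElastiCache will not show it to you again.
AUTH_TOKEN="$(openssl rand -hex 32)"

aws elasticache create-replication-group \
  --replication-group-id my-redis-group \
  --replication-group-description "App session cache" \
  --engine redis \
  --engine-version 7.1 \
  --cache-node-type cache.r7g.large \
  --num-node-groups 1 \
  --replicas-per-node-group 2 \
  --transit-encryption-enabled \
  --transit-encryption-mode required \
  --at-rest-encryption-enabled \
  --auth-token "$AUTH_TOKEN" \
  --automatic-failover-enabled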

Setting --transit-encryption-mode required rejects any plaintext connection. Pairing TLS with an --auth-token (Redis AUTH) is the recommended combination since AUTH passwords should only travel over an encrypted channel.
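ElastiCache restricts what an AUTH token may contain. At the time of writing, the documented rules are 16 to 128 printable characters, with punctuation limited to a small set. Below is a minimal Python sketch of generating and checking a token under those assumed rules. The names generate_auth_token and is_valid_auth_token are illustrative, and the allowed character set should be confirmed against the current AWS documentation.

```python
import secrets
import string

# Punctuation assumed to be accepted in an AUTH token; verify against current docs.
ALLOWED_PUNCTUATION = "!&#$^<>-"
ALLOWED_CHARS = set(string.ascii_letters + string.digits + ALLOWED_PUNCTUATION)


def generate_auth_token(num_bytes: int = 32) -> str:
    """Return a random hex token (two characters per byte, letters and digits only)."""
    return secrets.token_hex(num_bytes)


def is_valid_auth_token(token: str) -> bool:
    """Check length (16 to 128) and character set against the assumed rules."""
    return 16 <= len(token) <= 128 and all(ch in ALLOWED_CHARS for ch in token)


token = generate_auth_token()
print(len(token), is_valid_auth_token(token))
```

Hex output avoids the question of which punctuation is accepted, at the cost of a longer string for the same amount of randomness.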

Option 2: In-place migration for existing clusters (Redis 7.0+)

If your replication group runs Redis 7.0 or later, you can migrate without recreating. Use preferred mode first so clients can connect over either TLS or plaintext during the rollout, then tighten to required.

Warning: This modification triggers a rolling update across nodes and can cause brief connection resets. Run it during a low-traffic window and make sure your client retries failed connections.

# Step 1: enable TLS in preferred mode (accepts both TLS and plaintext)
aws elasticache modify-replication-group \
  --replication-group-id my-redis-group \
  --transit-encryption-enabled \
  --transit-encryption-mode preferred \
  --apply-immediately

# Step 2: after all clients are updated to use TLS, enforce it
aws elasticache modify-replication-group \
  --replication-group-id my-redis-group \
  --transit-encryption-mode required \
  --apply-immediately
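The two-step rollout can be expressed as a function that inspects a group's current state and returns the next change to make. This is a sketch, assuming the group dictionary is one item from describe-replication-groups output and that the returned keys match the parameter names of the ModifyReplicationGroup API. The function name next_migration_step is hypothetical.

```python
from typing import Optional


def next_migration_step(group: dict) -> Optional[dict]:
    """Return the arguments for the next modify call, or None when TLS is enforced."""
    group_id = group["ReplicationGroupId"]
    if group.get("TransitEncryptionEnabled") is not True:
        # Step 1: turn TLS on but keep accepting plaintext clients.
        return {
            "ReplicationGroupId": group_id,
            "TransitEncryptionEnabled": True,
            "TransitEncryptionMode": "preferred",
            "ApplyImmediately": True,
        }
    if group.get("TransitEncryptionMode") == "preferred":
        # Step 2: only safe once every client connects over TLS.
        return {
            "ReplicationGroupId": group_id,
            "TransitEncryptionMode": "required",
            "ApplyImmediately": True,
        }
    return None  # already enforcing TLS


print(next_migration_step({"ReplicationGroupId": "my-redis-group"}))
```

With an SDK client, the returned dictionary could be passed as keyword arguments to the modify call. The function does not check whether clients have moved to TLS, so a human or a separate check still has to gate step 2.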

Option 3: Recreate for older engine versions

For clusters below 7.0, in-place changes are not supported. The safe path is to create a new TLS-enabled group, optionally seed it from a backup, cut traffic over, then retire the old one.

Danger: Deleting the old replication group is irreversible and drops all cached data along with the automatic backups for that group. Confirm the new TLS cluster is serving production traffic and that you have a final snapshot before you run this.

# Take a manual snapshot before deletion and wait for it to become available
aws elasticache create-snapshot \
  --replication-group-id my-redis-group-old \
  --snapshot-name my-redis-group-old-pre-delete

# Only after the new cluster is verified in production.
# The final snapshot name must not match an existing snapshot.
aws elasticache delete-replication-group \
  --replication-group-id my-redis-group-old \
  --final-snapshot-identifier my-redis-group-old-final

Update your client connection

Enabling TLS on the server is only half the work. Clients must connect over TLS or they will fail (in required mode) or silently stay plaintext (in preferred mode). A Python example using redis-py:

import redis

r = redis.Redis(
    host="my-redis-group.xxxxxx.ng.0001.use1.cache.amazonaws.com",
    port=6379,
    ssl=True,
    ssl_cert_reqs="required",
    password="your-auth-token",
)

For redis-cli, add the --tls flag:

redis-cli --tls -h my-redis-group.xxxxxx.ng.0001.use1.cache.amazonaws.com -p 6379 -a "$AUTH_TOKEN" PING
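To confirm from an application host that the endpoint completes a TLS handshake with certificate validation, a standard-library probe is enough. This is a sketch under stated assumptions: the function names are illustrative, the host is whatever endpoint your group exposes, and the probe only shows that the server accepts TLS. It does not prove that your application client is using it.

```python
import socket
import ssl


def build_tls_context() -> ssl.SSLContext:
    """Default client context: verifies the certificate chain and the hostname."""
    return ssl.create_default_context()


def negotiated_tls_version(host: str, port: int = 6379, timeout: float = 5.0) -> str:
    """Open a TLS connection and return the negotiated protocol name.

    Raises ssl.SSLError or OSError if the connection or handshake fails,
    for example when the endpoint only speaks plaintext.
    """
    context = build_tls_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version() or ""


context = build_tls_context()
print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)
```

Calling negotiated_tls_version with your endpoint from inside the VPC should return a protocol name when TLS is enabled on the group.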

Fixing it in infrastructure as code

If you manage ElastiCache through Terraform, set the encryption flags explicitly so the configuration is the source of truth and drift is visible in plan output.

resource "aws_elasticache_replication_group" "redis" {
  replication_group_id    = "my-redis-group"
  description             = "App session cache"
  engine                  = "redis"
  engine_version          = "7.1"
  node_type               = "cache.r7g.large"
  num_node_groups         = 1
  replicas_per_node_group = 2

  transit_encryption_enabled = true
  transit_encryption_mode    = "required"
  at_rest_encryption_enabled = true
  auth_token                 = var.redis_auth_token

  automatic_failover_enabled = true
}

For CloudFormation, the equivalent properties live on AWS::ElastiCache::ReplicationGroup:

{
  "Type": "AWS::ElastiCache::ReplicationGroup",
  "Properties": {
    "ReplicationGroupId": "my-redis-group",
    "ReplicationGroupDescription": "App session cache",
    "Engine": "redis",
    "EngineVersion": "7.1",
    "CacheNodeType": "cache.r7g.large",
    "TransitEncryptionEnabled": true,
    "TransitEncryptionMode": "required",
    "AtRestEncryptionEnabled": true
  }
}

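Tip: Store the auth_token in AWS Secrets Manager and reference it rather than hardcoding it in your IaC. In Terraform the value is still written to state as a resource attribute even when it comes from a data source or a CI-injected variable, so use an encrypted remote backend and restrict who can read the state.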

How to prevent it from happening again

Catching this once is good. Making it impossible to ship a plaintext cache again is better. Push the control as far left as you can.

Block it in CI with policy-as-code

If you use Terraform, an OPA/Conftest policy can fail the pipeline before anything reaches AWS. This rule rejects any replication group without TLS:

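package elasticache

import rego.v1

# Deny any replication group whose planned state does not set
# transit_encryption_enabled to true. Unset and null values are denied too.
deny contains msg if {
  some rc in input.resource_changes
  rc.type == "aws_elasticache_replication_group"
  rc.change.after != null # skip resources that are only being deleted
  not rc.change.after.transit_encryption_enabled == true
  msg := sprintf("ElastiCache group '%s' must enable transit_encryption_enabled", [rc.address])
}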

Run it against a plan in your pipeline:

terraform plan -out=tfplan.binary
terraform show -json tfplan.binary > tfplan.json
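# conftest evaluates only the "main" package by default, so name the policy's package
conftest test tfplan.json --policy policy/ --namespace elasticache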
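For pipelines that do not run OPA, or to unit test the intent of the policy, the same check can be sketched in Python against the plan JSON. The assumption is the standard terraform show -json layout, with a resource_changes list whose items carry address, type, and change.after. The name find_tls_violations and the sample plan are illustrative.

```python
def find_tls_violations(plan: dict) -> list:
    """Return addresses of replication groups planned without in-transit encryption."""
    violations = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "aws_elasticache_replication_group":
            continue
        after = (rc.get("change") or {}).get("after")
        if after is None:
            continue  # the resource is only being deleted
        # Unset or null counts as a violation, the same as an explicit false.
        if after.get("transit_encryption_enabled") is not True:
            violations.append(rc["address"])
    return violations


# Hypothetical plan fragment, for illustration only.
RG = "aws_elasticache_replication_group"
plan = {
    "resource_changes": [
        {"address": RG + ".good", "type": RG, "change": {"after": {"transit_encryption_enabled": True}}},
        {"address": RG + ".bad", "type": RG, "change": {"after": {"transit_encryption_enabled": False}}},
        {"address": RG + ".unset", "type": RG, "change": {"after": {}}},
        {"address": RG + ".gone", "type": RG, "change": {"after": None}},
        {"address": "aws_s3_bucket.logs", "type": "aws_s3_bucket", "change": {"after": {}}},
    ]
}
print(find_tls_violations(plan))
```

In a pipeline, the plan dictionary would come from json.load on tfplan.json, and a non-empty result would fail the job.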

Enforce at the org level with SCPs and Config

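Add a service control policy that denies elasticache:CreateReplicationGroup unless the elasticache:TransitEncryptionEnabled condition key is true, so a plaintext group is rejected at creation time.
Use an AWS Config managed rule (elasticache-repl-grp-encrypted-in-transit) to continuously flag non-compliant groups across every account.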
Wire that rule to an SSM Automation or EventBridge alert so a new plaintext cluster triggers a notification within minutes of creation.
Run the Lensix elasticache_notls check on a schedule so the finding shows up in your dashboard regardless of how the cluster was created, including click-ops in the console.

Tip: Bake a compliant ElastiCache module into your internal Terraform registry with TLS, AUTH, and at-rest encryption all defaulted on. Teams that consume the module get a secure cluster by default and have to go out of their way to weaken it.

Best practices

In-transit encryption is one piece of a hardened cache. A few related habits make the whole setup stronger:

Enable at-rest encryption too. Set AtRestEncryptionEnabled alongside transit encryption so data is protected on disk and in backups, not just on the wire.
Use Redis AUTH over TLS. An AUTH token without TLS just sends the password in plaintext. Always pair them. For Redis 6+, consider RBAC with the Redis user/ACL feature for granular access.
Lock down security groups. Allow inbound 6379 only from the specific application security groups that need it. Never use 0.0.0.0/0, even temporarily.
Run a recent engine version. Redis 7.0+ unlocks in-place TLS migration and ongoing security fixes. Staying current also avoids the painful recreate path.
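Verify clients enforce certificate validation. redis-py validates the server certificate by default when ssl=True, but copied snippets often set ssl_cert_reqs="none" to silence errors, which leaves the client open to interception. Keep ssl_cert_reqs="required" and validate the cert.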
Audit periodically. Encryption settings can drift when clusters are cloned, restored from old snapshots, or spun up by automation that predates your standards. Recurring checks catch the regressions.
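The security group habit above is easy to audit mechanically. This is a minimal Python sketch, assuming the input is one item from the SecurityGroups list in describe-security-groups output. The function name open_redis_rules and the sample group are hypothetical.

```python
def open_redis_rules(security_group: dict, port: int = 6379) -> list:
    """Return world-open CIDRs on inbound rules that cover `port`."""
    world = {"0.0.0.0/0", "::/0"}
    findings = []
    for perm in security_group.get("IpPermissions", []):
        protocol = perm.get("IpProtocol")
        if protocol == "-1":
            covers_port = True  # all protocols and ports
        elif protocol == "tcp":
            covers_port = perm.get("FromPort", 0) <= port <= perm.get("ToPort", 65535)
        else:
            covers_port = False
        if not covers_port:
            continue
        cidrs = [r.get("CidrIp") for r in perm.get("IpRanges", [])]
        cidrs += [r.get("CidrIpv6") for r in perm.get("Ipv6Ranges", [])]
        findings.extend(c for c in cidrs if c in world)
    return findings


# Hypothetical security group, for illustration only.
sg = {
    "GroupId": "sg-0123",
    "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 6379, "ToPort": 6379,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}], "Ipv6Ranges": []},
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        {"IpProtocol": "tcp", "FromPort": 6379, "ToPort": 6379,
         "IpRanges": [{"CidrIp": "10.0.0.0/16"}]},
    ],
}
print(open_redis_rules(sg))
```

Rules that reference other security groups instead of CIDR ranges are ignored here, which matches the recommendation to allow access only from specific application security groups.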

The bottom line: treat your cache as a data store that holds sensitive material, because it almost always does. Enabling in-transit encryption is a small change with no meaningful downside, and it removes an entire class of passive interception risk from your environment.
