This check flags Amazon MSK clusters whose broker storage is not encrypted at rest with a customer-managed KMS key. Broker data that sits on volumes encrypted under a key you do not control is a compliance and breach risk, so set encryptionInfo.encryptionAtRest.dataVolumeKMSKeyId to a CMK you control. The key can only be set at cluster creation, so fixing an existing cluster requires a rebuild.
Amazon Managed Streaming for Apache Kafka (MSK) is where a lot of sensitive data passes through: payment events, user activity streams, audit logs, IoT telemetry. All of that lands on broker storage volumes and stays there until it ages out of the topic retention window. If those volumes are not encrypted with a key you control, you lose a meaningful layer of defense and, often, a compliance checkbox.
The msk_unencrypted check looks at each MSK cluster and verifies that broker storage is encrypted at rest using a customer-managed AWS KMS key (CMK), rather than relying on a default configuration you cannot fully govern.
What this check detects
Lensix inspects the encryptionInfo block on every MSK cluster in your account and flags clusters where broker storage encryption is not backed by a customer-managed KMS key.
Each MSK cluster has an encryption configuration that looks like this:
{
  "EncryptionInfo": {
    "EncryptionAtRest": {
      "DataVolumeKMSKeyId": "arn:aws:kms:us-east-1:111122223333:key/aws/kafka"
    },
    "EncryptionInTransit": {
      "ClientBroker": "TLS",
      "InCluster": true
    }
  }
}
The piece this check cares about is DataVolumeKMSKeyId under EncryptionAtRest. This controls the KMS key used to encrypt the EBS volumes attached to your Kafka brokers, where partition data physically lives.
Note: MSK always encrypts broker storage at rest. If you do not specify a key, AWS uses an AWS-managed KMS key for the service. The distinction this check enforces is between an AWS-managed key (aws/kafka) and a customer-managed key (CMK) that you create, control access to, and can audit and rotate on your own terms.
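The distinction between the two key types can be sketched in a few lines of Python. This is an illustration, not how Lensix implements the check: the function name and the `key_manager_for` callback are hypothetical, and the sketch assumes the callback returns the `KeyManager` value that KMS DescribeKey reports for a key ("AWS" or "CUSTOMER").

```python
from typing import Callable


def classify_at_rest_key(cluster: dict, key_manager_for: Callable[[str], str]) -> str:
    """Return 'customer-managed', 'aws-managed', or 'unknown' for one cluster.

    `cluster` is a dict shaped like the EncryptionInfo example above.
    `key_manager_for` (hypothetical helper) maps a key ID or ARN to the
    KeyManager value reported by KMS DescribeKey.
    """
    key_id = (
        cluster.get("EncryptionInfo", {})
        .get("EncryptionAtRest", {})
        .get("DataVolumeKMSKeyId")
    )
    if not key_id:
        # No key recorded in the response we were handed; do not guess.
        return "unknown"
    manager = key_manager_for(key_id)
    if manager == "CUSTOMER":
        return "customer-managed"
    if manager == "AWS":
        return "aws-managed"
    return "unknown"
```

With boto3, the callback could plausibly be `lambda k: kms.describe_key(KeyId=k)["KeyMetadata"]["KeyManager"]`, where `kms` is a KMS client. Passing the lookup in as a parameter keeps the classification logic testable without AWS credentials.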
Why it matters
"It's already encrypted by default" is the usual response here, and it is technically true. But the default AWS-managed key gives you almost none of the control that auditors and incident responders actually need.
You cannot scope access to an AWS-managed key
AWS-managed keys come with a service-linked key policy you cannot edit. You cannot deny specific principals, you cannot require encryption context conditions, and you cannot revoke access in an emergency. With a CMK, the key policy is yours. If a broker role or an account is compromised, revoking key access on a CMK is a fast way to cut off decryption.
Auditing and forensics
CMK usage shows up in CloudTrail as kms:Decrypt and kms:GenerateDataKey calls tied to your key. During an incident you can answer "what decrypted this data and when," which is far harder with a default service key buried in AWS internals.
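As a rough illustration of that forensic question, the sketch below filters a list of CloudTrail records down to the decrypt and data-key calls that reference one key. It assumes the records are already parsed into dicts and that the key ARN appears in each record's `resources` list, which is how KMS events are typically logged. The function name and sample shapes are hypothetical.

```python
def key_usage_events(records: list, key_arn: str) -> list:
    """Pick out KMS Decrypt / GenerateDataKey events that reference key_arn."""
    wanted = {"Decrypt", "GenerateDataKey"}
    hits = []
    for rec in records:
        if rec.get("eventSource") != "kms.amazonaws.com":
            continue
        if rec.get("eventName") not in wanted:
            continue
        # Collect every resource ARN attached to the event.
        arns = {r.get("ARN") for r in rec.get("resources", [])}
        if key_arn in arns:
            hits.append(
                {
                    "time": rec.get("eventTime"),
                    "action": rec.get("eventName"),
                    "principal": rec.get("userIdentity", {}).get("arn"),
                }
            )
    # ISO 8601 timestamps sort correctly as strings.
    return sorted(hits, key=lambda h: h["time"] or "")
```

The result is a time-ordered list of who used the key and for what, which is the raw material for a "what decrypted this data and when" timeline.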
Compliance frameworks expect customer-managed keys
PCI DSS, HIPAA, SOC 2, and FedRAMP controls increasingly distinguish between provider-managed and customer-managed key material. Many audit templates explicitly ask whether encryption keys are under customer control with documented rotation and access policies. An AWS-managed key fails that question.
Warning: Snapshots, replicas, and decommissioned broker volumes all inherit the cluster's encryption configuration. A cluster that has been running for a year on a default key has a year of stale data you cannot retroactively re-key without rebuilding.
Blast radius of a leaked data volume
Imagine a broker EBS volume gets exposed through a misconfigured backup pipeline or a snapshot shared too broadly. With a CMK whose policy denies the offending principal, that data stays unreadable. With a default key and broad IAM, the data is decryptable by anyone the service trusts.
How to fix it
Here is the hard truth about MSK encryption at rest: the KMS key cannot be changed after the cluster is created. There is no update-encryption API for the data volume key. Remediation means standing up a new cluster with the correct CMK and migrating your workloads to it.
Danger: There is no in-place fix. Migrating an MSK cluster means moving producers and consumers to new bootstrap brokers and replicating topic data. Plan this as a controlled migration with a rollback path, not a quick patch. Deleting the old cluster destroys all retained messages still in its topics.
Step 1: Create a customer-managed KMS key
aws kms create-key \
  --description "MSK broker storage encryption key" \
  --tags TagKey=Purpose,TagValue=msk-encryption \
  --query 'KeyMetadata.Arn' \
  --output text
Give it a friendly alias so it is easy to reference:
aws kms create-alias \
  --alias-name alias/msk-broker-storage \
  --target-key-id arn:aws:kms:us-east-1:111122223333:key/abcd1234-...
Step 2: Scope the key policy
Grant the MSK service and your cluster operators only what they need. A minimal key policy statement for the MSK service:
{
  "Sid": "AllowMSKUseOfKey",
  "Effect": "Allow",
  "Principal": { "Service": "kafka.amazonaws.com" },
  "Action": [
    "kms:Encrypt",
    "kms:Decrypt",
    "kms:GenerateDataKey",
    "kms:GenerateDataKeyWithoutPlaintext",
    "kms:DescribeKey",
    "kms:CreateGrant"
  ],
  "Resource": "*"
}
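To keep the policy scoped over time, a small lint can flag Allow statements that use a wildcard principal. The following is a minimal sketch under stated assumptions: the function name is hypothetical, the policy is already parsed from JSON, and the check deliberately ignores any `Condition` block that might narrow a wildcard, so treat its output as a prompt for review rather than a verdict.

```python
def wildcard_allow_statements(policy: dict) -> list:
    """Return the Sids of Allow statements whose Principal is a wildcard."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        # IAM JSON permits a single statement as a bare object.
        statements = [statements]
    flagged = []
    for i, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal")
        values = []
        if isinstance(principal, str):
            values = [principal]
        elif isinstance(principal, dict):
            for v in principal.values():
                values.extend(v if isinstance(v, list) else [v])
        if "*" in values:
            # Fall back to a positional label when the statement has no Sid.
            flagged.append(stmt.get("Sid", f"statement[{i}]"))
    return flagged
```

Run against the statement above, the function returns an empty list, because the only principal is the named MSK service.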
Step 3: Create the new cluster with the CMK
Build an encryption config file:
{
  "EncryptionAtRest": {
    "DataVolumeKMSKeyId": "arn:aws:kms:us-east-1:111122223333:key/abcd1234-..."
  },
  "EncryptionInTransit": {
    "ClientBroker": "TLS",
    "InCluster": true
  }
}
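If you generate this file from a script, it is worth rejecting anything that is not a plain key ARN before the cluster is created, since the key cannot be changed afterwards. The sketch below assumes the standard KMS key ARN layout. The function name and the regular expression are illustrative, and alias ARNs are rejected on purpose so the file always names a concrete key.

```python
import json
import re

# Key ARNs look like arn:<partition>:kms:<region>:<account>:key/<key-id>.
# Alias ARNs and anything with extra path segments are rejected in this sketch.
_KEY_ARN = re.compile(r"^arn:aws[a-z-]*:kms:[a-z0-9-]+:\d{12}:key/[A-Za-z0-9-]+$")


def build_encryption_info(key_arn: str) -> dict:
    """Return an encryption-info document that pins at-rest encryption to key_arn."""
    if not _KEY_ARN.match(key_arn):
        raise ValueError(f"not a KMS key ARN: {key_arn!r}")
    return {
        "EncryptionAtRest": {"DataVolumeKMSKeyId": key_arn},
        "EncryptionInTransit": {"ClientBroker": "TLS", "InCluster": True},
    }


def write_encryption_info(key_arn: str, path: str = "encryption-info.json") -> None:
    """Write the document to disk for use with --encryption-info file://..."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(build_encryption_info(key_arn), fh, indent=2)
```

Validating the ARN shape does not prove the key is customer-managed. It only catches typos, aliases, and empty values early, before they turn into a cluster you have to rebuild.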
Then create the cluster, referencing that config:
aws kafka create-cluster \
  --cluster-name orders-stream-prod-v2 \
  --kafka-version "3.6.0" \
  --number-of-broker-nodes 3 \
  --broker-node-group-info file://broker-node-group.json \
  --encryption-info file://encryption-info.json
Step 4: Migrate traffic
Use MirrorMaker 2 or MSK Replicator to copy topic data and offsets from the old cluster to the new one, then cut producers and consumers over to the new bootstrap brokers. A typical replicator setup lists both clusters in a single --kafka-clusters file and names the source and target cluster ARNs inside the replication info:
aws kafka create-replicator \
  --replicator-name orders-migration \
  --kafka-clusters file://kafka-clusters.json \
  --replication-info-list file://replication-info.json \
  --service-execution-role-arn arn:aws:iam::111122223333:role/MSKReplicatorRole
Tip: Run both clusters in parallel during the cutover. Point consumers at the new cluster first and confirm they catch up to live offsets before redirecting producers. That ordering minimizes the window where messages could be dropped.
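The "caught up" condition in that tip can be made concrete as a per-partition lag check. This is a minimal sketch, assuming you have already collected the log end offsets and the consumer group's committed offsets on the new cluster, for example from your Kafka client or admin tooling. The function name and the dict shapes are hypothetical.

```python
def lagging_partitions(end_offsets: dict, committed: dict, max_lag: int = 0) -> dict:
    """Return {partition: lag} for partitions whose lag exceeds max_lag.

    Both arguments map a partition key, such as (topic, partition), to an offset.
    A partition with no committed offset is treated as fully behind, which
    assumes its log starts at offset 0.
    """
    behind = {}
    for partition, end in end_offsets.items():
        done = committed.get(partition)
        lag = end if done is None else max(end - done, 0)
        if lag > max_lag:
            behind[partition] = lag
    return behind
```

An empty result means every partition is within the allowed lag, which is a reasonable gate before redirecting producers. A small nonzero `max_lag` avoids waiting forever on a topic that is still receiving replicated writes.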
Step 5: Decommission the old cluster
Once the new cluster is serving all traffic and you have confirmed no consumers depend on the old one:
aws kafka delete-cluster \
  --cluster-arn arn:aws:kafka:us-east-1:111122223333:cluster/orders-stream-prod/abcd-1234
Fixing it the right way: Terraform
Because remediation requires a rebuild, doing this in infrastructure as code from the start saves you the migration pain entirely. Define the CMK and the encryption block together:
resource "aws_kms_key" "msk" {
  description             = "MSK broker storage encryption key"
  enable_key_rotation     = true
  deletion_window_in_days = 30
}

resource "aws_msk_cluster" "orders" {
  cluster_name           = "orders-stream-prod"
  kafka_version          = "3.6.0"
  number_of_broker_nodes = 3

  broker_node_group_info {
    instance_type   = "kafka.m5.large"
    client_subnets  = var.private_subnet_ids
    security_groups = [aws_security_group.msk.id]

    storage_info {
      ebs_storage_info {
        volume_size = 1000
      }
    }
  }

  encryption_info {
    encryption_at_rest_kms_key_arn = aws_kms_key.msk.arn

    encryption_in_transit {
      client_broker = "TLS"
      in_cluster    = true
    }
  }
}
If encryption_at_rest_kms_key_arn is omitted, Terraform lets MSK fall back to the AWS-managed key, which is exactly what this check flags. Make it a required variable in your module so nobody can skip it.
How to prevent it from happening again
Since you cannot re-key a running cluster, prevention is the whole game. Catch missing CMKs before the cluster ever exists.
Block it in CI with policy as code
An OPA/Conftest rule against your Terraform plan:
package msk.encryption

deny[msg] {
  resource := input.resource_changes[_]
  resource.type == "aws_msk_cluster"
  not resource.change.after.encryption_info[_].encryption_at_rest_kms_key_arn
  msg := sprintf("MSK cluster '%s' must set encryption_at_rest_kms_key_arn to a customer-managed KMS key", [resource.address])
}
Wire it into your pipeline so a plan without a CMK fails the build:
terraform plan -out=tfplan
terraform show -json tfplan > tfplan.json
conftest test tfplan.json --policy ./policies
Enforce with an SCP guardrail
You can use a service control policy to deny MSK cluster creation unless a CMK is specified, although MSK does not expose the key ARN as a clean condition key in all cases. A more reliable approach is an AWS Config rule plus auto-remediation that alerts when a non-compliant cluster appears, paired with a CI gate that prevents the bad code from merging in the first place.
Tip: Run the msk_unencrypted check on a schedule in Lensix and route findings to Slack or a ticket. Because the fix is a migration, you want to know about a non-compliant cluster on day one, not at audit time when you have a year of retained data sitting on the wrong key.
Best practices
- Always specify a CMK at creation. The cost of a dedicated KMS key is trivial next to the cost of a migration. Treat it as non-negotiable for every cluster.
- Enable key rotation. Set enable_key_rotation = true so KMS rotates the backing key material annually with no action from you.
- Encrypt in transit too. Set client_broker = "TLS" and in_cluster = true. At-rest encryption protects stored data, but plaintext between clients and brokers is just as exposed.
- Use separate keys per environment. A distinct CMK for prod versus staging means a key policy or revocation in one environment never touches another.
- Tighten the key policy. Grant only the MSK service and the specific roles that administer the cluster. Avoid wildcard principals on the key policy.
- Document key ownership. Auditors will ask who owns the key, how it rotates, and who can use it. Tag your keys and keep that mapping current.
The pain of an MSK migration is real, which is exactly why this is worth getting right the first time. A five-minute decision at cluster creation saves a weekend of MirrorMaker babysitting later.
If you are running Lensix, the msk_unencrypted check gives you the inventory of clusters that need attention. Sort them by data sensitivity, schedule the migrations, and lock the new clusters down with a CMK and a CI gate so the finding never comes back.

