Fix EMR At-Rest Encryption Disabled on AWS | Lensix

TL;DR

This check flags EMR clusters whose security configuration does not enable at-rest encryption for EMRFS data on S3 and for local disks. Without it, your processed data and intermediate shuffle files sit unencrypted. Fix it by creating an EMR security configuration with local disk and EMRFS encryption enabled, then attaching it to your clusters.

Amazon EMR makes it easy to spin up Hadoop, Spark, Hive, and Presto clusters that crunch through large volumes of data. The catch is that EMR clusters tend to handle exactly the kind of data you do not want leaking: customer records, financial transactions, logs full of identifiers. When at-rest encryption is not enabled, that data sits in plaintext across S3, EBS volumes, and instance store disks.

The emr_unencrypted check looks at the security configuration attached to your EMR clusters and confirms that at-rest encryption is turned on for both EMRFS and local disks. If it is missing, the check fails.

What this check detects

EMR encryption settings live in a reusable object called a security configuration. You create the security configuration once, then reference it when you launch a cluster. The at-rest portion covers two distinct things:

EMRFS encryption — protects data that EMR reads from and writes to Amazon S3 through the EMR File System. This can use SSE-S3, SSE-KMS, or client-side encryption with KMS or a custom key provider.
Local disk encryption — protects data written to the EBS volumes and instance store disks attached to cluster nodes. This includes HDFS data, log files, and the temporary shuffle and spill data that Spark and MapReduce generate during jobs.

This check fails when a cluster either has no security configuration at all, or has one where the at-rest encryption flags are turned off.
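
The pass/fail logic described above can be sketched in Python. This is a minimal illustration rather than the actual Lensix implementation: the function name `at_rest_gaps` is hypothetical, and the input is assumed to be the parsed JSON body of a security configuration, or `None` when the cluster has none attached.

```python
from typing import Optional


def at_rest_gaps(security_config: Optional[dict]) -> list[str]:
    """Return the reasons a cluster would fail an at-rest encryption check.

    `security_config` is the parsed security configuration JSON, or None
    when the cluster has no security configuration attached.
    """
    if security_config is None:
        return ["no security configuration attached"]

    encryption = security_config.get("EncryptionConfiguration", {})
    if not encryption.get("EnableAtRestEncryption", False):
        return ["EnableAtRestEncryption is not true"]

    # Both halves of the at-rest section must be present.
    at_rest = encryption.get("AtRestEncryptionConfiguration", {})
    gaps = []
    if "S3EncryptionConfiguration" not in at_rest:
        gaps.append("EMRFS (S3) encryption is not configured")
    if "LocalDiskEncryptionConfiguration" not in at_rest:
        gaps.append("local disk encryption is not configured")
    return gaps
```

An empty list means both EMRFS and local disk encryption are configured. Any other result names the gap.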

Note: EMR encryption is split into at-rest and in-transit. This check is concerned only with at-rest. In-transit encryption (TLS for data moving between nodes) is configured separately in the same security configuration, and it is worth enabling at the same time.

Why it matters

People often assume EMR data is "just in S3" and that S3 encryption covers it. That assumption misses two things.

First, EMR clusters write a large amount of data to local disk during processing. Spark spills partitions to disk when they do not fit in memory. MapReduce writes intermediate output between map and reduce phases. HDFS stores blocks on the cluster's EBS volumes. None of that is in S3, and none of it is encrypted unless you turn on local disk encryption. An attacker who snapshots an EBS volume or recovers an instance store disk can read it directly.

Second, EMRFS encryption is not automatic. S3 applies a bucket's default encryption only when a write request does not specify its own encryption settings, so the EMRFS layer can override the bucket default depending on how the cluster is configured. Relying on bucket defaults alone leaves a gap in your audit trail and removes the per-cluster control that compliance reviewers expect to see.

Warning: Frameworks like PCI DSS, HIPAA, and SOC 2 expect documented at-rest encryption for systems that process regulated data. An EMR cluster running financial or health data without a security configuration is a finding waiting to happen during an audit.

The practical attack scenarios:

A compromised IAM principal with ec2:CreateSnapshot permissions copies an unencrypted EBS volume from a cluster node and exfiltrates the snapshot to another account.
A misconfigured S3 bucket policy exposes EMRFS output that was never encrypted, so there is no second layer of protection.
Log files containing query parameters and sample data are written unencrypted to local disk and later harvested.

How to fix it

The fix is to create a security configuration with at-rest encryption enabled, then launch clusters with it. Security configurations are immutable, so you create a new one rather than editing an existing one.

Step 1: Create or identify a KMS key

SSE-KMS gives you the most control, with key rotation and CloudTrail logging of every decrypt call. Create a key if you do not already have one:

aws kms create-key \
  --description "EMR at-rest encryption key" \
  --tags TagKey=purpose,TagValue=emr-encryption

Note the resulting key ARN. You will also need to grant the EMR service role and EC2 instance profile permission to use the key.
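
As a rough sketch of that grant, the Python below builds a key policy statement for the two roles. The helper name `emr_key_usage_statement` is hypothetical, and the action list is an assumption based on the usual key-user permissions; the exact set you need depends on your EMR release and on whether EBS encryption is enabled.

```python
import json

# Actions commonly granted to principals that encrypt and decrypt with a key.
# Assumption: adjust this list to what your EMR release and EBS settings need.
KEY_USAGE_ACTIONS = [
    "kms:Encrypt",
    "kms:Decrypt",
    "kms:ReEncrypt*",
    "kms:GenerateDataKey*",
    "kms:DescribeKey",
]


def emr_key_usage_statement(role_arns: list[str]) -> dict:
    """Build a key policy statement that lets only the given roles use the key."""
    if not role_arns:
        raise ValueError("at least one role ARN is required")
    return {
        "Sid": "AllowEmrRolesToUseKey",
        "Effect": "Allow",
        "Principal": {"AWS": sorted(role_arns)},
        "Action": KEY_USAGE_ACTIONS,
        "Resource": "*",
    }


if __name__ == "__main__":
    # Placeholder ARNs that use the default EMR role names.
    statement = emr_key_usage_statement([
        "arn:aws:iam::111122223333:role/EMR_DefaultRole",
        "arn:aws:iam::111122223333:role/EMR_EC2_DefaultRole",
    ])
    print(json.dumps(statement, indent=2))
```

Merge the statement into the key's existing policy rather than replacing the policy outright, so that key administrators keep their access.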

Step 2: Create the security configuration

Save the following as emr-security-config.json. Replace the key ARNs with your own:

{
  "EncryptionConfiguration": {
    "EnableInTransitEncryption": false,
    "EnableAtRestEncryption": true,
    "AtRestEncryptionConfiguration": {
      "S3EncryptionConfiguration": {
        "EncryptionMode": "SSE-KMS",
        "AwsKmsKey": "arn:aws:kms:us-east-1:111122223333:key/your-key-id"
      },
      "LocalDiskEncryptionConfiguration": {
        "EncryptionKeyProviderType": "AwsKms",
        "AwsKmsKey": "arn:aws:kms:us-east-1:111122223333:key/your-key-id"
      }
    }
  }
}

Create it:

aws emr create-security-configuration \
  --name "at-rest-encryption-kms" \
  --security-configuration file://emr-security-config.json
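
If you would rather not hand-edit ARNs in the JSON file, the same document can be generated. The following is a minimal sketch: `build_at_rest_config` is a hypothetical helper, and its ARN check is only a sanity test, not full validation.

```python
import json


def build_at_rest_config(s3_key_arn: str, local_disk_key_arn: str) -> dict:
    """Return a security configuration body with at-rest encryption enabled."""
    for arn in (s3_key_arn, local_disk_key_arn):
        # Light sanity check only; KMS itself is the real validator.
        if not arn.startswith("arn:") or ":kms:" not in arn:
            raise ValueError(f"not a KMS key ARN: {arn!r}")
    return {
        "EncryptionConfiguration": {
            "EnableInTransitEncryption": False,
            "EnableAtRestEncryption": True,
            "AtRestEncryptionConfiguration": {
                "S3EncryptionConfiguration": {
                    "EncryptionMode": "SSE-KMS",
                    "AwsKmsKey": s3_key_arn,
                },
                "LocalDiskEncryptionConfiguration": {
                    "EncryptionKeyProviderType": "AwsKms",
                    "AwsKmsKey": local_disk_key_arn,
                },
            },
        }
    }


if __name__ == "__main__":
    # Placeholder ARN; substitute the key created in Step 1.
    key = "arn:aws:kms:us-east-1:111122223333:key/your-key-id"
    with open("emr-security-config.json", "w") as handle:
        json.dump(build_at_rest_config(key, key), handle, indent=2)
```

The output file has the same shape as the hand-written JSON, so the `create-security-configuration` command works with it unchanged.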

Step 3: Launch clusters with the security configuration

Reference the configuration by name when you create a cluster:

aws emr create-cluster \
  --name "encrypted-analytics" \
  --release-label emr-7.1.0 \
  --applications Name=Spark Name=Hive \
  --security-configuration "at-rest-encryption-kms" \
  --instance-type m5.xlarge \
  --instance-count 3 \
  --use-default-roles

Danger: You cannot apply a security configuration to a running cluster. Existing unencrypted clusters must be replaced. Before terminating a production cluster, confirm that any data on local HDFS has been written out to an encrypted S3 location, otherwise it will be lost when the cluster goes away.
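
To build the replacement list, one option is to filter cluster descriptions for active clusters that have no security configuration. The sketch below assumes each input dict is the `Cluster` object from `aws emr describe-cluster` output; the function name `clusters_to_replace` is hypothetical.

```python
# States in which a cluster is still running or about to run.
ACTIVE_STATES = {"STARTING", "BOOTSTRAPPING", "RUNNING", "WAITING"}


def clusters_to_replace(clusters: list[dict]) -> list[str]:
    """Return IDs of active clusters with no security configuration attached."""
    flagged = []
    for cluster in clusters:
        state = cluster.get("Status", {}).get("State")
        if state not in ACTIVE_STATES:
            continue  # terminated or terminating clusters need no action
        if not cluster.get("SecurityConfiguration"):
            flagged.append(cluster["Id"])
    return flagged
```

A cluster that does reference a security configuration can still have at-rest encryption switched off, so inspect the referenced configuration as well before treating a cluster as compliant.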

Console steps

Open the EMR console and go to Security configurations in the left navigation.
Choose Create, give it a name, and check Enable at-rest encryption.
Under S3 data encryption, choose SSE-KMS and select your KMS key.
Under local disk encryption, choose AWS KMS and select a key.
Save, then reference this configuration on the security options screen when creating new clusters.

Terraform example

resource "aws_emr_security_configuration" "at_rest" {
  name = "at-rest-encryption-kms"

  configuration = jsonencode({
    EncryptionConfiguration = {
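      # In-transit encryption also needs an InTransitEncryptionConfiguration
      # block with TLS certificates, so it stays off in this at-rest example.
      EnableInTransitEncryption = false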
      EnableAtRestEncryption    = true
      AtRestEncryptionConfiguration = {
        S3EncryptionConfiguration = {
          EncryptionMode = "SSE-KMS"
          AwsKmsKey      = aws_kms_key.emr.arn
        }
        LocalDiskEncryptionConfiguration = {
          EncryptionKeyProviderType = "AwsKms"
          AwsKmsKey                 = aws_kms_key.emr.arn
        }
      }
    }
  })
}

resource "aws_emr_cluster" "analytics" {
  name                   = "encrypted-analytics"
  release_label          = "emr-7.1.0"
  applications           = ["Spark", "Hive"]
  security_configuration = aws_emr_security_configuration.at_rest.name
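  service_role           = aws_iam_role.emr_service.arn

  # EMR nodes need an instance profile; the resource name here is a placeholder.
  ec2_attributes {
    instance_profile = aws_iam_instance_profile.emr_ec2.arn
  }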

  master_instance_group {
    instance_type = "m5.xlarge"
  }

  core_instance_group {
    instance_type  = "m5.xlarge"
    instance_count = 2
  }
}

Tip: Define the security configuration once as a shared module and reference it across every team that launches EMR. That way encryption is the default path, and nobody has to remember the JSON each time they create a cluster.

How to prevent it from happening again

Manual fixes do not stick. The reliable approach is to block unencrypted clusters before they launch and to keep scanning for drift.

Service Control Policy

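IAM does not expose a condition key for the SecurityConfiguration parameter of elasticmapreduce:RunJobFlow, so an SCP cannot inspect it directly. A workable proxy is a tagging convention: the policy below denies RunJobFlow requests that do not carry a securityConfiguration tag, and your provisioning tooling sets that tag only when it attaches a security configuration. A caller can satisfy the policy with the tag alone, so treat it as an organization-level guardrail and back it with scanning: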

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyEmrWithoutSecurityConfig",
      "Effect": "Deny",
      "Action": "elasticmapreduce:RunJobFlow",
      "Resource": "*",
      "Condition": {
        "Null": {
          "elasticmapreduce:RequestTag/securityConfiguration": "true"
        }
      }
    }
  ]
}

Warning: SCPs apply to every account in scope. Test in a sandbox OU first so you do not accidentally break a team that has a legitimate reason to launch clusters before their config is in place.

Policy-as-code in CI/CD

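If clusters are provisioned through Terraform, run a policy check on every pull request. Checkov ships rules that inspect aws_emr_security_configuration resources, including CKV_AWS_171 (EMRFS encryption uses SSE-KMS) and CKV_AWS_349 (local disks are encrypted). Check IDs differ between Checkov releases, so confirm them with checkov --list before pinning. Add the scan as a required step:

checkov -d ./infra --framework terraform \
  --check CKV_AWS_171,CKV_AWS_349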

Fail the build if the check does not pass. This moves the conversation about encryption to code review, where it is cheap to fix, instead of to an auditor's report.
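
Scanner rules that examine the security configuration resource do not necessarily flag a cluster that omits one altogether. A small custom check over the Terraform plan JSON can cover that case. This is a sketch under assumptions: the script name `check_emr.py` and the function name are hypothetical, and the input is the output of `terraform show -json` for a saved plan.

```python
import json
import sys


def emr_clusters_missing_config(plan: dict) -> list[str]:
    """Return addresses of planned aws_emr_cluster resources with no security configuration."""
    missing = []
    for change in plan.get("resource_changes", []):
        if change.get("type") != "aws_emr_cluster":
            continue
        details = change.get("change", {})
        after = details.get("after")
        if after is None:  # the resource is being destroyed
            continue
        unknown = details.get("after_unknown")
        if not isinstance(unknown, dict):
            unknown = {}
        # A value only known at apply time is reported in after_unknown.
        if after.get("security_configuration") or unknown.get("security_configuration"):
            continue
        missing.append(change.get("address", "<unknown address>"))
    return missing


if __name__ == "__main__":
    # Usage: terraform show -json tfplan > plan.json && python check_emr.py plan.json
    with open(sys.argv[1]) as handle:
        offenders = emr_clusters_missing_config(json.load(handle))
    for address in offenders:
        print(f"missing security_configuration: {address}")
    sys.exit(1 if offenders else 0)
```

The non-zero exit code fails the pipeline step, which matches the fail-the-build behavior described above.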

Continuous scanning

Configuration drift happens when someone launches a one-off cluster outside the pipeline. Lensix runs the emr_unencrypted check on a schedule across your accounts and surfaces any cluster missing at-rest encryption, so a manual launch does not quietly sit unencrypted for months.

Best practices

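Enable in-transit encryption too. While you are in the security configuration, turn on TLS for inter-node traffic. It requires supplying TLS certificates (a PEM bundle in S3 or a custom certificate provider), but once those exist the marginal effort is small and it closes the data-in-motion gap.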
Use SSE-KMS over SSE-S3. Customer-managed KMS keys give you rotation, fine-grained key policies, and a CloudTrail record of every decrypt operation. That audit trail is what compliance reviewers actually want to see.
Scope KMS key policies tightly. Grant key usage, such as kms:Decrypt and kms:GenerateDataKey, only to the EMR service role and EC2 instance profile that need it. A broad key policy undoes the value of encrypting in the first place.
Write output to encrypted S3 locations. EMRFS encryption protects the read and write path, but make sure the destination buckets also have default encryption and block public access enabled.
Standardize cluster creation. Whether through Terraform modules, Service Catalog products, or a wrapper script, give engineers a paved road that already includes the security configuration. Encryption should be the easy choice, not an extra step.
Rotate keys and review usage. Enable automatic rotation on the KMS key and periodically review CloudTrail decrypt events for principals that should not be touching EMR data.
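
As one illustration of the paved-road idea, a wrapper can refuse to build a cluster request unless a security configuration is named. This is a sketch, not a prescribed tool: `run_job_flow_params` is a hypothetical helper, the defaults mirror the CLI example earlier, and the role names assume the default EMR roles exist in the account.

```python
def run_job_flow_params(
    name: str,
    security_configuration: str,
    release_label: str = "emr-7.1.0",
    instance_type: str = "m5.xlarge",
    instance_count: int = 3,
) -> dict:
    """Build RunJobFlow parameters, refusing requests without a security configuration."""
    if not security_configuration or not security_configuration.strip():
        raise ValueError("a security configuration name is required")
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Spark"}, {"Name": "Hive"}],
        "SecurityConfiguration": security_configuration.strip(),
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Assumption: the default EMR roles are present in this account.
        "ServiceRole": "EMR_DefaultRole",
        "JobFlowRole": "EMR_EC2_DefaultRole",
    }
```

The resulting dict can be passed to boto3 as `boto3.client("emr").run_job_flow(**params)`. Keeping the parameter builder separate from the API call makes the guard easy to unit test without AWS credentials.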

At-rest encryption on EMR is a small configuration change with an outsized payoff. Once clusters are relaunched with the security configuration, jobs run as before, your local disks and EMRFS data are protected, and you have a clean answer when an auditor asks how the data is secured.
