
EMR Local Disk Encryption Disabled: Why It Matters and How to Fix It

Learn why EMR clusters need local disk encryption, the risks of leaving it off, and step-by-step CLI, console, and Terraform fixes plus CI/CD prevention.

TL;DR

This check flags EMR clusters that lack local disk encryption in their security configuration, leaving the data EMR writes to local instance storage exposed at rest. Fix it by creating a security configuration with at-rest local disk encryption enabled and launching clusters with it.

Amazon EMR clusters do a lot of work on local disk. When Spark spills a shuffle to disk, when Hadoop writes intermediate map output, or when YARN caches application data, that data lands on EBS volumes and instance store attached to the cluster nodes. If you have not configured local disk encryption, all of that data sits unencrypted on disk. The EMR Local Disk Encryption Disabled check catches exactly this gap.

It is an easy thing to overlook because EMR security configurations are optional, and a cluster will happily launch and run without one. That convenience is the trap.


What this check detects

Lensix flags any EMR cluster whose attached security configuration does not enable local disk encryption, or that has no security configuration at all. Specifically, it looks at the EncryptionConfiguration block of the security configuration and checks that EnableAtRestEncryption is set to true and that a LocalDiskEncryptionConfiguration is present.
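The rule itself is small enough to sketch in Python. This is an illustration of the logic described above, not Lensix's actual implementation: the function name has_local_disk_encryption is made up for the example, and the input is assumed to be a security configuration either as a parsed dict or as the JSON string that aws emr describe-security-configuration returns in its SecurityConfiguration field.

```python
import json


def has_local_disk_encryption(security_configuration):
    """Return True if the configuration enables at-rest local disk encryption.

    Accepts a JSON string or an already parsed dict. None, meaning the
    cluster has no security configuration at all, counts as not encrypted.
    """
    if security_configuration is None:
        return False
    if isinstance(security_configuration, str):
        security_configuration = json.loads(security_configuration)

    encryption = security_configuration.get("EncryptionConfiguration") or {}
    if encryption.get("EnableAtRestEncryption") is not True:
        return False

    at_rest = encryption.get("AtRestEncryptionConfiguration") or {}
    # A missing or empty block means local disks are not covered, even if
    # S3 (EMRFS) encryption is configured alongside it.
    return bool(at_rest.get("LocalDiskEncryptionConfiguration"))
```

Note that a configuration with only S3 encryption passes the EnableAtRestEncryption test but still fails the check, which is the case that is easiest to miss when reviewing by eye.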

EMR encryption splits into two broad categories:

  • At-rest encryption covers data stored in S3 through EMRFS and data on each node's local disks (EBS volumes and instance store).
  • In-transit encryption covers data moving between nodes.

This check is concerned with the local disk portion of at-rest encryption, which protects the temporary and intermediate data EMR writes to each instance.

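Note: Local disk encryption in EMR combines open-source HDFS encryption with volume-level encryption. Instance store volumes are encrypted with NVMe encryption or LUKS, and attached EBS storage volumes with LUKS or, when EBS encryption is enabled, with EBS encryption. Only EBS encryption also covers the EBS root device volume; LUKS does not. The keys come from either an AWS KMS key or a custom key provider you supply.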


Why it matters

People tend to assume EMR is "just compute" and that the real data lives safely in S3. But a running cluster materializes a surprising amount of sensitive data on its local disks:

  • Shuffle and spill files from Spark and MapReduce jobs, which can contain full copies of the rows being processed.
  • HDFS blocks if you are using HDFS as a working store.
  • Cached datasets, temporary tables, and scratch space from Hive, Presto, or Trino.
  • Application logs that may include record-level detail.

If those volumes are not encrypted, several realistic problems open up.

Attack and exposure scenarios

EBS snapshot exposure. A common mistake is leaving EBS snapshots or AMIs public or shared too broadly. A snapshot of an unencrypted volume, once shared, hands the contents to anyone who can read it. A snapshot of an encrypted volume is itself encrypted, so the data stays protected even if the snapshot leaks, as long as the key does not.

Physical and hypervisor layer. Encryption at rest ensures that nobody can read your data directly off the underlying storage media. Without it, you are relying entirely on AWS physical controls, with no defense in depth of your own.

Compliance failures. PCI DSS, HIPAA, SOC 2, and most internal data-handling standards either require encryption of data at rest or expect a documented justification for its absence. An auditor who finds a cluster processing cardholder or health data on unencrypted local disks will write that up, and rightly so.

Warning: You cannot retrofit encryption onto a running EMR cluster. Local disk encryption is set at cluster creation through the security configuration. Remediating an existing cluster means launching a new one, so plan the cutover rather than expecting an in-place toggle.


How to fix it

The fix has two parts: create a security configuration that enables local disk encryption, then launch your clusters with it.

Step 1: Create or pick a KMS key

You need a KMS key that the EMR service role and EC2 instance profile can use. If you do not already have one:

aws kms create-key \
  --description "EMR local disk encryption" \
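  --query 'KeyMetadata.Arn' \
  --output text

Note the returned key ARN for the next step, since the security configuration references the key by ARN. Make sure the key policy grants the EMR service role and EC2 instance profile permission to use the key for encrypt, decrypt, data key generation, and grant operations.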
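A key policy statement along the following lines would cover those permissions. Treat it as a starting point rather than a vetted policy: the helper names are invented for this example, the role ARNs in the usage note are the default EMR role names in a placeholder account, and the exact action list your setup needs may be narrower or wider, so compare it against the EMR documentation for your release.

```python
def emr_key_policy_statement(service_role_arn, instance_profile_role_arn):
    """Build one key policy statement that lets the two EMR roles use the key."""
    return {
        "Sid": "AllowEmrRolesToUseKey",
        "Effect": "Allow",
        "Principal": {"AWS": [service_role_arn, instance_profile_role_arn]},
        "Action": [
            "kms:Encrypt",
            "kms:Decrypt",
            "kms:ReEncrypt*",
            "kms:GenerateDataKey*",
            "kms:DescribeKey",
            "kms:CreateGrant",
        ],
        # In a key policy, "*" means the key the policy is attached to.
        "Resource": "*",
    }


def with_emr_access(key_policy, service_role_arn, instance_profile_role_arn):
    """Return a copy of key_policy with the EMR statement added or replaced."""
    statement = emr_key_policy_statement(service_role_arn, instance_profile_role_arn)
    others = [
        s for s in key_policy.get("Statement", []) if s.get("Sid") != statement["Sid"]
    ]
    return {**key_policy, "Statement": others + [statement]}
```

You would call with_emr_access on the key's existing policy document, passing for example arn:aws:iam::111122223333:role/EMR_DefaultRole and arn:aws:iam::111122223333:role/EMR_EC2_DefaultRole, and then apply the result with aws kms put-key-policy. Replacing by Sid keeps the operation safe to repeat.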

Step 2: Define the security configuration

Create a JSON document that turns on at-rest encryption with local disk encryption backed by your KMS key:

{
  "EncryptionConfiguration": {
    "EnableInTransitEncryption": false,
    "EnableAtRestEncryption": true,
    "AtRestEncryptionConfiguration": {
      "LocalDiskEncryptionConfiguration": {
        "EncryptionKeyProviderType": "AwsKms",
        "AwsKmsKey": "arn:aws:kms:us-east-1:111122223333:key/your-key-id",
        "EnableEbsEncryption": true
      }
    }
  }
}

Save it as emr-security-config.json and register it with EMR:

aws emr create-security-configuration \
  --name "local-disk-encryption" \
  --security-configuration file://emr-security-config.json

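Tip: Set EnableEbsEncryption to true so that EBS encryption covers the EBS root device volume as well as the attached storage volumes; LUKS on its own protects only the attached storage volumes. Snapshots taken from EBS-encrypted volumes are encrypted too. This option requires AWS KMS as the key provider and EMR release 5.24.0 or later.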

Step 3: Launch the cluster with the security configuration

Reference the security configuration by name when you create the cluster:

aws emr create-cluster \
  --name "analytics-cluster" \
  --release-label emr-7.1.0 \
  --applications Name=Spark Name=Hive \
  --instance-type m5.xlarge \
  --instance-count 3 \
  --security-configuration "local-disk-encryption" \
  --use-default-roles \
  --ec2-attributes KeyName=my-key

Console steps

  1. Open the EMR console and go to Security configurations.
  2. Choose Create, give it a name, and check Enable at-rest encryption.
  3. Under Local disk encryption, pick AWS KMS as the key provider and select your key.
  4. Enable EBS encryption, then save.
  5. When creating a cluster, expand the security options and select this configuration.

Migrating an existing cluster

Because encryption is fixed at launch, you replace the cluster rather than modify it:

  1. Create the security configuration as above.
  2. Spin up a new cluster with the same applications, instance setup, and bootstrap actions, now referencing the configuration.
  3. Repoint your job submissions, schedulers, or step definitions at the new cluster ID.
  4. Drain and terminate the old cluster once nothing is running on it.
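For the last step, "nothing is running" is worth checking rather than assuming. A minimal sketch, assuming you feed it the parsed JSON from aws emr list-steps for the old cluster; the function names are invented for this example. It only looks at EMR steps, so work submitted straight to YARN, or from notebooks, needs a separate check.

```python
# Step states that mean work is still queued or in flight.
ACTIVE_STEP_STATES = {"PENDING", "RUNNING", "CANCEL_PENDING"}


def active_steps(list_steps_response):
    """Return (step id, state) pairs for steps that have not finished."""
    return [
        (step["Id"], step["Status"]["State"])
        for step in list_steps_response.get("Steps", [])
        if step["Status"]["State"] in ACTIVE_STEP_STATES
    ]


def safe_to_terminate(list_steps_response):
    """True when no EMR step on the cluster is pending or running."""
    return not active_steps(list_steps_response)
```

Gating the terminate call on safe_to_terminate in a cutover script turns "drain" from a judgment call into a precondition.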

Danger: Terminating an EMR cluster permanently destroys its local HDFS and any data not persisted to S3. Confirm that all output is written to EMRFS or another durable store before you run the command below.

aws emr terminate-clusters --cluster-ids j-OLDCLUSTERID

How to prevent it from happening again

Manual remediation is fine once. The real win is making it impossible to launch a non-compliant cluster.

Bake it into your IaC

If you define EMR clusters with Terraform, make the security configuration a required input and reference it in every cluster resource:

resource "aws_emr_security_configuration" "encrypted" {
  name = "local-disk-encryption"

  configuration = jsonencode({
    EncryptionConfiguration = {
      EnableInTransitEncryption = false
      EnableAtRestEncryption    = true
      AtRestEncryptionConfiguration = {
        LocalDiskEncryptionConfiguration = {
          EncryptionKeyProviderType = "AwsKms"
          AwsKmsKey                 = aws_kms_key.emr.arn
          EnableEbsEncryption       = true
        }
      }
    }
  })
}

resource "aws_emr_cluster" "analytics" {
  name                   = "analytics-cluster"
  release_label          = "emr-7.1.0"
  applications           = ["Spark", "Hive"]
  security_configuration = aws_emr_security_configuration.encrypted.name
  service_role           = aws_iam_role.emr_service.arn
  # ... instance groups, ec2 attributes, etc.
}

Gate it in CI/CD with policy-as-code

Add a policy check to your pipeline so a plan that creates an EMR cluster without a security configuration fails before it reaches AWS. Here is an example using OPA/Conftest against a Terraform plan:

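package emr

import rego.v1

# Flag EMR clusters that the plan creates or updates without a security configuration.
deny contains msg if {
  some resource in input.resource_changes
  resource.type == "aws_emr_cluster"
  # Skip pure deletions, where change.after is null.
  resource.change.after != null
  # Terraform renders an unset attribute as null in the plan JSON, and Rego's
  # `not` treats null as defined, so compare against null explicitly.
  object.get(resource.change.after, "security_configuration", null) == null
  msg := sprintf("EMR cluster '%s' has no security configuration", [resource.address])
}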
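The Rego rule only confirms that a cluster references some security configuration. If you want the gate to also confirm that the configuration turns on local disk encryption, one option is a small script over the same plan JSON. The sketch below assumes the output of terraform show -json, with the security configuration defined in the same plan as the cluster; check_plan and its helper are names made up for this example.

```python
import json


def _encrypts_local_disk(configuration_json):
    """True if a security configuration JSON string enables local disk encryption."""
    encryption = json.loads(configuration_json).get("EncryptionConfiguration") or {}
    at_rest = encryption.get("AtRestEncryptionConfiguration") or {}
    return encryption.get("EnableAtRestEncryption") is True and bool(
        at_rest.get("LocalDiskEncryptionConfiguration")
    )


def check_plan(plan):
    """Return violation messages for a parsed `terraform show -json` plan."""
    # Ignore pure deletions, whose "after" state is null.
    changes = [c for c in plan.get("resource_changes", []) if c["change"].get("after")]

    # Map each security configuration defined in the plan to whether it
    # encrypts local disks. Configurations that already exist outside the
    # plan cannot be inspected here and are left to the runtime check.
    in_plan = {
        c["change"]["after"].get("name"): _encrypts_local_disk(
            c["change"]["after"].get("configuration") or "{}"
        )
        for c in changes
        if c["type"] == "aws_emr_security_configuration"
    }

    violations = []
    for c in changes:
        if c["type"] != "aws_emr_cluster":
            continue
        name = c["change"]["after"].get("security_configuration")
        if not name:
            violations.append(f"{c['address']}: no security configuration")
        elif in_plan.get(name) is False:
            violations.append(
                f"{c['address']}: security configuration '{name}' "
                "does not enable local disk encryption"
            )
    return violations
```

A pipeline step would load the plan with json.load, call check_plan, print the messages, and exit non-zero if the list is not empty.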

Tip: Run Lensix continuously against your accounts so that even clusters created outside your IaC pipeline, by a console click or a one-off script, get flagged. CI gates only catch what flows through CI.

Use SCPs for a hard backstop

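A Service Control Policy can act as an organization-level backstop on elasticmapreduce:RunJobFlow, blocking the action regardless of how it is invoked. Keep in mind that an SCP condition can only evaluate the condition keys EMR publishes, so check the Service Authorization Reference before relying on a policy that inspects the security configuration directly. Where no suitable key exists, a practical alternative is to restrict RunJobFlow to the provisioning roles whose pipelines always attach a security configuration.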


Best practices

  • Turn on the full encryption suite, not just local disk. If your cluster reads from or writes to S3, enable EMRFS encryption too, and consider in-transit encryption for inter-node traffic when handling sensitive data.
  • Standardize on one or two security configurations. Maintaining a small, well-reviewed set is easier than per-team configs that drift. Reference them everywhere.
  • Scope your KMS key policy tightly. Grant only the EMR service role and instance profile access, and use a dedicated key for EMR so you can audit and rotate it independently.
  • Persist results to S3, keep HDFS ephemeral. Treating local storage as scratch space limits how much sensitive data lives there and makes cluster replacement painless.
  • Audit existing clusters regularly. Long-running EMR clusters are the ones most likely to predate your encryption standards. Check them on a schedule rather than assuming new clusters cover you.
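The audit in the last bullet can be scripted without much ceremony. The sketch below assumes you have already collected the parsed JSON from aws emr describe-cluster for each active cluster, and that you maintain a set of security configuration names known to enable local disk encryption; unencrypted_clusters is a name invented for this example.

```python
def unencrypted_clusters(described_clusters, compliant_configurations):
    """List clusters whose security configuration is missing or not approved.

    described_clusters: iterable of parsed `describe-cluster` responses.
    compliant_configurations: set of security configuration names that are
    known to enable local disk encryption.
    """
    findings = []
    for response in described_clusters:
        cluster = response["Cluster"]
        # The field is absent when the cluster was launched without one.
        configuration = cluster.get("SecurityConfiguration")
        if configuration not in compliant_configurations:
            findings.append(
                {
                    "id": cluster["Id"],
                    "name": cluster["Name"],
                    "security_configuration": configuration,
                }
            )
    return findings
```

Each finding is a candidate for the replace-and-terminate procedure described earlier. Working from an allowlist of names keeps the script simple, at the cost of trusting that the named configurations have not been recreated with weaker settings.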

For most workloads, local disk encryption adds little performance overhead, and it takes very little effort to set up once. Compared to explaining to an auditor why a cluster was processing regulated data on unencrypted volumes, it is one of the cheaper controls you will ever turn on.