Tags: AWS, Best Practices, Cloud Security, Databases, Reliability

Redshift Snapshot Retention Not Configured: Why Zero-Day Retention Is a Backup Time Bomb

A Redshift cluster with automated snapshot retention set to zero has no backups. Learn why it matters and how to fix it with CLI, Terraform, and policy-as-code.

TL;DR

Setting a Redshift cluster's automated snapshot retention to zero disables automated backups entirely, leaving you with no automated recovery point if the cluster fails or data is corrupted. Set retention to at least 1 day, and 7 or more for production, with aws redshift modify-cluster --automated-snapshot-retention-period.

Amazon Redshift can automatically snapshot your cluster on a schedule, giving you a series of point-in-time backups you can restore from. That safety net is controlled by a single number: the automated snapshot retention period. When that value is set to 0, Redshift stops taking automated snapshots completely. This check flags any cluster where that has happened.

It is an easy setting to overlook because the cluster keeps running normally. Nothing breaks, performance is fine, queries return as expected. The gap only becomes visible at the worst possible moment, when you need to restore and discover there is nothing to restore from.


What this check detects

The redshift_nosnapshotretention check inspects every Redshift cluster in your account and reads its AutomatedSnapshotRetentionPeriod property. If that value is 0, the cluster has automated snapshots turned off and the check fails.

You can confirm the current value yourself with the AWS CLI:

aws redshift describe-clusters \
  --query 'Clusters[*].{Cluster:ClusterIdentifier,Retention:AutomatedSnapshotRetentionPeriod}' \
  --output table

A retention period of 0 in the output is the condition this check is looking for. Any value of 1 or higher means automated snapshots are active.
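
The same test can be sketched in Python. This is a minimal illustration, not the check's actual implementation: find_unprotected is a hypothetical helper, and it assumes a dict shaped like the describe-clusters response (the shape boto3's describe_clusters also returns). The cluster names are made up.

```python
def find_unprotected(response):
    """Return identifiers of clusters whose automated snapshot retention is 0.

    `response` is assumed to look like a Redshift describe-clusters response:
    {"Clusters": [{"ClusterIdentifier": ..., "AutomatedSnapshotRetentionPeriod": ...}]}
    """
    return [
        cluster["ClusterIdentifier"]
        for cluster in response.get("Clusters", [])
        # Only an explicit 0 counts as "automated snapshots disabled".
        if cluster.get("AutomatedSnapshotRetentionPeriod") == 0
    ]


# Hypothetical sample data for illustration only.
sample = {
    "Clusters": [
        {"ClusterIdentifier": "analytics-prod", "AutomatedSnapshotRetentionPeriod": 7},
        {"ClusterIdentifier": "etl-scratch", "AutomatedSnapshotRetentionPeriod": 0},
    ]
}
print(find_unprotected(sample))  # ['etl-scratch']
```

Keeping the logic in a pure function means it can be unit tested without touching AWS.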

Note: Automated snapshots are different from manual snapshots. Automated snapshots are taken by Redshift on a schedule (roughly every 8 hours or after every 5 GB of data change per node, whichever comes first) and are deleted automatically when they age past the retention period. Manual snapshots persist, by default, until you delete them. Turning off automated retention does not touch any manual snapshots you have already taken, but it deletes the existing automated snapshots and creates no new recovery points to replace them.


Why it matters

Redshift is often the backbone of a company's analytics, reporting, and data warehousing. The data in it is frequently the result of expensive, multi-stage ETL pipelines that took hours or days to build. Losing it is not like losing a stateless web server you can spin back up in minutes.

With retention set to zero, you have no automated recovery point. Here is what that exposes you to:

  • Accidental data deletion. A bad DELETE or TRUNCATE run against the wrong table, or a faulty ETL job that overwrites good data, has no automated snapshot to roll back to.
  • Cluster failure. If the underlying hardware fails in a way the cluster cannot recover from on its own, a snapshot is what you restore from. Without automated snapshots, your recovery options shrink dramatically.
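  • Accidental cluster deletion. When you delete a cluster, you are prompted to take a final snapshot. Automated snapshots are deleted together with the cluster, so what survives is the final snapshot and any manual snapshots. If the final snapshot is skipped or fails and no manual snapshot exists, there is nothing left to restore.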
  • Ransomware and malicious insiders. An attacker with cluster access can drop tables or corrupt data. Recent automated snapshots give you a clean restore point that predates the compromise.

There is also a compliance angle. Frameworks like SOC 2, PCI DSS, and HIPAA expect documented, tested backup and recovery processes for systems holding sensitive or business-critical data. A warehouse with automated backups disabled and no documented alternative is unlikely to pass an audit that looks closely.

Warning: The default automated snapshot retention period for a new Redshift cluster is 1 day. Someone has to actively set it to 0 for this check to fail, which usually means it was disabled deliberately at some point, often to save on snapshot storage costs, and never re-enabled. Treat a zero value as a decision that needs review, not an accident. Note that AWS does not allow automated snapshots to be disabled on RA3 node types, so in practice this finding shows up on other node types.


How to fix it

The fix is a single attribute change: set the retention period to a sensible number of days and Redshift resumes taking automated snapshots on its regular schedule.

Option 1: AWS CLI

Set retention to 7 days, a reasonable starting point for most production clusters:

aws redshift modify-cluster \
  --cluster-identifier my-redshift-cluster \
  --automated-snapshot-retention-period 7

The change applies without downtime. Verify it took effect:

aws redshift describe-clusters \
  --cluster-identifier my-redshift-cluster \
  --query 'Clusters[0].AutomatedSnapshotRetentionPeriod'
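
If several clusters are affected, the same change can be planned in bulk. The sketch below is an assumption-laden illustration: plan_retention_fixes is a hypothetical helper that takes cluster dicts shaped like describe-clusters entries and returns the arguments for one modify-cluster call per affected cluster. It only builds the arguments and does not call AWS.

```python
def plan_retention_fixes(clusters, target_days=7):
    """Build modify-cluster arguments for every cluster with retention 0."""
    # Redshift accepts 0 to 35 days; 0 disables automated snapshots, so reject it here.
    if not 1 <= target_days <= 35:
        raise ValueError("target_days must be between 1 and 35")
    return [
        {
            "ClusterIdentifier": c["ClusterIdentifier"],
            "AutomatedSnapshotRetentionPeriod": target_days,
        }
        for c in clusters
        if c.get("AutomatedSnapshotRetentionPeriod") == 0
    ]


# Hypothetical clusters for illustration only.
clusters = [
    {"ClusterIdentifier": "my-redshift-cluster", "AutomatedSnapshotRetentionPeriod": 0},
    {"ClusterIdentifier": "reporting", "AutomatedSnapshotRetentionPeriod": 14},
]
fixes = plan_retention_fixes(clusters)
print(fixes)
```

With boto3, each dict could be passed as client.modify_cluster(**fix), since those are the keyword names that call uses. Reviewing the plan before applying it keeps the remediation auditable.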

Option 2: AWS Console

  1. Open the Amazon Redshift console and go to Clusters.
  2. Select the affected cluster.
  3. Choose Actions, then Modify.
  4. Under Backup, set Automated snapshot retention period to your chosen number of days (for example, 7).
  5. Choose Modify cluster to save.

Option 3: Terraform

If your clusters are managed in Terraform, set the retention period explicitly so it is enforced on every apply:

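resource "aws_redshift_cluster" "main" {
  cluster_identifier                  = "my-redshift-cluster"
  node_type                           = "ra3.xlplus"
  cluster_type                        = "multi-node"
  number_of_nodes                     = 2
  automated_snapshot_retention_period = 7 # days; 1 to 35 keeps automated snapshots on

  # Credentials, networking, and encryption arguments omitted for brevity.
}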

Option 4: CloudFormation

In a CloudFormation template, set the AutomatedSnapshotRetentionPeriod property on the cluster resource (other required properties, such as credentials and the database name, are omitted here):

{
  "Type": "AWS::Redshift::Cluster",
  "Properties": {
    "ClusterIdentifier": "my-redshift-cluster",
    "NodeType": "ra3.xlplus",
    "ClusterType": "multi-node",
    "NumberOfNodes": 2,
    "AutomatedSnapshotRetentionPeriod": 7
  }
}

Tip: For an extra layer of protection, configure a cross-region snapshot copy so backups survive a regional outage. Run aws redshift enable-snapshot-copy --cluster-identifier my-redshift-cluster --destination-region us-west-2 --retention-period 7 to replicate automated snapshots to a second region.
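
To see which clusters still lack a cross-region copy, one option is to look at the ClusterSnapshotCopyStatus field that describe-clusters returns for clusters with snapshot copy enabled. The helper below is a hypothetical sketch that assumes the field is absent or empty when copy is not configured; the sample data is made up.

```python
def missing_cross_region_copy(response):
    """Return identifiers of clusters with no cross-region snapshot copy configured."""
    return [
        cluster["ClusterIdentifier"]
        for cluster in response.get("Clusters", [])
        # Assumption: the field is missing, None, or empty when copy is disabled.
        if not cluster.get("ClusterSnapshotCopyStatus")
    ]


# Hypothetical describe-clusters style data for illustration only.
copy_sample = {
    "Clusters": [
        {
            "ClusterIdentifier": "my-redshift-cluster",
            "ClusterSnapshotCopyStatus": {
                "DestinationRegion": "us-west-2",
                "RetentionPeriod": 7,
            },
        },
        {"ClusterIdentifier": "reporting"},
    ]
}
print(missing_cross_region_copy(copy_sample))  # ['reporting']
```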


How to prevent it from happening again

Fixing one cluster is fine, but the goal is to make a retention of zero impossible to ship in the first place. Push the control as far left as you can.

Catch it in CI/CD with policy-as-code

If you use Terraform, a Checkov policy or an OPA/Conftest rule can block any plan that sets retention to zero. Here is a Conftest rule in Rego:

package main

import rego.v1

# Deny any planned Redshift cluster whose automated snapshot retention is below 1 day.
deny contains msg if {
  some resource in input.resource_changes
  resource.type == "aws_redshift_cluster"
  retention := resource.change.after.automated_snapshot_retention_period
  retention < 1
  msg := sprintf("Redshift cluster '%s' must have automated_snapshot_retention_period >= 1", [resource.address])
}

Wire this into your pipeline so a pull request that disables snapshots fails the build before it ever reaches AWS.
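
For teams that want to unit test the policy logic, or that do not run OPA, the rule above can be mirrored in Python. This is a sketch under stated assumptions: retention_violations is a hypothetical function, and it expects the plan JSON produced by terraform show -json, with the same resource_changes structure the Rego rule reads. The sample plan is invented.

```python
def retention_violations(plan):
    """Return one message per Redshift cluster planned with retention below 1 day."""
    messages = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "aws_redshift_cluster":
            continue
        # `after` is null for resources being destroyed, so fall back to {}.
        after = (rc.get("change") or {}).get("after") or {}
        retention = after.get("automated_snapshot_retention_period")
        if isinstance(retention, (int, float)) and retention < 1:
            messages.append(
                f"Redshift cluster '{rc['address']}' must have "
                "automated_snapshot_retention_period >= 1"
            )
    return messages


# Hypothetical plan fragment for illustration only.
plan = {
    "resource_changes": [
        {"address": "aws_redshift_cluster.main", "type": "aws_redshift_cluster",
         "change": {"after": {"automated_snapshot_retention_period": 0}}},
        {"address": "aws_redshift_cluster.ok", "type": "aws_redshift_cluster",
         "change": {"after": {"automated_snapshot_retention_period": 7}}},
        {"address": "aws_redshift_cluster.old", "type": "aws_redshift_cluster",
         "change": {"after": None}},
    ]
}
violations = retention_violations(plan)
print(violations)
```

A CI step could exit non-zero whenever the returned list is not empty.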

Detect drift with AWS Config

For clusters already running, an AWS Config rule gives you continuous detection regardless of how the change was made. AWS provides the managed rule redshift-backup-enabled, which flags clusters with retention below a threshold you set:

aws configservice put-config-rule --config-rule '{
  "ConfigRuleName": "redshift-backup-enabled",
  "Source": {
    "Owner": "AWS",
    "SourceIdentifier": "REDSHIFT_BACKUP_ENABLED"
  },
  "InputParameters": "{\"MinRetentionPeriod\":\"7\"}",
  "Scope": {
    "ComplianceResourceTypes": ["AWS::Redshift::Cluster"]
  }
}'
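
The InputParameters field is a JSON string nested inside JSON, which is easy to get wrong by hand. A small sketch of building the same rule definition in Python, letting json.dumps handle the escaping, might look like this. backup_rule is a hypothetical helper, and the MinRetentionPeriod parameter name follows the AWS Config documentation for this managed rule.

```python
import json


def backup_rule(min_days=7):
    """Build the redshift-backup-enabled rule definition as a Python dict."""
    return {
        "ConfigRuleName": "redshift-backup-enabled",
        "Source": {"Owner": "AWS", "SourceIdentifier": "REDSHIFT_BACKUP_ENABLED"},
        # AWS Config expects the parameters as a JSON-encoded string.
        "InputParameters": json.dumps({"MinRetentionPeriod": str(min_days)}),
        "Scope": {"ComplianceResourceTypes": ["AWS::Redshift::Cluster"]},
    }


rule = backup_rule(7)
print(rule["InputParameters"])
```

With boto3, the dict could be passed as the ConfigRule argument of the Config client's put_config_rule call.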

Keep Lensix scanning continuously

Config rules and CI gates each cover part of the picture. Lensix runs the redshift_nosnapshotretention check across every region and account on a schedule, so a cluster created out-of-band or modified by a console user still gets flagged. Pair continuous scanning with the preventive gates above and you close both the new-resource path and the drift path.


Best practices

  • Match retention to your recovery needs. 7 days is a sensible default. For clusters holding regulated or hard-to-rebuild data, 14 to 35 days gives more room. Redshift supports up to 35 days for automated snapshots.
  • Take manual snapshots before risky operations. Before a major schema migration or bulk load, take a manual snapshot. Manual snapshots are not subject to the automated retention window and, by default, are kept until you delete them.
  • Replicate cross-region. Automated snapshots live in the same region as the cluster by default. Enable cross-region snapshot copy so a regional incident does not take your backups with it.
  • Test your restores. A backup you have never restored from is a hypothesis, not a recovery plan. Periodically restore a snapshot into a test cluster and confirm the data is intact.
  • Standardize through IaC. Define every cluster in Terraform or CloudFormation with the retention period set explicitly. This removes the chance of someone disabling it through the console and leaving it that way.
  • Watch the cost trade-off honestly. Snapshot storage beyond your free allotment incurs charges, but for almost every production warehouse the cost of storage is trivial compared to the cost of rebuilding lost data. Do not disable backups to save a few dollars.
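
One way to keep these practices honest is to check how old your newest recovery point actually is. The sketch below is illustrative only: newest_recovery_point_age is a hypothetical helper, it assumes dicts shaped like the Snapshots entries of a describe-cluster-snapshots response, and the 24-hour threshold is an arbitrary example, not a recommendation from AWS.

```python
from datetime import datetime, timedelta, timezone


def newest_recovery_point_age(snapshots, now):
    """Return the age of the newest available snapshot, or None if there is none."""
    times = [
        s["SnapshotCreateTime"]
        for s in snapshots
        if s.get("Status") == "available"
    ]
    return now - max(times) if times else None


# Hypothetical snapshot data for illustration only.
now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
snapshots = [
    {"SnapshotType": "automated", "Status": "available",
     "SnapshotCreateTime": now - timedelta(hours=6)},
    {"SnapshotType": "manual", "Status": "available",
     "SnapshotCreateTime": now - timedelta(days=30)},
]
age = newest_recovery_point_age(snapshots, now)
print(age <= timedelta(hours=24))  # True
```

A result of None, or an age far beyond your recovery point objective, is a sign that backups are not doing what you think they are.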

Danger: Never set retention to 0 as a way to clean up storage. Doing so does not just stop new snapshots. Redshift also deletes the existing automated snapshots, which removes your only automated recovery path. If you genuinely need to reduce snapshot costs, lower the retention period to a smaller positive number rather than zeroing it out.

Backups are one of those controls that feel like overhead right up until the day they are the only thing standing between you and a very long incident. A Redshift cluster with snapshot retention at zero is a cluster running without a net. The fix takes one command and applies with no downtime, so there is no reason to leave it that way.