
BigQuery Dataset Is Publicly Accessible: Risks and Remediation

Learn how to detect and fix publicly accessible BigQuery datasets in GCP, why allUsers and allAuthenticatedUsers are dangerous, and how to prevent recurrence.

TL;DR

This check flags BigQuery datasets that grant access to allUsers (anyone on the internet) or allAuthenticatedUsers (any Google account holder). That exposes your tables, views, and query data to the public. Fix it by removing those principals from the dataset IAM policy.

BigQuery is where a lot of organizations end up parking their most sensitive data: customer records, financial transactions, product analytics, and the joined tables that combine all of it. Because access control in BigQuery happens at the dataset level by default, a single overly broad grant can quietly open an entire collection of tables to the world. This check catches exactly that.


What this check detects

The bigquery_public check inspects the IAM policy attached to each BigQuery dataset and looks for two specific principals:

  • allUsers — grants access to anyone on the internet, with no authentication required.
  • allAuthenticatedUsers — grants access to any user signed in with a Google account, including accounts outside your organization.

If either of these appears in a dataset's access list with any role (READER, WRITER, OWNER, a predefined role such as roles/bigquery.dataViewer, or a custom role), the check fails. A dataset counts as publicly accessible regardless of the role attached, because even read access to the data is a disclosure risk.

Note: In BigQuery, datasets are the IAM boundary. Tables and views inherit the dataset's access controls, and table-level IAM can only add access, never take it away. Only row-level access policies and column-level policy tags can narrow what a dataset-level grant exposes, so a public grant at the dataset level can expose every object inside it at once.
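The detection rule above fits in a few lines of Python. This is a minimal illustration, not Lensix's implementation: it assumes the dataset resource JSON printed by `bq show --format=prettyjson`, where allAuthenticatedUsers appears under `specialGroup` and allUsers under `iamMember` (as in the example in Step 1). `find_public_grants` is a hypothetical helper name.

```python
# Principals that make a dataset public, whatever role is attached.
PUBLIC_PRINCIPALS = {"allUsers", "allAuthenticatedUsers"}


def find_public_grants(dataset: dict) -> list[tuple[str, str]]:
    """Return (principal, role) pairs for every public entry in a dataset's access list.

    Assumes the layout of `bq show --format=prettyjson`: allAuthenticatedUsers is a
    `specialGroup` entry and allUsers is an `iamMember` entry.
    """
    findings = []
    for entry in dataset.get("access", []):
        for field in ("specialGroup", "iamMember"):
            principal = entry.get(field)
            if principal in PUBLIC_PRINCIPALS:
                # Any role fails the check: READER leaks data just as OWNER does.
                findings.append((principal, entry.get("role", "<unknown>")))
    return findings
```

For example, `find_public_grants(json.load(open("dataset.json")))` returns an empty list for a dataset that passes.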


Why it matters

People often assume allAuthenticatedUsers is "safe enough" because it requires a Google login. It is not. Any one of the billions of Google accounts in existence qualifies, including throwaway accounts an attacker creates in seconds. Functionally it is public exposure with a thin authentication layer that filters out nobody.

Here is what goes wrong in the real world:

  • Data exfiltration. An attacker who discovers a public dataset can run SELECT queries against your tables and pull entire datasets out, all from their own GCP project. They do not need any foothold in your environment.
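  • Billing and tampering abuse. Query costs are billed to the project that runs the query, so outside readers pay for their own scans, but storage stays on your bill. If the public grant includes write access (WRITER or roles/bigquery.dataEditor), anyone can load data into your dataset, running up storage costs, or overwrite and delete your tables. Exposing schema and structure also makes it trivial for attackers to plan large extraction jobs, and a public dataset that contains authorized views can leak data from the source tables those views are authorized to read.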
  • Compliance violations. Public exposure of PII, PHI, or cardholder data can be a reportable breach under GDPR, HIPAA, and PCI DSS. Unless your logs can show that nobody accessed the data, you may have to treat it as disclosed, with the fines and notification obligations that follow.
  • Lateral discovery. Analytics datasets frequently contain internal identifiers, email addresses, IP ranges, and references to other systems. Even a "harmless" event-tracking table can hand an attacker a map of your infrastructure.

Danger: Public BigQuery datasets are routinely found by automated scanners. Treat any dataset with allUsers or allAuthenticatedUsers as already compromised, and review the BigQuery data access audit logs to see if it was queried before you locked it down.
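To see who touched a dataset while it was public, you can pull its data access audit entries and group them by caller. A minimal sketch, assuming the standard Cloud Audit Logs fields (`protoPayload.serviceName`, `protoPayload.resourceName`, `protoPayload.authenticationInfo.principalEmail`); the filter and the helper names `data_access_filter` and `callers` are illustrative. The substring match on `datasets/DATASET_ID` may also catch datasets whose names share that prefix, and callers outside your organization may have their email redacted or missing, which the sketch groups as `<unknown>`.

```python
from collections import Counter


def data_access_filter(project_id: str, dataset_id: str) -> str:
    """Build a Cloud Logging filter for BigQuery data access entries touching one dataset."""
    log_name = f"projects/{project_id}/logs/cloudaudit.googleapis.com%2Fdata_access"
    return (
        f'logName="{log_name}" '
        'AND protoPayload.serviceName="bigquery.googleapis.com" '
        f'AND protoPayload.resourceName:"datasets/{dataset_id}"'
    )


def callers(entries: list[dict]) -> Counter:
    """Count log entries per caller; entries without a principal email count as '<unknown>'."""
    counts = Counter()
    for entry in entries:
        auth = entry.get("protoPayload", {}).get("authenticationInfo", {})
        counts[auth.get("principalEmail") or "<unknown>"] += 1
    return counts
```

Pass the filter to `gcloud logging read "FILTER" --project=PROJECT_ID --freshness=30d --format=json`, parse the output with `json.loads`, and review `callers(...)` for identities outside your organization.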


How to fix it

The fix is to remove the public principals from the dataset's access list. You can do this through the console, the bq CLI, the gcloud CLI, or your IaC tool.

Step 1: Confirm which datasets are public

Dump the current access policy for a dataset so you can see exactly what is granted:

bq show --format=prettyjson PROJECT_ID:DATASET_ID

Look in the access array for entries containing allUsers or allAuthenticatedUsers, for example:

{
  "access": [
    {
      "role": "READER",
      "specialGroup": "allAuthenticatedUsers"
    },
    {
      "role": "WRITER",
      "iamMember": "allUsers"
    }
  ]
}

Step 2: Remove the public access

The most reliable way to edit a dataset's access list is to export it, edit the JSON, and reapply it. First export:

bq show --format=prettyjson PROJECT_ID:DATASET_ID > dataset.json

Open dataset.json, delete the access entries that reference allUsers or allAuthenticatedUsers, then apply the cleaned-up version:

bq update --source dataset.json PROJECT_ID:DATASET_ID
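If you would rather not hand-edit the JSON, a small script can strip the public entries from dataset.json before you run bq update. A minimal sketch, assuming the `bq show` layout described above (allAuthenticatedUsers under `specialGroup`, allUsers under `iamMember`); `strip_public.py` and its functions are hypothetical names. It only rewrites the file when it actually removed something.

```python
import json
import sys

PUBLIC_PRINCIPALS = {"allUsers", "allAuthenticatedUsers"}


def is_public(entry: dict) -> bool:
    # bq show reports allAuthenticatedUsers as a specialGroup and allUsers as an iamMember.
    return (entry.get("specialGroup") in PUBLIC_PRINCIPALS
            or entry.get("iamMember") in PUBLIC_PRINCIPALS)


def strip_public_access(dataset: dict) -> list[dict]:
    """Drop public entries from dataset["access"] in place and return what was removed."""
    access = dataset.get("access")
    if not access:
        return []
    removed = [e for e in access if is_public(e)]
    if removed:
        dataset["access"] = [e for e in access if not is_public(e)]
    return removed


def main(path: str) -> None:
    with open(path) as f:
        dataset = json.load(f)
    removed = strip_public_access(dataset)
    if not removed:
        print("no public entries found; file left unchanged")
        return
    with open(path, "w") as f:
        json.dump(dataset, f, indent=2)
    for entry in removed:
        print(f"removed {entry}")


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print("usage: python strip_public.py dataset.json", file=sys.stderr)
```

Review the printed entries against the warning below before applying the file: every removed entry is a consumer that will lose access.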

Warning: Removing public access will immediately break anything that depended on it, including dashboards, scheduled queries, and external partners reading the data anonymously. Confirm who is legitimately using the dataset before you pull the grant, then re-add those specific identities (service accounts, groups, or users) with least privilege.

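Alternative: remove project-level grants with gcloud

gcloud projects remove-iam-policy-binding edits the project's IAM policy, not a dataset's access list. Use it when a public principal was granted on the project itself, where the grant reaches every dataset in the project and does not appear in the dataset's own access list. Set --role to the role that was actually granted: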

# Remove allUsers as a data viewer
gcloud projects remove-iam-policy-binding PROJECT_ID \
  --member="allUsers" \
  --role="roles/bigquery.dataViewer"

# Remove allAuthenticatedUsers
gcloud projects remove-iam-policy-binding PROJECT_ID \
  --member="allAuthenticatedUsers" \
  --role="roles/bigquery.dataViewer"

Note that dataset-level access set through the BigQuery API lives in the dataset resource itself, not the project IAM policy, so for those you should use the bq update approach above.
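To find which project-level bindings need removing, you can scan the project policy instead of guessing roles. A minimal sketch, assuming the JSON printed by `gcloud projects get-iam-policy PROJECT_ID --format=json` (a `bindings` list of `role`/`members` objects); `public_bindings` and `removal_commands` are hypothetical helpers, and the commands are printed, not executed. A binding that carries an IAM condition also needs `--condition` (or `--all`) on the remove command.

```python
PUBLIC_PRINCIPALS = {"allUsers", "allAuthenticatedUsers"}


def public_bindings(policy: dict) -> list[tuple[str, str]]:
    """Return (role, member) pairs where a project-level role is granted to a public principal."""
    return [
        (binding["role"], member)
        for binding in policy.get("bindings", [])
        for member in binding.get("members", [])
        if member in PUBLIC_PRINCIPALS
    ]


def removal_commands(project_id: str, policy: dict) -> list[str]:
    """Build one gcloud remove-iam-policy-binding command per public binding (for review only)."""
    return [
        f"gcloud projects remove-iam-policy-binding {project_id} "
        f'--member="{member}" --role="{role}"'
        for role, member in public_bindings(policy)
    ]
```

Load the policy with `json.load` and print `removal_commands("PROJECT_ID", policy)` to get a reviewed list of commands to run.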

Step 3: Re-grant access correctly

Replace the public grant with named identities. Prefer Google Groups over individual users so access is managed in one place:

bq add-iam-policy-binding \
  --member="group:GROUP_EMAIL" \
  --role="roles/bigquery.dataViewer" \
  PROJECT_ID:DATASET_ID
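The same grant can also be expressed in the dataset.json file from Step 2 before you run bq update --source, which may help if your bq version does not support IAM bindings on datasets. A minimal sketch; `grant_group_reader` is a hypothetical helper, and it uses the dataset-level READER role, which corresponds to roles/bigquery.dataViewer.

```python
def grant_group_reader(dataset: dict, group_email: str) -> bool:
    """Add a READER entry for a Google Group to a dataset resource, in place.

    Returns False if an identical entry already exists, so re-running is harmless.
    READER is the dataset-level equivalent of roles/bigquery.dataViewer.
    """
    entry = {"role": "READER", "groupByEmail": group_email}
    access = dataset.setdefault("access", [])
    if entry in access:
        return False
    access.append(entry)
    return True
```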

Fixing it in Terraform

If your datasets are managed in Terraform, never use allUsers or allAuthenticatedUsers in an access block or a google_bigquery_dataset_iam_member resource. Grant access to specific principals:

resource "google_bigquery_dataset" "analytics" {
  dataset_id = "analytics"
  location   = "US"

  access {
    role           = "READER"
    group_by_email = "GROUP_EMAIL"
  }

  access {
    role          = "OWNER"
    user_by_email = "data-platform-sa@PROJECT_ID.iam.gserviceaccount.com"
  }
}

After applying, run terraform plan to confirm no drift, then verify with bq show.


How to prevent it from happening again

Manual cleanup is fine for the incident, but you want a guardrail that stops the next public dataset before it ships.

Enforce a Domain Restricted Sharing org policy

The strongest control is the iam.allowedPolicyMemberDomains organization policy constraint. It blocks IAM grants to identities outside the customer IDs you allow, which means allUsers and allAuthenticatedUsers can no longer be added anywhere the policy applies. It is not retroactive: public grants that already exist stay in place until you remove them.

gcloud resource-manager org-policies allow \
  iam.allowedPolicyMemberDomains \
  YOUR_CUSTOMER_ID \
  --organization=ORG_ID
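To confirm the constraint is actually in force, you can read it back with `gcloud resource-manager org-policies describe iam.allowedPolicyMemberDomains --organization=ORG_ID --effective --format=json` and inspect the result. A minimal sketch, assuming the v1 policy shape that command prints (a `listPolicy` object holding `allowedValues` or an `allValues` of `ALLOW`/`DENY`); `restricts_member_domains` is a hypothetical helper.

```python
def restricts_member_domains(policy: dict) -> bool:
    """True if a v1 org policy for iam.allowedPolicyMemberDomains limits who can be granted roles."""
    list_policy = policy.get("listPolicy") or {}
    all_values = list_policy.get("allValues")
    if all_values == "DENY":
        return True   # nothing outside the org can be added
    if all_values == "ALLOW":
        return False  # every principal, including public ones, is allowed
    # Otherwise the policy only restricts if it names specific allowed customer IDs.
    return bool(list_policy.get("allowedValues"))
```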

Tip: Domain Restricted Sharing is the single most effective control here because it applies org-wide and cannot be overridden by an individual project owner. Roll it out in dry-run mode first to find legitimate external grants you need to allowlist.

Gate it in CI/CD with policy-as-code

Catch public grants in pull requests before they reach GCP. With OPA/Conftest you can scan Terraform plans for forbidden principals:

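package bigquery

import rego.v1

# Principals that make a dataset public, whatever role they hold.
public_principals := {"allUsers", "allAuthenticatedUsers"}

# google_bigquery_dataset_iam_member resources carry a single member.
deny contains msg if {
  resource := input.resource_changes[_]
  resource.type == "google_bigquery_dataset_iam_member"
  resource.change.after.member in public_principals
  msg := sprintf("Dataset IAM grants public access: %s", [resource.address])
}

# access blocks on google_bigquery_dataset: allUsers appears as iam_member,
# allAuthenticatedUsers as special_group.
deny contains msg if {
  resource := input.resource_changes[_]
  resource.type == "google_bigquery_dataset"
  entry := resource.change.after.access[_]
  some field in ["iam_member", "special_group"]
  entry[field] in public_principals
  msg := sprintf("Dataset access block grants public access: %s", [resource.address])
}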

Run it against a plan in JSON:

terraform plan -out=tfplan.binary
terraform show -json tfplan.binary > tfplan.json
conftest test --policy policy/ --namespace bigquery tfplan.json

Continuous monitoring

Org policies and CI gates cover the resources you manage in code. For everything else, including changes made by hand in the console, you need continuous scanning. Lensix runs the bigquery_public check across your projects on a schedule and alerts you the moment a dataset goes public, so a missed guardrail does not turn into a long-lived exposure.
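If you want an interim scan of your own alongside a managed check, the google-cloud-bigquery client library exposes each dataset's access list as `AccessEntry` objects. A sketch under these assumptions: the package is installed, the caller can read dataset metadata, and public grants surface as an `entity_id` of allUsers or allAuthenticatedUsers. `scan_project` and `public_entries` are hypothetical names, and the client import is deferred so the filter works without the library.

```python
PUBLIC_PRINCIPALS = {"allUsers", "allAuthenticatedUsers"}


def public_entries(access_entries) -> list:
    """Keep AccessEntry-like objects (anything with an .entity_id) that grant public access."""
    return [e for e in access_entries if e.entity_id in PUBLIC_PRINCIPALS]


def scan_project(project_id: str) -> dict[str, list]:
    """Map dataset ID to its public AccessEntry objects for every dataset in one project.

    Requires the google-cloud-bigquery package and application default credentials.
    """
    from google.cloud import bigquery  # deferred import: only needed for a live scan

    client = bigquery.Client(project=project_id)
    results = {}
    for item in client.list_datasets():
        dataset = client.get_dataset(item.reference)
        found = public_entries(dataset.access_entries)
        if found:
            results[item.dataset_id] = found
    return results
```

Run it from a scheduler of your choice and alert on any non-empty result.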


Best practices

  • Grant access to groups, not individuals. Manage membership in your IdP and let group lifecycle handle joiners and leavers.
  • Use least-privilege roles. Most consumers only need roles/bigquery.dataViewer. Reserve editor and owner for the small set of identities that actually need them.
  • Prefer authorized views over broad table access. If a team needs a subset of columns or rows, expose an authorized view instead of the underlying table. This keeps the raw data locked down.
  • Use VPC Service Controls for sensitive datasets. A service perimeter prevents data from being queried or copied to projects outside the perimeter, even by valid credentials, which limits exfiltration if an account is compromised.
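  • Keep BigQuery data access audit logs, and keep them long enough. Unlike most Google Cloud services, BigQuery writes Data Access audit logs by default, but the _Default log bucket retains them for only 30 days, so route them to a bucket or sink with longer retention. Knowing who queried what, and when, is essential for incident response and for proving a public dataset was never actually accessed.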
  • Review dataset access on a schedule. Permissions drift. Periodically audit every dataset's access list and remove grants that are no longer needed.
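A scheduled review can start from the same `bq show` JSON used earlier. A minimal sketch with a hypothetical `review_access` helper and two assumptions: service accounts are recognized by the `.gserviceaccount.com` suffix, and WRITER/OWNER (or their dataEditor/dataOwner equivalents) count as broad roles worth confirming. It only produces findings for a human to review and changes nothing; expect the default projectOwners OWNER entry to show up too.

```python
SERVICE_ACCOUNT_SUFFIX = ".gserviceaccount.com"
BROAD_ROLES = {"WRITER", "OWNER", "roles/bigquery.dataEditor", "roles/bigquery.dataOwner"}


def review_access(dataset: dict) -> list[str]:
    """Return readable findings for grants that deserve a second look in a periodic review."""
    findings = []
    for entry in dataset.get("access", []):
        role = entry.get("role", "<unknown>")
        user = entry.get("userByEmail")
        # Individual humans should usually be reached through a group instead.
        if user and not user.endswith(SERVICE_ACCOUNT_SUFFIX):
            findings.append(f"individual user {user} has {role}; prefer a group")
        # Broad roles are legitimate for a few identities, but should be confirmed.
        if role in BROAD_ROLES:
            who = next((v for k, v in entry.items() if k != "role"), "<unknown>")
            findings.append(f"{who} has broad role {role}; confirm it is still needed")
    return findings
```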

Note: If you genuinely need to publish an open dataset (some teams do, for public research or open data), isolate it in a dedicated project with no other resources, store nothing sensitive in it, and exclude it from this check with a documented justification rather than loosening org-wide policy.