How a Consumer App Company Secured Over 130 Petabytes in Weeks
A global Consumer App company manages vast, complex cloud environments spanning multiple continents and hundreds of petabytes of sensitive customer and operational data. But their legacy data classification tools were not designed for the massive scale and speed of their cloud data, especially when it came to identifying sensitive information buried deep in complex file formats like JSON and Parquet.
Faced with multiple, complex compliance requirements and ballooning data security costs, the company turned to Sentra.
By adopting Sentra’s AI-powered Data Security Posture Management (DSPM) platform, they accelerated and scaled their data security strategy, achieving over 98% classification accuracy and full visibility across cloud-scale infrastructure and enabling faster compliance, all while reducing operational overhead and cutting cloud costs.
The Challenge: Massive Data, Complex Formats, and Untenable Costs
The data security team’s existing classification tools were never built for the scale and complexity of a data estate over 130 petabytes. As regulatory requirements increased, and data structures became more nested and dynamic, manual tagging and legacy solutions became expensive, inaccurate, and unsustainable.
The core problem was how to accurately classify sensitive information across an enormous cloud environment while keeping operational costs in check. Their legacy tools lacked the precision and scalability to handle complex, nested file formats like JSON and Parquet, which are common in modern data engineering pipelines. Manual tagging was both time-consuming and inaccurate, resulting in low coverage and high compliance risk. With regulatory deadlines rapidly approaching, the security team needed a way to gain complete visibility into sensitive data, improve classification accuracy, and implement a scalable architecture that wouldn’t break the budget.
"Our previous solutions simply couldn't keep pace with the sheer volume and complexity of our cloud data. We needed a robust, cloud-native approach that was both effective and economically sound across our entire digital footprint."
— Deputy CISO
Why Sentra: Accuracy and Efficiency at Cloud-Native Scale
After evaluating multiple vendors, the company selected Sentra for its unique combination of deep technical sophistication and practical efficiency.
What stood out:
- AI-Driven Classification at Scale: Sentra’s multi-model architecture, including GLiNER for Named Entity Recognition and embedding-based contextual detection, enabled granular, column-level classification, even inside deeply nested Parquet structures.
- Cost-Efficient Ephemeral Scanning: Unlike always-on tools, Sentra’s ephemeral EC2 architecture scales to zero when not scanning. Combined with S3 inventory-based change detection and AI-driven smart sampling, it enables fast classification across hundreds of petabytes at a fraction of the time and cost, without impacting performance.
- Seamless Terraform Deployment: Rapid deployment via infrastructure-as-code made it easy to scale Sentra across multiple environments while enforcing least-privilege access through dual-role AWS authentication.
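The idea behind inventory-based change detection mentioned above can be illustrated with a short sketch. This is not Sentra's implementation; it only shows the general technique of comparing two inventory listings and rescanning what is new or modified. The three-column layout (bucket, key, ETag), the bucket name, and the function names are assumptions made for the example.

```python
import csv
import io


def load_inventory(csv_text):
    """Parse an inventory listing into {(bucket, key): etag}.

    Assumes a header-less CSV with columns bucket, key, etag. Real
    inventory reports are configurable, so column positions may differ.
    """
    reader = csv.reader(io.StringIO(csv_text))
    return {(row[0], row[1]): row[2] for row in reader if row}


def objects_to_scan(previous, current):
    """Return objects that are new or whose ETag changed since the last run."""
    return sorted(
        obj for obj, etag in current.items()
        if previous.get(obj) != etag
    )


# Hypothetical listings from two consecutive inventory runs.
yesterday = load_inventory(
    '"data-lake","events/a.parquet","etag-1"\n'
    '"data-lake","events/b.parquet","etag-2"\n'
)
today = load_inventory(
    '"data-lake","events/a.parquet","etag-1"\n'
    '"data-lake","events/b.parquet","etag-9"\n'
    '"data-lake","events/c.json","etag-3"\n'
)

changed = objects_to_scan(yesterday, today)
print(changed)
```

Only the modified and newly added objects are returned, so unchanged data is never read again. That is the property that keeps repeated scans over a very large estate affordable.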
"Sentra accurately uncovered mislabeled sensitive customer data, enabling rapid validation and remediation. It is now an indispensable element of our data protection strategy allowing us to stay compliant and keep our data protection promise to millions of customers around the world."
— Deputy CISO
Sentra was deployed and delivering results in the customer’s environment in just 12 days. During the initial proof of concept (POC), the data security team selected where scanning should begin and easily configured the platform, which then scanned 1 terabyte of high-risk data across complex file formats with over 98% classification accuracy. Sentra’s smart sampling approach prioritized the most sensitive and high-impact datasets, optimizing performance without sacrificing precision. The platform was deployed seamlessly using Terraform, integrating directly into the customer’s existing AWS architecture. A secure two-role access model, with one role for metadata access and another for scanning, ensured strict least-privilege control throughout the process.
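A two-role split of this kind can be sanity-checked in a few lines. The sketch below is illustrative only: the policy documents, the action lists, and the bucket name are assumptions for the example and do not reflect the permissions Sentra actually requests. It shows one way to verify that a metadata role and a scanning role share no permissions.

```python
# Hypothetical policy documents. The action lists are illustrative
# assumptions, not the vendor's actual permission set.
METADATA_ROLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:ListAllMyBuckets", "s3:ListBucket", "s3:GetBucketLocation"],
        "Resource": "*",
    }],
}

SCANNER_ROLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "s3:GetObject",
        "Resource": ["arn:aws:s3:::example-high-risk-bucket/*"],
    }],
}


def allowed_actions(policy):
    """Collect every action granted by Allow statements in a policy."""
    actions = set()
    for statement in policy["Statement"]:
        if statement["Effect"] == "Allow":
            listed = statement["Action"]
            actions.update([listed] if isinstance(listed, str) else listed)
    return actions


def roles_are_separated(metadata_policy, scanner_policy):
    """True when neither role uses wildcard actions and no action is shared."""
    meta = allowed_actions(metadata_policy)
    scan = allowed_actions(scanner_policy)
    no_wildcards = not any("*" in action for action in meta | scan)
    return no_wildcards and meta.isdisjoint(scan)


print(roles_are_separated(METADATA_ROLE_POLICY, SCANNER_ROLE_POLICY))
```

Rejecting wildcard actions outright is a deliberately strict choice: a grant such as `s3:*` would silently give the metadata role read access to object contents, which is exactly what the split is meant to prevent.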
Following the successful POC, the security team decided to extend Sentra’s coverage across hundreds of petabytes of their data estate. The team rolled out Sentra according to their data priorities and used automation to minimize manual effort and dramatically accelerate risk remediation.
Protect Your Secret Sauce: Safeguard Critical IP in the Cloud
The Risk: Leveraging IP Creates Exposure
For manufacturers, intellectual property is everything. Formulas, patents, designs, and recipes are the secret sauce that fuel competitiveness. This critical data must flow through R&D teams, testing labs, and production lines to keep the business moving and thriving.
But in the cloud, this same accessibility that fuels innovation becomes a liability. Blueprints get duplicated in public OneDrives, recipes are stored in shared folders, and patents are over-permissioned to contractors or partners. A single accidental exposure can mean stolen IP, lost contracts, and potentially catastrophic business, financial, or reputational damage.
Security leaders need an accurate, efficient way to know exactly where intellectual property lives across their entire environment, who has access, and when and where it is copied or moved.
How Sentra Helps Security Teams Protect Critical IP
Sentra is built to transform how enterprises safeguard the data that matters most, at the speed and scale of modern cloud enterprises. The AI-powered platform automatically and continuously discovers, classifies, and protects both proprietary intellectual property and regulated customer data across multi-cloud and on-premises environments.
- Automatically discover and classify critical data, finding intellectual property everywhere it lives, including patents, designs, CAD files, formulas, communications, images, audio, and video files.
- Get alerts about over-exposed IP and enforce least-privilege access so only the right teams and partners can access sensitive files.
- Automatically apply DLP labels for consistent controls across Microsoft 365 Purview, Google Drive, and AWS resource tagging.
- Continuously monitor in real time when files containing IP are overshared or moved, and automatically detect similar sensitive data.
- Securely adopt AI while preventing privacy and compliance violations and leakage of sensitive corporate data.
- Reduce risk at scale with agentless scanning that avoids outages, API throttling, or compute spikes.
With Sentra, organizations can embrace cloud and AI with confidence, securing their most valuable IP assets without slowing down innovation or production.
Why Security Teams Choose Sentra to Stop Insider Threats Faster
- Detect and mitigate insider-driven data loss in real-time
- Block risky sharing and apply encryption in SaaS tools like Google Drive and Microsoft 365
- Gain continuous visibility across multi-cloud and SaaS with a cloud-native architecture
- Automate least-privilege access control for unstructured and sensitive data
- Prioritize threats using context-aware insights from identity, behavior, and sensitivity
- Enhance DLP tools like Microsoft Purview to extend coverage and control
How an Aerospace Firm Secured Proprietary Designs
An aerospace manufacturer used Sentra to discover, classify, and remediate exposure risk to proprietary data such as patents, algorithms, and CAD designs across Microsoft 365 and Google Workspace. Sentra quickly discovered duplicate blueprints in employee OneDrives and flagged overshared design files that could have leaked via collaboration. They also used Sentra to enforce their policy of masking all data stored on Snowflake by accurately identifying data as masked or unmasked. Finally, they created a ticketing workflow to automate and streamline remediation of urgent issues. The company cut exposed IP by over 80% in the first month. Deploying Sentra was simple, and the scan quickly found exposed proprietary data, IP, and other critical data that, if compromised or exfiltrated, could cause catastrophic business, financial, or reputational damage.
How a Mortgage Lender Ensures Sensitive Data Gets Masked and Stays Masked
One of the largest U.S. mortgage lenders manages over $350 billion in loans across a complex ecosystem of production and non-production cloud environments. They rely on data-intensive applications to support underwriting, processing, and customer management.
Given the nature of their business, mortgage lenders and financial institutions are subject to stringent, multi-layered data protection and privacy regulations, such as the FTC Safeguards Rule, the Gramm-Leach-Bliley Act (GLBA), Consumer Financial Protection Bureau (CFPB) rules, SOX, FFIEC guidelines, and, increasingly, state-level privacy laws like the California Consumer Privacy Act (CCPA). Compliance requires rigorous control over non-production data environments, where customer data often gets replicated for development and testing. Most relevant regulations either require or recommend data masking for sensitive customer data.
The mortgage lender had a legacy DSPM solution that generated large volumes of false positives, and lacked the precision to support automated masking workflows needed to ensure compliance. This created significant manual overhead for the data security team.
The financial institution’s data security and compliance teams turned to Sentra. Within weeks, they gained column-level visibility into regulated data, automated their classification and masking workflows, and uncovered hundreds of orphaned data stores that could be deleted to improve regulatory compliance, reduce storage costs, and cut manual workload for the security team.
The Challenge: Manual Masking and Limited Data Visibility
The mortgage lender uses a data masking tool to mask regulated data in non-production environments. Their previous DSPM solution lacked depth and breadth of classification and created too many false positives, leading to over-masking and a labor-intensive manual verification process. This made it very difficult to identify which data needed to be masked. Like all financial institutions, the lender also has many sensitive data classifications unique to its business operations that had to be manually tagged. Together, these classification limitations made it difficult to create the data reports that feed their data masking tool.
For known and correctly classified sensitive data, their data masking tool was able to transform it into realistic synthetic records. But once the initial masking was performed, there was no reliable way to confirm whether data remained masked after refreshes, especially since the masked data resembled real data so closely. The mortgage lender needed visibility into where PII/PCI and toxic data combinations lived across non-production environments, along with accurate classification of sensitive data both before and after masking.
“The challenge wasn't just masking data; it was the persistent uncertainty of whether that data stayed masked after system refreshes. We needed a reliable way to verify ongoing compliance at a granular level.”
— Chief Compliance Officer, Leading US Mortgage Lender
Why Sentra: Column-Level Precision, Workflow Automation, and Immediate ROI
After a thorough evaluation of leading DSPM vendors, the mortgage lender chose Sentra due to several key capabilities. Its flexible classifier system, which supports both regex and contextual logic using AI-powered classifiers, made it easier to identify masked and unmasked data accurately. The platform’s policy engine offered automated scanning for missing or reverted markers, helping teams detect issues early. Sentra also seamlessly integrated into existing workflows without requiring invasive changes to systems or processes.
Key Outcomes:
- Fast AI-Driven Column-Level Classification: Sentra’s precise tagging engine classified sensitive data across their entire environment in just six weeks, outperforming other vendor tools by automatically identifying PII/PCI, financial data, and compliance-relevant data types.
- Improved Accuracy: With Sentra the compliance and data security teams are able to create a clear view of all the data that needs to be masked and feed this information into their data masking tool for future masking. Sentra can detect whether a dataset contains markers like "@example.com" emails or specially formatted SSNs.
- Automated Data Masking via Jira: Sentra integrated with their existing data masking tool to mask data and pushed alerts to Jira, enabling end-to-end remediation workflows with executive visibility.
- Granular Visibility: By using data classifications and logical negation (e.g., “does not contain marker”), the compliance team can isolate and track both compliant and non-compliant datasets.
- Policy-based Automation: Sentra’s automated policy engine runs on a regular schedule and identifies data assets without expected markers, allowing the compliance and data security teams to take action before audits or incidents occur.
- Compliance Confidence: The team is able to ensure compliance with multi-layered data protection and privacy regulations and with internal security mandates for precise access and masking.
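The marker checks and logical negation described in the outcomes above can be illustrated with a minimal sketch. It is not Sentra's classifier. The "@example.com" marker comes from the text; the SSN marker format is an assumption made for the example (area numbers 000 and 900-999, which are never assigned as SSNs), as are the column names and function names.

```python
import re

# Marker from the text: masked emails use the example.com domain.
EMAIL_MARKER = re.compile(r"@example\.com$", re.IGNORECASE)
# Assumed marker format: masked SSNs use area numbers 000 or 900-999,
# which are never assigned as real SSNs.
SSN_MARKER = re.compile(r"^(000|9\d{2})-\d{2}-\d{4}$")


def column_has_marker(values, marker):
    """A column counts as masked only if every non-empty value has the marker."""
    non_empty = [value for value in values if value]
    return bool(non_empty) and all(marker.search(value) for value in non_empty)


def unmasked_columns(dataset, rules):
    """Return columns that do NOT contain their expected marker."""
    return sorted(
        column for column, marker in rules.items()
        if column in dataset and not column_has_marker(dataset[column], marker)
    )


rules = {"email": EMAIL_MARKER, "ssn": SSN_MARKER}

# Hypothetical non-production table after a refresh: one real email slipped in.
dev_refresh = {
    "email": ["a.user@example.com", "jane@corp.io"],
    "ssn": ["900-12-3456", "000-45-6789"],
}

print(unmasked_columns(dev_refresh, rules))
```

Running a check like this on a schedule after each refresh surfaces any column whose values have reverted to an unmasked form, even when masked records look realistic to a human reviewer.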
Implementation: From Manual Compliance Burden to Automated Remediation
The mortgage lender deployed Sentra in under six weeks, scanning thousands of data stores across AWS, Snowflake, and other cloud and SaaS environments and applying accurate sensitivity labels. Sentra’s classification output determined user roles based on data sensitivity. The integration with Jira and their data masking tool enabled an automated masking workflow, flagging issues to executives and eliminating manual triage.
Following the initial deployment, the financial institution decided to build on this momentum and extend Sentra’s coverage to Google Workspace.
Real Business Impact: Data Visibility, Accurate Masking, and Compliance Confidence
With Sentra, the data security and compliance teams gained deep visibility into sensitive and regulated data across cloud environments and SaaS applications, transforming how they enforce compliance and scale a proactive, automated data protection strategy.
Mortgage Lender and Sentra: Turning Compliance into a Competitive Advantage
What started as a goal to streamline masking and compliance has become a long-term foundation for cloud data governance. The data security team replaced an underperforming legacy DSPM and implemented a strategic, automated framework for protecting customer data and ensuring compliance across every environment.
Together, the mortgage lender and Sentra have transformed how the financial institution’s security team supports development speed, data protection, and regulatory compliance.
