
Cato XOps + Sentra: Turning Data Intelligence into Action

May 6, 2026
4 Min Read
Data Security

Every security team knows the feeling. You finally get a clear picture of where your sensitive data lives and how exposed it is, then you have to swivel your chair into a completely different system to do anything about it.


On one side, you have Sentra, an AI Data Readiness platform that continuously discovers, classifies, and governs sensitive data across your entire cloud and SaaS estate. In the AI era, that scope is more consequential than ever: every Copilot license, every deployed agent, and every model pipeline inherits the access of the identity it operates under. An overpermissioned file share or a stale sensitive dataset is no longer a future risk. It is an AI response surfacing the wrong content to the wrong person, today. Sentra’s in-environment architecture means discovery and classification happen inside your own cloud account, with sensitive data never leaving your control, giving security teams the continuous, accurate signal they need to govern what AI can actually reach.

On the other side, you have Cato Networks and the Cato SASE Cloud, where you see users, devices, applications, AI agents, and traffic in real time, and where you can enforce the controls that determine what actually reaches your most sensitive data.


The Cato XOps and Sentra integration closes that gap. It is the missing link between AI data governance and network-layer enforcement: the data risks Sentra surfaces (overpermissioned stores, unclassified sensitive files, identities with excessive access to AI-reachable data) can now be understood, investigated, and acted on directly inside Cato XOps, without leaving the SASE console. For Cato customers, this means the question “what data is at risk if this user or agent is compromised?” has an immediate answer, right where the investigation is already happening.

 

Two views of the same problem

Imagine you’re a security architect responsible for data protection in a hybrid enterprise.

Sentra is where you go to answer questions like:

  • Where are our most sensitive data sets actually stored?
  • Which identities, human or machine, can reach them?
  • Where are we over‑exposed because of public links, broad groups, or shadow copies?

Cato XOps is where your operations team lives day to day:

  • They see which users are on the network right now, which applications they’re reaching, and from where.
  • They manage policies and workflows that decide what’s allowed, what’s blocked, and what triggers an investigation.

Both views are essential, but in most organizations they’ve been living parallel lives. A critical finding in Sentra becomes a screenshot in Slack, a ticket in a queue, or a vague request to “tighten things up over here.”

The Cato XOps–Sentra integration is designed to make that handoff automatic and continuous.

 

From data posture to XOps reality

With the integration in place, Sentra doesn’t just store its findings in its own dashboards. When it identifies something important, like a cluster of highly sensitive documents that ended up in a collaboration site with overly broad access, that context is sent into Cato XOps as a first‑class signal.

From the perspective of an analyst sitting in XOps, this is powerful. They no longer see only “a user at branch X talking to application Y.” They can also see that this path touches an environment where Sentra has already mapped significant data risk.

Suddenly, a spike in traffic to a particular SaaS tenant is not just “interesting.” It’s connected to the fact that this tenant stores regulated data, that access to it is too permissive, and that a specific group of users should probably not be anywhere near it.

Instead of juggling spreadsheets and screenshots, SecOps can use the tooling they already know - search, dashboards, incident views in XOps - now enriched with Sentra’s understanding of the data behind the traffic.
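As a rough illustration of what "enriched" could mean in practice, the sketch below attaches data-risk findings to a network event by matching on the tenant both refer to. The `DataFinding` and `NetworkEvent` shapes, their field names, and the `enrich` function are hypothetical and chosen only for this example; they are not the actual Sentra or Cato XOps schemas.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DataFinding:
    """Hypothetical data-risk finding; not the real Sentra schema."""
    tenant: str
    data_classes: tuple[str, ...]
    exposure: str


@dataclass(frozen=True)
class NetworkEvent:
    """Hypothetical network event; not the real Cato XOps schema."""
    user: str
    site: str
    tenant: str


def enrich(event: NetworkEvent, findings: list[DataFinding]) -> dict:
    """Return the event plus any data-risk findings for the tenant it touches."""
    matched = [f for f in findings if f.tenant == event.tenant]
    return {
        "user": event.user,
        "site": event.site,
        "tenant": event.tenant,
        "data_risk": [
            {"data_classes": list(f.data_classes), "exposure": f.exposure}
            for f in matched
        ],
    }


findings = [DataFinding("collab-site-1", ("financial", "PII"), "org-wide link")]
event = NetworkEvent("alice", "branch-x", "collab-site-1")
print(enrich(event, findings)["data_risk"])
```

An event that touches a tenant with no findings simply comes back with an empty `data_risk` list, so the same view works for enriched and unenriched traffic.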

 

Making investigations faster and sharper

Consider an investigation that starts on the network side.

Perhaps XOps flags suspicious activity from a user account: unusual login patterns, access from a new location, or an odd mix of applications being used in a short period of time. The natural next question is, “If this account is compromised, what’s really at risk?”

Without integration, answering that question usually means leaving the SASE console and hunting through other systems for clues.

With Sentra feeding context into XOps, the story changes:

  • The investigator pivots into the entity in XOps and immediately sees which data environments Sentra associates with that account.
  • They can see that this user, in addition to everyday SaaS tools, has access to a file share that contains financial records or a project space with customer health information.
  • They can prioritize containment and remediation around the parts of the environment that would actually matter most if the account were abused.

Instead of treating every incident as if it touches all data equally, XOps can help the team aim its time and controls at the users and paths that intersect with real data risk.
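The prioritization step above can be sketched in a few lines. This is a minimal example under stated assumptions: the sensitivity labels, their weights, and the store names are invented for illustration, and a real deployment would take them from the classification data rather than a hard-coded table.

```python
# Hypothetical sensitivity labels and weights; real classifications will differ.
SENSITIVITY_WEIGHT = {"regulated": 3, "confidential": 2, "internal": 1}


def containment_order(reachable_stores):
    """Sort the stores an account can reach, most sensitive first.

    Unknown labels get weight 0 and sort last; ties break by store name.
    """
    return sorted(
        reachable_stores,
        key=lambda s: (-SENSITIVITY_WEIGHT.get(s["sensitivity"], 0), s["name"]),
    )


stores = [
    {"name": "team-wiki", "sensitivity": "internal"},
    {"name": "finance-share", "sensitivity": "regulated"},
    {"name": "customer-health-project", "sensitivity": "regulated"},
]
for store in containment_order(stores):
    print(store["name"], store["sensitivity"])
```

The two regulated stores come out ahead of the internal wiki, which is the order in which an investigator would likely want to review access and containment.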

 

Turning posture programs into operational change

The integration isn’t just for emergencies. It also helps with the programmatic work of reducing exposure over time.

Most organizations today run ongoing efforts to shrink their attack surface:

  • Reining in org‑wide or public links in collaboration tools.
  • Cleaning up access that accumulates over years of team reshuffles and project work.
  • Bringing sensitive workloads under stricter governance.

Sentra is very good at discovering where these problems live: which stores are over‑exposed, which data classes are in places they shouldn’t be, which identities have surprisingly broad reach.

Cato XOps is very good at turning intent into structured work:

  • Opening the right tickets for the right teams.
  • Tracking those issues through to closure.
  • Providing dashboards that show how exposure is changing over time.

When Sentra’s findings arrive in XOps as events, those two strengths combine. A newly detected over‑exposed data set can automatically become:

  • A work item for the team that owns the underlying application.
  • An object that can be watched more closely from a network and user‑behavior perspective.
  • A data point in the story you tell leadership about how your risk posture is improving month over month.

The result is that Sentra findings stop being an abstract list in a separate console and start living inside the same operational fabric that already runs your SASE and security workflows.
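To make the finding-to-work-item step concrete, here is a small sketch of routing one over-exposure finding to the team that owns the affected store. The field names (`store`, `issue`, `severity`), the owner lookup, and the `security-triage` fallback are all assumptions made for this example, not part of either product.

```python
def to_work_item(finding, owners, default_team="security-triage"):
    """Turn one over-exposure finding into a ticket-like dict for the owning team."""
    store = finding["store"]
    return {
        # Fall back to a triage queue when no owner is recorded for the store.
        "team": owners.get(store, default_team),
        "title": f"Reduce exposure on {store}: {finding['issue']}",
        "status": "open",
        # High-severity stores are also flagged for closer network monitoring.
        "watch": finding["severity"] == "high",
    }


owners = {"hr-drive": "people-ops"}
finding = {"store": "hr-drive", "issue": "org-wide link", "severity": "high"}
print(to_work_item(finding, owners))
```

Counting items by `status` over time would give the kind of trend line described above for leadership reporting.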

 

A shared language for data‑aware operations

Perhaps the most subtle, but important, outcome of the Cato XOps and Sentra integration is cultural.

Data security people and network/SASE people have historically looked at the world through different lenses:

  • One side talks about data classes, residency, regulated fields, and classification.
  • The other talks about tunnels, sessions, users, identities, and application flows.

By bringing Sentra’s Data Security Platform signals directly into Cato XOps, both groups start to work from a shared set of facts. A Cato analyst can see that an event isn’t just “traffic to a collaboration app”; it’s traffic that intersects a repository where Sentra has identified highly sensitive, regulated information. A data security architect can see that a scary‑looking exposure in a report is tied to only a handful of users and paths, not the entire enterprise.

Over time, that shared context helps teams move from reactive firefighting to data‑aware security operations: the places where your most important information lives and the ways people reach it are understood together, not separately.

 

How to learn more

The integration is documented in Cato’s support portal, including prerequisites and configuration steps: Sentra – Configuring the XOps Integration


For joint customers, enabling it is a way to make both investments - Cato’s XOps and Sentra’s AI Data Readiness platform - more valuable than the sum of their parts. You keep the tools and workflows your teams already rely on, but you give them something they haven’t had before: a continuous feedback loop between where sensitive data actually lives and how people and applications reach it every day.


In a world where AI, SaaS, and hybrid architectures are multiplying the number of places data can go, that loop may be the difference between simply knowing you have a problem and being able to do something about it quickly, precisely, and at scale.

David Stuart is Senior Director of Product Marketing for Sentra, a leading cloud-native data security platform provider, where he is responsible for product and launch planning, content creation, and analyst relations. Dave is a 20+ year security industry veteran having held product and marketing management positions at industry luminary companies such as Symantec, Sourcefire, Cisco, Tenable, and ZeroFox. Dave holds a BSEE/CS from University of Illinois, and an MBA from Northwestern Kellogg Graduate School of Management.


Latest Blog Posts

Ron Reiter
May 8, 2026
3 Min Read
Data Security

Mythos Is Already Here. The Question Is What Attackers Will Find.

I've spent a lot of time thinking about what Mythos actually changes — and what it doesn't.

The vulnerabilities Mythos found are not new in nature. They're variations of known vulnerability classes — buffer overflows, race conditions, memory corruption. These aren't novel attack categories. What's new is the speed and scale at which Mythos surfaces them. In pre-release testing, Mythos Preview autonomously developed working exploits for Mozilla Firefox vulnerabilities 181 times, compared to the prior model's two successful attempts out of several hundred. That isn't an incremental improvement. It's a different class of capability.

Modeled scenarios show attackers discovering the majority of new vulnerabilities within a few years, meaning defenders increasingly respond to issues adversaries may already know about. The core challenge shifts from finding vulnerabilities faster to fixing them faster.

That's the real strategic shift. And for data security specifically, it has a specific implication that I think is underappreciated in the current conversation.

PATCH SPEED IS NECESSARY. IT'S NOT SUFFICIENT.

When a Mythos-class tool helps an attacker gain initial access — through a zero-day in a browser, an OS, an unpatched server — the next thing that determines outcome is what they find. What data is accessible from the compromised position. What identities and service accounts can be traversed. What sensitive records sit in environments with overly broad permissions.

Most security conversations right now are about accelerating patch cycles, which is the right instinct. A 2025 report found that over 45% of discovered security vulnerabilities in large organizations remain unpatched after 12 months. Closing that gap matters enormously. But patching controls the entry point. It doesn't control the blast radius once someone is in.

The blast radius question is a data question.

WHAT ATTACKERS FIND WHEN THEY GET IN

The uncomfortable truth is that most organizations don't have a comprehensive, current answer to: what sensitive data is accessible from any given position in my environment?

Data accumulates in ways that security teams don't fully track. Salesforce orgs fill up with PII from integrations that nobody audited. Lakehouses absorb years of production data pipelines. Cloud storage buckets get misconfigured and forgotten. Service accounts accumulate permissions that outlive the workflows they were created for. And increasingly, AI agents and copilots run under those service accounts — meaning whatever a service account can reach, the AI can retrieve and synthesize.

This isn't a hypothetical. It's the operational reality in most enterprises I talk to.

When Mythos-class capabilities become more widely available — and Anthropic's own estimate is that similar capabilities will proliferate from other AI labs within six to eighteen months — the attack surface question becomes: not just "what vulnerabilities can be exploited" but "what data becomes accessible when they are." Those are different problems with different solutions.

WHAT ACTUALLY CHANGES YOUR RISK PROFILE

Assume breach. Not as a thought experiment, but as the operating reality it now is.

Given that, the most meaningful thing you can do in the next 90 days isn't buy another scanner. It's get a clear, continuous answer to where your sensitive data actually lives — across cloud, SaaS, data warehouses, and the AI systems layered on top of them — and make sure the access picture reflects least privilege, not accumulated permissions from three years of workflow changes.

That means:

Knowing what's in your environment, continuously. Not a quarterly scan. Not a point-in-time audit. When Mythos-class tools can find and exploit a vulnerability overnight, a quarterly data inventory is operationally useless. You need to know what sensitive data exists and where it lives as a continuous fact, not a periodic report.

Understanding what each identity can reach. The blast radius of any successful exploit is bounded by what the compromised identity — human or service account or AI agent — can access. If that access picture isn't mapped to sensitive data at the record level, you can't assess exposure or contain it quickly after a breach.

Eliminating data that shouldn't be where it is. The most effective way to reduce Mythos-era blast radius is to not have sensitive data sitting in places it doesn't need to be. Redundant copies of regulated records, production data that migrated to dev environments, PII sitting in SaaS tools it arrived in through integration workflows — this is the data that causes the notifications, the regulatory exposure, and the headlines. Getting rid of it before an attacker finds it is categorically better than discovering it during incident response.

THE PART OF THIS CONVERSATION THAT ISN'T GETTING ENOUGH ATTENTION

Most of the Mythos coverage has focused, reasonably, on the vulnerability discovery side. That's where the dramatic capability jump is visible. But the quieter implication is about what happens after discovery and exploitation — which is where data security actually determines outcome.

"The window between a vulnerability being discovered and being exploited by an adversary has collapsed — what once took months now happens in minutes with AI," according to one Project Glasswing partner. If that compression applies equally to time-to-exploit, it applies equally to time-to-data. The faster an attacker can reach a compromised system, the faster they reach whatever's accessible from it.

This is the Mythos implication for data security teams: the window for containment is shrinking, and continuous data visibility is how you make that window matter.

---

FREQUENTLY ASKED QUESTIONS

What is Claude Mythos Preview?

Claude Mythos Preview is an AI model announced by Anthropic in April 2026 capable of autonomously discovering and exploiting zero-day vulnerabilities across every major operating system and browser, at a speed and scale that significantly exceeds human security researchers.

Is Mythos publicly available?

Anthropic has withheld general release, citing offensive risk. Access has been granted to approximately 40 organizations through Project Glasswing, a defensive security consortium. Anthropic estimates comparable capabilities will emerge from other AI labs within 6 to 18 months.

What does "assume breach" mean in a Mythos context?

Assume breach means designing your security posture around the expectation that attackers will get in — focusing less on prevention at the perimeter and more on limiting what they find inside. In a Mythos context, where exploit development can happen overnight, assume breach shifts from a framework to an operating reality.

How does data visibility reduce breach blast radius?

Blast radius — the scope of damage from a successful breach — is determined by what sensitive data is accessible from a compromised position, not by the exploit itself. Organizations with continuous, comprehensive data classification and least-privilege access governance can identify what was exposed quickly and contain the damage. Organizations without it typically discover their exposure during incident response, when it's too late.

What is DSPM and how does it help with Mythos preparedness?

Data Security Posture Management (DSPM) is a continuous monitoring discipline that discovers and classifies sensitive data across cloud, SaaS, and on-premises environments, maps access to that data, and identifies where sensitive records are exposed to over-permissioned identities or misconfigured controls. In a Mythos-era threat model, DSPM provides the continuous data inventory that makes blast radius assessment and containment possible.

Yair Cohen
May 7, 2026
3 Min Read
Data Security

The Instructure Breach Was Salesforce. Again. Here's the Governance Problem Nobody Is Talking About.

ShinyHunters breached Instructure - the company behind Canvas LMS - and claimed 275 million student and teacher records, 3.65 terabytes of data, and a ransom deadline of May 6, 2026. That alone is a significant breach. But the detail buried in the coverage is the more important story for every security team reading this.

This is the second time ShinyHunters has breached Instructure's Salesforce environment. In September 2025, the same group used social engineering to access Instructure's Salesforce instance. Instructure disclosed it, rotated credentials, and continued operating. Eight months later, the same attack surface was breached again.

That is not a story about ShinyHunters' sophistication. It is a story about incomplete remediation and about what happens when a breach response focuses on the credential and the vulnerability without addressing the underlying data exposure.

WHAT SALESFORCE ACTUALLY CONTAINS AND WHY SECURITY TEAMS MISS IT

Most organizations think of Salesforce as a CRM. Their security teams govern it like one - access controls at the application layer, SSO, maybe some DLP on outbound data. What they often don't account for is what accumulates inside Salesforce over years of integrations, workflow automations, and cross-platform data flows.

In the Instructure case, ShinyHunters claims the Salesforce instance contained student and teacher PII, private messages, and institutional records across nearly 9,000 schools. Some of that data flowed into Salesforce deliberately - CRM records, institutional contacts, support tickets. Some of it flowed in through integrations with Canvas that nobody fully audited. All of it was sitting in an environment that, based on the breach timeline, had its access controls reset after September 2025 but was not fundamentally rearchitected.

According to Security Magazine, ShinyHunters has used Salesforce misconfiguration as a repeating attack vector across multiple recent victims - the same playbook behind breaches at McGraw-Hill, Infinite Campus, Amtrak, and ADT. The vector is documented. The pattern is public. And yet organizations continue to treat Salesforce breach response as a credential rotation exercise rather than a data governance exercise.

WHAT "REMEDIATING" A SALESFORCE BREACH ACTUALLY REQUIRES

When a Salesforce environment is breached, the immediate response - revoke credentials, rotate API keys, patch the vulnerability - is necessary. It is not sufficient.

The harder question is: what data was in that Salesforce instance, who could access it, and should it have been there at all? Answering those questions requires classification. Without knowing what sensitive data exists in Salesforce, at the field and record level, there is no way to assess true exposure, implement meaningful least-privilege access, or identify which data flows need to be redesigned.

In Instructure's case, the breach response after September 2025 apparently did not include that step. The data (student PII, private messages, institutional records) remained in the environment, remained broadly accessible, and remained available to ShinyHunters when they returned.

This is the governance gap that keeps breached-and-remediated organizations on the repeat victim list.

THE IDENTITY AND ACCESS DIMENSION

ShinyHunters has also been linked to recent breaches at the University of Pennsylvania, Princeton, and Harvard - all of which share a pattern: large Salesforce deployments, institutional data accumulated over years, access controls managed at the application layer without deep visibility into what sensitive data each identity can actually reach at the data layer.

Sentra's approach to Salesforce governance maps exactly this. Classification runs continuously inside the Salesforce environment - identifying student PII, FERPA-regulated records, private communications, and institutional data that has accumulated through integrations. Access mapping connects each user, service account, and API integration to the sensitive data it can reach - not just the objects it has permissions to access, but the classified sensitive records within those objects. When an integration adds new data flows or permissions drift, the inventory updates in real time.

The output is a continuous answer to the question Instructure's security team could not have answered quickly enough in September 2025: what sensitive data is in Salesforce, what can each identity reach, and what needs to be removed or restricted before the next attempt.

WHAT TO CHECK IN YOUR OWN SALESFORCE ENVIRONMENT THIS WEEK

Three questions worth answering now, regardless of your industry:

First, what sensitive data has accumulated in your Salesforce org through integrations, workflow automations, and cross-platform data flows - beyond what was deliberately put there? Student records, healthcare data, financial records, and private communications all end up in Salesforce through integration patterns that were never evaluated for data sensitivity.

Second, what can each service account and API integration actually reach at the record level? Application-layer access controls in Salesforce do not prevent exfiltration by an attacker who has compromised a sufficiently-permissioned service account. Least-privilege at the data layer requires knowing what sensitive data each identity can access.

Third, if your Salesforce environment were breached today and you had to disclose within 72 hours, could you accurately characterize what data was exposed? FERPA, HIPAA, GDPR, and state-level privacy laws all require specific disclosure of data types. Without continuous classification, the answer in most environments is: not quickly, and not accurately.

FREQUENTLY ASKED QUESTIONS: SALESFORCE DATA SECURITY

What is the ShinyHunters Salesforce attack pattern?

ShinyHunters has repeatedly used Salesforce as an attack vector - typically gaining initial access through social engineering or credential theft, then exfiltrating data from the Salesforce org and using it for extortion. The pattern has appeared in breaches at Instructure (twice), McGraw-Hill, Infinite Campus, Amtrak, and others. The common thread is that Salesforce environments contain far more sensitive data than most security teams have classified or actively governed.

What data typically accumulates in enterprise Salesforce environments beyond CRM records?

In production Salesforce environments, continuous classification commonly surfaces PII from support ticket integrations, regulated financial or health data from cross-platform workflows, private communications stored in custom objects, API credentials and tokens in log fields, and institutional data from education or healthcare integrations. Most of this data arrives through legitimate integration patterns rather than misconfiguration.

How does DSPM apply to Salesforce environments?

Data Security Posture Management applied to Salesforce continuously classifies sensitive data at the field and record level within the Salesforce org, identifying regulated data types, mapping which identities can access them, and flagging access that exceeds least-privilege requirements. This runs inside the customer's environment without data leaving the Salesforce perimeter.

What is the difference between Salesforce's native security tools and DSPM?

Salesforce's native tools - Shield, field-level security, permission sets - control access at the object and field level. They do not classify data by sensitivity, identify regulated records that should not be in a given field, or map the sensitive data reachable by each integration or service account. DSPM fills that gap: it understands what the data is, not just who has permission to access it.

What does FERPA require in the event of an educational data breach?

FERPA requires institutions to protect the privacy of student education records. In a breach involving student PII, private messages, and institutional records - as in the Instructure case - affected institutions face notification obligations, potential loss of federal funding eligibility, and civil liability. Accurate and timely disclosure requires knowing exactly what records were exposed, which requires prior classification.

The Instructure breach happened twice because the data was never classified after the first incident. Credential rotation without data governance leaves the same exposure in place for the next attempt. Sentra continuously classifies sensitive data inside your Salesforce environment at the field and record level, maps what every identity and integration can reach, and flags access that exceeds least-privilege — so your breach response closes the actual gap, not just the credential.

See how Sentra governs Salesforce data → Schedule a Demo

David Stuart
May 5, 2026
3 Min Read
AI and ML

Your Data Lakehouse Is Now Your AI Data Plane. Is the Governance Designed In?

Most enterprises are building their AI programs on top of a data foundation that was never designed to be governed at AI speed.

Data lakehouses, including Databricks, Snowflake, and Delta Lake, have become the default substrate for enterprise AI and analytics. They are where training data lives, where RAG pipelines retrieve context, where Copilots and agentic workflows find the information they need to function. That consolidation is genuinely useful. But it creates a governance problem that most security and compliance teams are still working to solve: sensitive data accumulates in lakehouses faster than classification and access controls keep up, and AI systems inherit whatever access their underlying service accounts allow.

What is the data lakehouse AI governance problem?

The data lakehouse AI governance problem is the gap between what AI systems can technically access in a lakehouse environment and what security teams have classified, controlled, and approved for AI use. This gap exists because lakehouses were originally designed for analytics and data science workloads — not as AI retrieval infrastructure. When GenAI copilots and agents were layered on top, they inherited access to years of accumulated data that was never evaluated through an AI risk lens.

In practical terms: a service account that powers a Databricks workflow may have read access to tables containing customer PII, financial records, and HR data accumulated over years of pipeline operations. An AI agent running under that service account can retrieve and surface any of it. The security team may have no visibility into what sensitive data those tables contain, because classification was never applied at lakehouse scale.


Why AI changes the stakes for lakehouse data security

Before generative AI, ungoverned sensitive data in a lakehouse was a slow-moving compliance risk. A human analyst who encountered data they should not have seen was an isolated incident with a clear remediation path. Generative AI changes the failure mode entirely.

When an AI agent traverses a knowledge base or a Copilot retrieves context from a lakehouse, it can synthesize sensitive information across thousands of records in milliseconds and surface it in a response — without any of the friction that would have slowed a human analyst. The blast radius of a single over-permissioned service account is no longer an audit finding. It is a live exposure in the AI systems running in production today.

The scale of this risk is now quantified. According to a March 2026 Check Point Research threat report, 1 in every 28 GenAI prompts poses a high risk of sensitive data leakage, and 91% of organizations using GenAI regularly are affected. A January 2026 Netskope Cloud and Threat Report found that GenAI data violations have more than doubled year-over-year, with personal cloud and unsanctioned GenAI tool usage creating policy violations that most security teams cannot detect without comprehensive cross-platform monitoring.

The AI security layer has to move upstream — to the data itself, before any model touches it.

What data lakehouse AI readiness requires

Data lakehouse AI readiness is the state in which an organization can continuously answer three questions about its lakehouse environment:

  1. What sensitive data exists in this environment, how is it classified, and where does it sit relative to AI system access?
  2. What can each AI system — Copilot, RAG pipeline, agent, fine-tuning job — actually reach, based on the underlying identity's permissions?
  3. When data moves between environments (Snowflake to Databricks, S3 to a Bedrock knowledge base), do security posture, access controls, and sensitivity classification travel with it?

Most enterprise environments cannot answer all three today. Not because the data is hidden, but because classification was never applied at lakehouse scale, and access mapping was never built to track AI agent identities alongside human ones.

David Stuart, Sr. Director of Product Marketing at Sentra, describes the pattern consistently across enterprise conversations: "Security and compliance teams often become the bottleneck on large-scale AI programs — not because they want to slow things down, but because they genuinely cannot sign off on a program they cannot see clearly. The lakehouse is full of years of pipeline data. Nobody knows exactly what sensitive data is in there, how it's classified, or what the AI can actually reach."

How enterprises are closing the lakehouse AI governance gap

The organizations deploying AI fastest without creating compliance exposure share a specific set of data governance capabilities applied to their lakehouse environments.

Continuous, in-place classification at petabyte scale. Traditional classification approaches that copy data into a separate security platform break down at lakehouse scale, both economically and architecturally. Effective lakehouse governance requires classification that runs inside the environment where data already lives — scanning without moving data, maintaining classification continuously as new data arrives, and updating the inventory in near real time rather than on a quarterly audit cycle.

Context-aware classification, not pattern matching. Regex-based classification misses too much in the unstructured and semi-structured data that lakehouses accumulate — documents, log files, JSON records with embedded PII, free-text fields alongside structured columns. Classification models that understand the meaning of a document rather than just the patterns within it reach accuracy above 95%, which is the threshold compliance teams need to rely on the output without manual review.
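A short example shows where pattern matching falls short. The sketch below applies one commonly used regular expression for US Social Security numbers to three invented sample strings. It demonstrates only the limits of the regex approach; it does not attempt to show how a context-aware classifier works.

```python
import re

# A typical pattern-matching rule for US Social Security numbers.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Invented sample strings for illustration only.
samples = {
    "formatted SSN": "Employee SSN: 123-45-6789",
    "SSN with spaces": "ssn on file is 123 45 6789",
    "unrelated ID": "Warehouse bin 987-65-4321 restocked",
}

results = {label: bool(SSN_PATTERN.search(text)) for label, text in samples.items()}
for label, flagged in results.items():
    print(f"{label}: {'flagged' if flagged else 'not flagged'}")
```

The rule flags the formatted SSN, misses the same number written with spaces, and flags an unrelated identifier that happens to share the format. Telling these cases apart takes the surrounding text, which a pattern alone does not consider.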

Identity-to-data access mapping for AI agents. Standard access reviews map human identities to resources. AI agents operate under service principals and application identities that may have accumulated permissions across an entire lakehouse without any review. Mapping those identities to the sensitive data they can reach — and flagging where that reach exceeds what an AI use case requires — closes the blast radius gap that traditional IAM reviews miss.
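The mapping described above can be sketched in a few lines of Python. Everything here is hypothetical: the `Grant` record, the `CLASSIFICATION` and `REQUIRED` tables, and the identity and dataset names stand in for whatever a real classification engine and IAM export would provide.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Grant:
    identity: str  # service principal or application identity (hypothetical)
    dataset: str   # table, volume, or bucket the identity can read

# Hypothetical classification output: dataset -> sensitivity labels found.
CLASSIFICATION = {
    "crm.customers": {"PII"},
    "finance.transactions": {"FINANCIAL"},
    "docs.product_manuals": set(),
}

# Hypothetical declaration of what each AI use case actually needs.
REQUIRED = {"support-bot": {"docs.product_manuals"}}

def excess_sensitive_access(grants, classification, required):
    """Map each identity to sensitive datasets it can reach but does not need."""
    findings = {}
    for g in grants:
        labels = classification.get(g.dataset, set())
        needed = required.get(g.identity, set())
        if labels and g.dataset not in needed:
            findings.setdefault(g.identity, {})[g.dataset] = labels
    return findings

grants = [
    Grant("support-bot", "docs.product_manuals"),
    Grant("support-bot", "crm.customers"),
    Grant("support-bot", "finance.transactions"),
]

findings = excess_sensitive_access(grants, CLASSIFICATION, REQUIRED)
print(findings)
```

In this made-up example the support bot needs only product manuals, so its reach into customer and financial data is reported as excess access.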

Access that follows data lineage. When data moves from a production lakehouse into a RAG vector store or a fine-tuning dataset, the access controls from the source environment do not automatically transfer. Governance that tracks data lineage across those movements — and reconciles access in the destination environment — prevents classification and access controls from becoming stale the moment data flows to a new system.
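One way to picture the reconciliation step is a check along each lineage edge for principals who can read the copy but not the source. The sketch below is illustrative only: the `READERS` and `LINEAGE` structures and all store and principal names are assumptions, not the output of any particular catalog or lineage tool.

```python
# Hypothetical read-access lists per data store: store -> principals.
READERS = {
    "lakehouse.claims": {"claims-analysts", "etl-service"},
    "vectorstore.claims_embeddings": {"claims-analysts", "rag-agent", "all-employees"},
}

# Hypothetical lineage edges: (source, destination) for each copy or transform.
LINEAGE = [("lakehouse.claims", "vectorstore.claims_embeddings")]

def access_widening(lineage, readers):
    """For each lineage edge, list principals who can read the copy but not the source."""
    report = {}
    for source, destination in lineage:
        extra = readers.get(destination, set()) - readers.get(source, set())
        if extra:
            report[(source, destination)] = sorted(extra)
    return report

widened = access_widening(LINEAGE, READERS)
print(widened)
```

Here the vector store is readable by two principals that never had access to the source table, so the edge is reported for review.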

Sentra's approach to lakehouse governance addresses all four. Scanning happens in place within the customer environment, so sensitive data never leaves the perimeter during classification. An independent third-party audit by Expedia validated Sentra’s classification accuracy above 98%. At scale, Sentra scanned nine petabytes of data in under 72 hours without degrading accuracy. The first scan delivers real findings without configuration or custom rules.

Frequently asked questions: Data lakehouse AI security

What is the difference between lakehouse data security and traditional cloud data security (CNAPP/CSPM)? Traditional cloud data security focuses on storage configurations, bucket policies, and network controls. Lakehouse data security adds classification of the data itself at both the entity and file level, identifying which sensitive records exist within tables and files and what business purpose they serve, and maps that classification to AI system access. This is essential because AI agents bypass application-level controls and retrieve data directly based on underlying identity permissions.

How does DSPM apply to data lakehouse environments? Data Security Posture Management (DSPM) applied to lakehouses continuously discovers and classifies sensitive data within Databricks, Snowflake, Delta Lake, and similar platforms, assesses its risk, then maps identified sensitive data to the identities that can access it. This gives security teams an accurate, current inventory of what AI systems can reach and where access creates risk — replacing periodic manual audits with continuous automated governance.

What sensitive data typically accumulates in enterprise lakehouses? In production lakehouse environments, scans commonly surface customer PII from CRM and event pipelines, financial records from transaction systems, health-adjacent information from HR and benefits platforms, authentication credentials in log files and configuration data, and internal documents ingested from file storage integrations. Most of these categories arrive through legitimate pipeline operations rather than misconfiguration.

How do you govern what AI agents can access in a lakehouse? Governing AI agent access in a lakehouse requires three steps: classifying the sensitive data in the environment so you know what exists, mapping the service accounts and application identities that AI agents run under to the data those identities can reach, and applying least-privilege controls and remediation to remove access that exceeds the AI use case's requirements. Continuous monitoring then flags when access drift occurs — when new data is added or permissions change — rather than waiting for the next periodic review.
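The drift check in that last step can be sketched as a comparison of two scan snapshots. The snapshot shape (identity mapped to the set of sensitive datasets it can reach) and all names below are assumptions made for illustration.

```python
def access_drift(previous, current):
    """Compare two snapshots of identity -> reachable sensitive datasets.

    Returns only the newly reachable datasets for each identity.
    """
    drift = {}
    for identity, reachable in current.items():
        gained = reachable - previous.get(identity, set())
        if gained:
            drift[identity] = gained
    return drift

# Made-up snapshots from two consecutive scans.
last_scan = {"rag-agent": {"docs.manuals"}}
this_scan = {
    "rag-agent": {"docs.manuals", "hr.benefits"},
    "new-agent": {"crm.customers"},
}

drift = access_drift(last_scan, this_scan)
print(drift)
```

Both a widened existing identity and a newly created one show up in the result, which is the signal a continuous monitor would raise between periodic reviews.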

What is the cost of continuous lakehouse classification at petabyte scale? In-place, agentless classification at petabyte scale is approximately ten times more cost-efficient than approaches that extract and analyze data in a vendor's external infrastructure.

The question to answer before your next AI deployment

For most organizations, the honest answer to "is our lakehouse AI-ready?" is: not yet. The good news is that this is a solvable problem, and it is being solved in production environments today at petabyte scale.

The organizations moving fastest on AI are not the ones waiting until governance is perfect. They are the ones running continuous classification now so that governance improves in parallel with deployment — and so that when the compliance team asks what the AI can reach, there is a clear, current, accurate answer.

The question is not whether to govern the data underneath your AI program. The question is whether you do it before the first incident or after.

To see how Sentra maps sensitive data in lakehouse environments, schedule a demo.
