When security and compliance teams talk about data classification, they speak in the language of regulations and standards. Personal Identifiable Information needs to be protected one way. Health data another way. Employee information in yet another way. And that’s before we get to IP and developer secrets.
And while approaching data security this way is obviously necessary, I want to introduce a different way of thinking about data security. When we look at a data asset, we should be classifying them by asking “Is this asset valuable for the amount of data it contains or the quality of data it contains?”?
Here’s what I mean. If an attacker is able to breach a bank’s records and steal a single customer record and credit card number, it’s an annoyance for the bank, and a major inconvenience for the customer. But the bank will simply resolve the issue directly with the customer. There’s no media attention. And it certainly doesn’t have an impact on the bank's business. On the other hand, if 10,000 customer records and credit card details were leaked… that’s another story.
This is what we can call ‘quantitative data’.
The risk to the company if the data is leaked is due to the type of data, yes, but perhaps more important is the amount of data.
But there’s another type of data that works in exactly the opposite way. This is what we can call ‘qualitative data’. This is data that is so sensitive that even a small amount being leaked would cause major damage. A good example of this type of data is intellectual property or developer secrets. An attacker doesn’t need a huge file to affect the business with a breach. Even a piece of source code or a single certification could be catastrophic.
Most data an organization has can be classified as either ‘qualitative’ or ‘quantitative’. And there’s very little they have in common, both in how they’re used and how they’re secured.
Quantitative data moves. Its schema changes. It evolves, moving in and out of the organization. It’s used in BI tools, added to data pipelines, and extracted via ETLs. It’s also subject to data retention policies and regulations.
Qualitative data is exactly the opposite. It doesn’t move at all. Maybe certain developer secrets have a refresh token, but other than that there’s no real data retention policies. The data does not move through pipelines, it’s not extracted, and it certainly doesn’t leave the organization.
So how does this affect how we approach data security?
When it comes to quantitative data, it’s much easier to find and classify this data - and not just because of the size of the asset. A classification tool uses probabilities to identify data - if it sees that 95% of the data in a column contains email addresses, it will label it as ‘email’. And it might have varying degrees of certainty based on each asset.
Qualitative data classification and security can’t work like this. There’s no such thing as ‘50% of a certificate’. It’s either a cert or not. For this reason, identifying qualitative data is much more of a challenge, but securing it is simpler because it doesn’t need to move.
This method of thinking about data security can be helpful when trying to understand how different security tools find, classify, and protect sensitive data. Tools that rely *only* probabilities will have trouble recognizing when qualitative data moves or is at risk. Similarly, any approach that focuses on keeping qualitative data secure might neglect data in movement. Understanding the differences between qualitative and quantitative data is an easy framework for measuring the effectiveness of your organization’s data protection tools and policies
The competitive framing is hard to avoid. Two frontier AI labs, one month apart, both building systems designed to find and fix vulnerabilities before attackers can exploit them. The Hacker News called it OpenAI taking on Anthropic in the AI cybersecurity race. That framing is accurate but slightly misses the point for security teams evaluating what to do with either of them.
Both tools are solving a real and important problem: the window between a vulnerability being discoverable and being exploited has collapsed. As OpenAI's own announcement noted, AI can now compress hours of security analysis into minutes. The goal is to get defenders to vulnerabilities before attackers do. Daybreak builds editable threat models from actual codebases, validates findings in isolated environments, and proposes patches for human review. That is a genuinely useful capability.
But there are two separate questions in play here, and it's worth being precise about which one Daybreak answers.
THE QUESTION DAYBREAK ANSWERS
Daybreak answers, “What vulnerabilities exist in your code, and how do we fix them faster than an attacker can exploit them?”
That is the right question for a vulnerability management platform. It's the offense-versus-defense race that Mythos dramatized and Daybreak responds to. If you can identify and remediate a vulnerability before an attacker has a working exploit, you've won that exchange.
THE QUESTION DAYBREAK DOESN'T ANSWER
Daybreak doesn't answer, “If a vulnerability is exploited before it's patched, what does the attacker reach?”
This is the blast radius question. And it's the question that determines whether a successful exploit becomes a contained incident or a material breach.
The answer depends entirely on what sensitive data is accessible from the compromised position. What's in the codebase environment, what service accounts have access to, what data flows through the infrastructure Daybreak is analyzing. Vulnerability detection doesn't map sensitive data to identities. It doesn't tell you whether a compromised CI/CD pipeline has access to a production database containing customer PII. It doesn't tell you what an AI agent operating in that environment can reach and synthesize.
These are data governance questions. And they require a different kind of answer.
THE AI AGENT ACCESS PROBLEM
There's a second dimension here that I think is underappreciated in the Daybreak coverage.
Daybreak - like every AI security agent - needs access to your environment to do its job. Codebases, repositories, infrastructure configurations, build pipelines. That access is necessary for the tool to work. And it means that the data those environments contain becomes part of the access footprint of the AI agent operating in them.
Most organizations haven't fully inventoried what sensitive data lives in their development and security infrastructure. Production credentials in configuration files. Customer data in test environments that were never properly cleaned. PII that migrated into a repository through an integration nobody fully audited. This data exists in most large enterprise environments, not because of negligence, but because data accumulates faster than it gets classified.
Before you bring an AI agent into those environments - any AI agent, not just Daybreak - the governance question needs an answer. “What sensitive data is in here, who can reach it, and is that access picture appropriate for an AI system to operate within?
WHAT THIS MEANS FOR SECURITY TEAMS DEPLOYING DAYBREAK
Three things worth doing before or alongside a Daybreak deployment:
First, classify what's in the environments Daybreak will access. Codebases and CI/CD pipelines accumulate sensitive data that isn't always visible in a standard data inventory. Running a classification pass before bringing an AI agent in tells you what's there and what the exposure looks like if that environment is compromised.
Second, map what Daybreak's service account can reach. The blast radius of any compromise - including a compromise of Daybreak itself or a prompt injection against it - is bounded by what its operating identity can access. Scoping that access to the minimum necessary before deployment is the right architecture.
Third, know what patch you're protecting. Daybreak's value is highest when you know which vulnerabilities, if exploited, would expose the most sensitive data. That prioritization requires a continuous, current picture of where sensitive data lives in your environment - so that a critical vulnerability in a system with no sensitive data downstream gets triaged differently from one with a direct path to regulated records.
THE PACE OF THIS IS ACCELERATING
Mythos in April. Daybreak in May. The AI security capability race is compressing timelines for everyone.
Organizations that haven't yet built a continuous, current picture of their sensitive data estate are running out of runway to do it before AI security agents are operating inside their environments. The governance work - classification, identity-to-data mapping, access rationalization - is the foundation that makes all of these tools safer to deploy and more effective when they find something.
Vulnerability tools tell you where the door is. Data security tells you what's in the room. Both questions matter. The pace of the AI security race means you need to be working on both at the same time.
---
FREQUENTLY ASKED QUESTIONS
What is OpenAI Daybreak?
OpenAI Daybreak is a cybersecurity initiative launched May 11, 2026 that combines GPT-5.5 and Codex Security to help organizations identify, validate, and remediate software vulnerabilities. It builds editable threat models from enterprise codebases, validates likely vulnerabilities in isolated environments, and proposes patches for human review. Access is currently limited — organizations must request a vulnerability scan or contact OpenAI sales.
How is Daybreak different from Anthropic Mythos?
Both platforms use frontier AI to find and exploit vulnerabilities — Mythos focuses on autonomous zero-day discovery, while Daybreak is positioned more as a developer-integrated defense platform with a broader partner ecosystem. Anthropic has emphasized restricted access and high-risk vulnerability discovery; OpenAI is taking a broader platform approach tied to enterprise development workflows. Both address the vulnerability discovery question; neither addresses the blast radius question of what data is accessible if a vulnerability is exploited.
What does Daybreak mean for enterprise data security?
Daybreak requires feeding AI agents access to your codebase and infrastructure environments. Before deploying any AI security agent, organizations should classify what sensitive data lives in those environments, map what the agent's operating identity can access, and ensure that access reflects least privilege. The same access that makes these tools effective makes them part of your data attack surface.
What is the blast radius question in cybersecurity?
Blast radius refers to the scope of damage from a successful exploit — specifically, what sensitive data becomes accessible to an attacker who gains a foothold through a vulnerability. Vulnerability tools like Daybreak address how to find and fix vulnerabilities faster. Data Security Posture Management (DSPM) addresses what an attacker reaches if a vulnerability is exploited before it's patched — which is determined by how sensitive data is distributed, classified, and access-controlled across the environment.
How does DSPM complement AI vulnerability tools like Daybreak?
DSPM continuously discovers and classifies sensitive data across cloud, SaaS, and on-premises environments, maps which identities can access sensitive stores, and identifies overpermissioned access. In a Daybreak deployment, DSPM answers three questions: what sensitive data lives in the environments Daybreak will access, what can Daybreak's operating identity reach, and which vulnerabilities are highest priority because they have a direct path to regulated or sensitive data. DSPM and vulnerability management address sequential parts of the same problem — not competing solutions.
Daybreak and Mythos are compressing the vulnerability window on both sides. The organizations best positioned to respond aren't the ones scrambling to understand their data exposure after an exploit — they're the ones who already have a continuous, current picture of what sensitive data lives in their environments, what every identity can reach, and where access needs to be tightened before an AI agent touches it.
See how Sentra maps sensitive data across your cloud, SaaS, and development environments — and what your blast radius actually looks like today. Schedule a Demo →
Read More
Ron Reiter
May 8, 2026
3
Min Read
Data Security
Mythos Is Already Here. The Question Is What Attackers Will Find.
Mythos Is Already Here. The Question Is What Attackers Will Find.
I've spent a lot of time thinking about what Mythos actually changes — and what it doesn't.
The vulnerabilities Mythos found are not new in nature. They're variations of known vulnerability classes — buffer overflows, race conditions, memory corruption. These aren't novel attack categories. What's new is the speed and scale at which Mythos surfaces them. In pre-release testing, Mythos Preview autonomously developed working exploits for Mozilla Firefox vulnerabilities 181 times, compared to the prior model's two successful attempts out of several hundred. That isn't an incremental improvement. It's a different class of capability.
Modeled scenarios show attackers discovering the majority of new vulnerabilities within a few years, meaning defenders increasingly respond to issues adversaries may already know about. The core challenge shifts from finding vulnerabilities faster to fixing them faster.
That's the real strategic shift. And for data security specifically, it has a specific implication that I think is underappreciated in the current conversation.
PATCH SPEED IS NECESSARY. IT'S NOT SUFFICIENT.
When a Mythos-class tool helps an attacker gain initial access — through a zero-day in a browser, an OS, an unpatched server — the next thing that determines outcome is what they find. What data is accessible from the compromised position. What identities and service accounts can be traversed. What sensitive records sit in environments with overly broad permissions.
Most security conversations right now are about accelerating patch cycles, which is the right instinct. A 2025 report found that over 45% of discovered security vulnerabilities in large organizations remain unpatched after 12 months. Closing that gap matters enormously. But patching controls the entry point. It doesn't control the blast radius once someone is in.
The blast radius question is a data question.
WHAT ATTACKERS FIND WHEN THEY GET IN
The uncomfortable truth is that most organizations don't have a comprehensive, current answer to: what sensitive data is accessible from any given position in my environment?
Data accumulates in ways that security teams don't fully track. Salesforce orgs fill up with PII from integrations that nobody audited. Lakehouses absorb years of production data pipelines. Cloud storage buckets get misconfigured and forgotten. Service accounts accumulate permissions that outlive the workflows they were created for. And increasingly, AI agents and copilots run under those service accounts — meaning whatever a service account can reach, the AI can retrieve and synthesize.
This isn't a hypothetical. It's the operational reality in most enterprises I talk to.
When Mythos-class capabilities become more widely available — and Anthropic's own estimate is that similar capabilities will proliferate from other AI labs within six to eighteen months — the attack surface question becomes: not just "what vulnerabilities can be exploited" but "what data becomes accessible when they are." Those are different problems with different solutions.
WHAT ACTUALLY CHANGES YOUR RISK PROFILE
Assume breach. Not as a thought experiment, but as the operating reality it now is.
Given that, the most meaningful thing you can do in the next 90 days isn't buy another scanner. It's get a clear, continuous answer to where your sensitive data actually lives — across cloud, SaaS, data warehouses, and the AI systems layered on top of them — and make sure the access picture reflects least privilege, not accumulated permissions from three years of workflow changes.
That means:
Knowing what's in your environment, continuously. Not a quarterly scan. Not a point-in-time audit. When Mythos-class tools can find and exploit a vulnerability overnight, a quarterly data inventory is operationally useless. You need to know what sensitive data exists and where it lives as a continuous fact, not a periodic report.
Understanding what each identity can reach. The blast radius of any successful exploit is bounded by what the compromised identity — human or service account or AI agent — can access. If that access picture isn't mapped to sensitive data at the record level, you can't assess exposure or contain it quickly after a breach.
Eliminating data that shouldn't be where it is. The most effective way to reduce Mythos-era blast radius is to not have sensitive data sitting in places it doesn't need to be. Redundant copies of regulated records, production data that migrated to dev environments, PII sitting in SaaS tools it arrived in through integration workflows — this is the data that causes the notifications, the regulatory exposure, and the headlines. Getting rid of it before an attacker finds it is categorically better than discovering it during incident response.
THE PART OF THIS CONVERSATION THAT ISN'T GETTING ENOUGH ATTENTION
Most of the Mythos coverage has focused, reasonably, on the vulnerability discovery side. That's where the dramatic capability jump is visible. But the quieter implication is about what happens after discovery and exploitation — which is where data security actually determines outcome.
"The window between a vulnerability being discovered and being exploited by an adversary has collapsed — what once took months now happens in minutes with AI," according to one Project Glasswing partner. If that compression applies equally to time-to-exploit, it applies equally to time-to-data. The faster an attacker can reach a compromised system, the faster they reach whatever's accessible from it.
This is the Mythos implication for data security teams: the window for containment is shrinking, and continuous data visibility is how you make that window matter.
---
FREQUENTLY ASKED QUESTIONS
What is Claude Mythos Preview?
Claude Mythos Preview is an AI model announced by Anthropic in April 2026 capable of autonomously discovering and exploiting zero-day vulnerabilities across every major operating system and browser, at a speed and scale that significantly exceeds human security researchers.
Is Mythos publicly available?
Anthropic has withheld general release, citing offensive risk. Access has been granted to approximately 40 organizations through Project Glasswing, a defensive security consortium. Anthropic estimates comparable capabilities will emerge from other AI labs within 6 to 18 months.
What does "assume breach" mean in a Mythos context?
Assume breach means designing your security posture around the expectation that attackers will get in — focusing less on prevention at the perimeter and more on limiting what they find inside. In a Mythos context, where exploit development can happen overnight, assume breach shifts from a framework to an operating reality.
How does data visibility reduce breach blast radius?
Blast radius — the scope of damage from a successful breach — is determined by what sensitive data is accessible from a compromised position, not by the exploit itself. Organizations with continuous, comprehensive data classification and least-privilege access governance can identify what was exposed quickly and contain the damage. Organizations without it typically discover their exposure during incident response, when it's too late.
What is DSPM and how does it help with Mythos preparedness?
Data Security Posture Management (DSPM) is a continuous monitoring discipline that discovers and classifies sensitive data across cloud, SaaS, and on-premises environments, maps access to that data, and identifies where sensitive records are exposed to over-permissioned identities or misconfigured controls. In a Mythos-era threat model, DSPM provides the continuous data inventory that makes blast radius assessment and containment possible.
Read More
Yair Cohen
May 7, 2026
3
Min Read
Data Security
The Instructure Breach Was Salesforce. Again. Here's the Governance Problem Nobody Is Talking About.
The Instructure Breach Was Salesforce. Again. Here's the Governance Problem Nobody Is Talking About.
ShinyHunters breached Instructure - the company behind Canvas LMS - and claimed 275 million student and teacher records, 3.65 terabytes of data, and a ransom deadline of May 6, 2026. That alone is a significant breach. But the detail buried in the coverage is the more important story for every security team reading this.
This is the second time ShinyHunters has breached Instructure's Salesforce environment. In September 2025, the same group used social engineering to access Instructure's Salesforce instance. Instructure disclosed it, rotated credentials, and continued operating. Eight months later, the same attack surface was breached again.
That is not a story about ShinyHunters' sophistication. It is a story about incomplete remediation and about what happens when a breach response focuses on the credential and the vulnerability without addressing the underlying data exposure.
WHAT SALESFORCE ACTUALLY CONTAINS AND WHY SECURITY TEAMS MISS IT
Most organizations think of Salesforce as a CRM. Their security teams govern it like one - access controls at the application layer, SSO, maybe some DLP on outbound data. What they often don't account for is what accumulates inside Salesforce over years of integrations, workflow automations, and cross-platform data flows.
In the Instructure case, ShinyHunters claims the Salesforce instance contained student and teacher PII, private messages, and institutional records across nearly 9,000 schools. Some of that data flowed into Salesforce deliberately - CRM records, institutional contacts, support tickets. Some of it flowed in through integrations with Canvas that nobody fully audited. All of it was sitting in an environment that, based on the breach timeline, had its access controls reset after September 2025 but was not fundamentally rearchitected.
According to Security Magazine, ShinyHunters has used Salesforce misconfiguration as a repeating attack vector across multiple recent victims - the same playbook behind breaches at McGraw-Hill, Infinite Campus, Amtrak, and ADT. The vector is documented. The pattern is public. And yet organizations continue to treat Salesforce breach response as a credential rotation exercise rather than a data governance exercise.
WHAT "REMEDIATING" A SALESFORCE BREACH ACTUALLY REQUIRES
When a Salesforce environment is breached, the immediate response - revoke credentials, rotate API keys, patch the vulnerability - is necessary. It is not sufficient.
The harder question is: what data was in that Salesforce instance, who could access it, and should it have been there at all? Answering those questions requires classification. Without knowing what sensitive data exists in Salesforce, at the field and record level, there is no way to assess true exposure, implement meaningful least-privilege access, or identify which data flows need to be redesigned.
In Instructure's case, the breach response after September 2025 apparently did not include that step. The data, student PII, private messages, institutional records, remained in the environment, remained broadly accessible, and remained available to ShinyHunters when they returned.
This is the governance gap that keeps breached-and-remediated organizations on the repeat victim list.
THE IDENTITY AND ACCESS DIMENSION
ShinyHunters has also been linked to recent breaches at the University of Pennsylvania, Princeton, and Harvard - all of which share a pattern: large Salesforce deployments, institutional data accumulated over years, access controls managed at the application layer without deep visibility into what sensitive data each identity can actually reach at the data layer.
Sentra's approach to Salesforce governance maps exactly this. Classification runs continuously inside the Salesforce environment - identifying student PII, FERPA-regulated records, private communications, and institutional data that has accumulated through integrations. Access mapping connects each user, service account, and API integration to the sensitive data it can reach - not just the objects it has permissions to access, but the classified sensitive records within those objects. When an integration adds new data flows or permissions drift, the inventory updates in real time.
The output is a continuous answer to the question Instructure's security team could not have answered quickly enough in September 2025: what sensitive data is in Salesforce, what can each identity reach, and what needs to be removed or restricted before the next attempt.
WHAT TO CHECK IN YOUR OWN SALESFORCE ENVIRONMENT THIS WEEK
Three questions worth answering now, regardless of your industry:
First, what sensitive data has accumulated in your Salesforce org through integrations, workflow automations, and cross-platform data flows - beyond what was deliberately put there? Student records, healthcare data, financial records, and private communications all end up in Salesforce through integration patterns that were never evaluated for data sensitivity.
Second, what can each service account and API integration actually reach at the record level? Application-layer access controls in Salesforce do not prevent exfiltration by an attacker who has compromised a sufficiently-permissioned service account. Least-privilege at the data layer requires knowing what sensitive data each identity can access.
Third, if your Salesforce environment were breached today and you had to disclose within 72 hours, could you accurately characterize what data was exposed? FERPA, HIPAA, GDPR, and state-level privacy laws all require specific disclosure of data types. Without continuous classification, the answer in most environments is: not quickly, and not accurately.
FREQUENTLY ASKED QUESTIONS: SALESFORCE DATA SECURITY
What is the ShinyHunters Salesforce attack pattern?
ShinyHunters has repeatedly used Salesforce as an attack vector - typically gaining initial access through social engineering or credential theft, then exfiltrating data from the Salesforce org and using it for extortion. The pattern has appeared in breaches at Instructure (twice), McGraw-Hill, Infinite Campus, Amtrak, and others. The common thread is that Salesforce environments contain far more sensitive data than most security teams have classified or actively governed.
What data typically accumulates in enterprise Salesforce environments beyond CRM records?
In production Salesforce environments, continuous classification commonly surfaces PII from support ticket integrations, regulated financial or health data from cross-platform workflows, private communications stored in custom objects, API credentials and tokens in log fields, and institutional data from education or healthcare integrations. Most of this data arrives through legitimate integration patterns rather than misconfiguration.
How does DSPM apply to Salesforce environments?
Data Security Posture Management applied to Salesforce continuously classifies sensitive data at the field and record level within the Salesforce org; identifying regulated data types, mapping which identities can access them, and flagging access that exceeds least-privilege requirements. This runs inside the customer's environment without data leaving the Salesforce perimeter.
What is the difference between Salesforce's native security tools and DSPM?
Salesforce's native tools - Shield, field-level security, permission sets - control access at the object and field level. They do not classify data by sensitivity, identify regulated records that should not be in a given field, or map the sensitive data reachable by each integration or service account. DSPM fills that gap: it understands what the data is, not just who has permission to access it.
What does FERPA require in the event of an educational data breach?
FERPA requires institutions to protect the privacy of student education records. In a breach involving student PII, private messages, and institutional records - as in the Instructure case - affected institutions face notification obligations, potential loss of federal funding eligibility, and civil liability. Accurate and timely disclosure requires knowing exactly what records were exposed, which requires prior classification.
The Instructure breach happened twice because the data was never classified after the first incident. Credential rotation without data governance leaves the same exposure in place for the next attempt. Sentra continuously classifies sensitive data inside your Salesforce environment at the field and record level, maps what every identity and integration can reach, and flags access that exceeds least-privilege — so your breach response closes the actual gap, not just the credential.
Expert Data Security Insights Straight to Your Inbox
What Should I Do Now:
1
Get the latest GigaOm DSPM Radar report - see why Sentra was named a Leader and Fast Mover in data security. Download now and stay ahead on securing sensitive data.
2
Sign up for a demo and learn how Sentra’s data security platform can uncover hidden risks, simplify compliance, and safeguard your sensitive data.
3
Follow us on LinkedIn, X (Twitter), and YouTube for actionable expert insights on how to strengthen your data security, build a successful DSPM program, and more!
Before you go...
Get the Gartner Customers' Choice for DSPM Report
Read why 98% of users recommend Sentra.
No items found.
This website uses cookies to improve your experience and provide personalized services. See our Privacy Policy and Cookie Policy. We won't track your information unless you accept.