In this article:

This is some text inside of a div block.

Want to actually see your data risks, not just read about them? Book a demo and watch how we discover, classify, and secure sensitive data across your cloud and AI stack in minutes.

Book a demo

Share the Blog

The Blind Spot in Your Data Lake: Why Big Data Format Scanning Is the Next Frontier of Data Security

March 15, 2026

Min Read

Daniel Suissa

R&D Director

Data lakes were supposed to be the great democratizer of enterprise analytics. Centralized, scalable, and cost-effective, they promised to put data in the hands of every team that needed it. And they delivered -- perhaps too well. Today, petabytes of sensitive data sit in Apache Parquet files, Avro containers, and ORC stores across S3 buckets, Azure Data Lake Storage, and Google Cloud Storage, often with little to no visibility into what those files actually contain.

‍

Traditional Data Loss Prevention (DLP) tools were built for a world of emails, PDFs, and spreadsheets. They have no understanding of columnar storage formats, embedded schemas, or the sheer scale of modern data lake architectures. That gap is where sensitive data hides in plain sight -- and where Sentra's data lake format scanning changes the equation entirely.

The Shadow Data Problem in Data Lakes

‍

Every modern enterprise runs some version of the same playbook: production databases feed into ETL pipelines, which land data in object storage as Parquet, Avro, or ORC files. Data engineers, analysts, and machine learning teams then consume that data downstream.

‍

The security problem is straightforward but pervasive. When data engineering teams copy production data into data lakes for analytics, the PII that was supposed to be masked or anonymized often arrives intact. A full copy of customer records -- Social Security numbers, credit card numbers, health information -- ends up in a Parquet file in a shared S3 bucket, accessible to anyone with the right IAM role.

‍

This is not a hypothetical scenario. It is the default state of most enterprise data lakes. And with data democratization initiatives actively expanding access to these stores, the blast radius of unprotected data lake files grows with every new user who gets read permissions.

Why Traditional DLP Falls Short

‍

Conventional DLP solutions treat files as opaque blobs of text. They can scan a CSV or a Word document, but hand them an Apache Parquet file and they see nothing. This is a fundamental architectural limitation, not a feature gap that can be patched.

‍

Big data formats are structurally different from traditional file types. Parquet and ORC use columnar storage, meaning data is organized by column rather than by row. Avro embeds its schema directly in the file. Arrow IPC (Feather) uses an in-memory format optimized for zero-copy reads. Scanning these formats requires purpose-built readers that understand their internal structure -- readers that traditional DLP simply does not have.

‍

The result is a compliance blind spot that grows larger every quarter as more data moves into lakehouse architectures powered by Databricks, Snowflake external tables, and similar platforms.

How Sentra Scans Big Data Formats

‍

Sentra provides native, schema-aware scanning for the full spectrum of data lake file formats. This is not a bolt-on capability -- it is core to how our platform understands modern data infrastructure.

Apache Parquet

‍

Parquet is the lingua franca of the modern data lake. Sentra's tabular reader processes Parquet files with full awareness of their columnar structure, performing intelligent column-level classification. Rather than brute-forcing through every byte, Sentra leverages the columnar layout to efficiently scan individual columns for sensitive data patterns. Batch processing support means even large Parquet datasets are handled without requiring the entire file to be loaded into memory at once. Sentra also recognizes Spark checkpoint files (the `c000` convention) and processes them via Parquet or JSON fallback, ensuring that intermediate pipeline outputs do not escape scrutiny. Sentra also goes beyond the parquet schema and detects nested schemas like a json column that hides behind a “string” data type, adding meaningful context to the classification engine.

Apache Avro

‍

Avro files carry their schema with them, and Sentra takes full advantage of that. Our tabular reader parses the embedded schema to understand field names, types, and structure before scanning the data itself. This schema-aware approach enables more accurate classification -- a field named `ssn` containing nine-digit numbers is treated differently than a field named `zip_code` with the same pattern.

Apache ORC

The Optimized Row Columnar format is a staple of Hive-based data warehouses and remains widely used across Hadoop-era data infrastructure. Sentra's tabular reader handles ORC files natively, applying the same column-level classification intelligence used for Parquet and Avro.

Apache Feather / Arrow IPC

Arrow's IPC format (commonly known as Feather) is increasingly used for fast data interchange between Python, R, and other analytics tools. Sentra scans these files through its textual reader, ensuring that even ephemeral interchange formats do not become a vector for untracked sensitive data.

Column-Level Intelligence

Across all of these formats, Sentra performs column-level scanning and classification. This is critical at data lake scale. A single column in a petabyte Parquet dataset could contain millions of Social Security numbers, while every other column holds benign operational metrics. Column-level granularity means Sentra can pinpoint exactly where sensitive data lives, rather than simply flagging an entire file as "contains PII."

The Compliance Imperative

Regulatory frameworks do not carve out exceptions for big data formats. GDPR's right of access and right to erasure apply regardless of whether personal data is stored in a PostgreSQL table or a Parquet file in S3. CCPA's disclosure requirements extend to every copy of consumer data, including the one sitting in your analytics data lake.

‍

Data Subject Access Requests (DSARs) are particularly challenging when sensitive data is spread across thousands of Parquet files in a data lake. Without automated scanning that understands these formats, responding to a DSAR becomes a manual archaeology project -- expensive, slow, and error-prone.

‍

The AI governance dimension adds another layer of urgency. Machine learning training datasets are frequently stored in Parquet format. If those datasets contain PII that was used to train models, organizations face regulatory exposure under emerging AI governance frameworks. Knowing what personal data exists in your ML training pipelines is no longer optional -- it is a compliance requirement that is rapidly taking shape across jurisdictions.

From Blind Spot to Full Visibility

The shift to data lakehouse architectures is accelerating. Databricks, Snowflake, and the broader modern data stack have made it easier than ever to store and process massive volumes of data in open file formats. That is a net positive for analytics and engineering teams. But without security tooling that speaks the same language as the data infrastructure, sensitive data will continue to accumulate in places where no one is looking.

‍

Sentra closes that gap. By providing native, schema-aware scanning for Parquet, Avro, ORC, Feather, and related formats -- combined with intelligent column-level classification and efficient batch processing -- Sentra gives security and compliance teams the visibility they need into the fastest-growing data stores in the enterprise.

‍

Data lakes are not going away. The question is whether your security posture can keep up with the data engineering teams that feed them. With Sentra, the answer is yes.

‍

*Sentra is a Data Security Posture Management (DSPM) platform that automatically discovers, classifies, and monitors sensitive data across your entire cloud environment. To learn more about how Sentra handles data lake scanning and 150+ other file formats, book a demo with our data security experts.

<blogcta-big>

‍

Daniel Suissa

R&D Director

Daniel is an R&D Director at Sentra. He has over a decade of experience in engineering, and in the cybersecurity sector. He earned his BSc in Computer Science at NYU.

Latest Blog Posts

Yair Cohen

April 27, 2026

Min Read

Sentra Q2 2026 Product Updates: Data Security in the Age of AI

Every quarter I get asked some version of the same question: "What's the biggest shift you're seeing in enterprise data security right now?" My answer hasn't changed in the past year, but the urgency behind it keeps growing.

‍

AI is no longer a side project. Copilots, agents, and LLM-powered apps are spinning up across Microsoft 365, AWS, Databricks, Azure, and beyond; often faster than security teams can track. At the same time, most large enterprises still have critical regulated data living on file shares and databases in their own data centers, largely invisible to cloud-first tools. And the DLP stacks organizations spent years building? They're only as smart as the labels and context they can see, which, for most companies, isn't very much.

‍

These aren't new problems. But they've collided in a way that makes 2026 a genuinely pivotal year for data security. Read this post (or watch the on-demand webinar) for a walk through of what we shipped in Q2 and where we're taking Sentra for the rest of the year.

The Three Problems We Kept Hearing

Before I walk through our Q2 updates, it's worth naming the friction points that drove them. Across our customer conversations, three questions kept coming up without clean answers:

‍

"What AI assets do we actually have, and what data do they touch?" Organizations know they're deploying copilots and agents. They often have no unified view of what those assets are connected to.

‍

"We have critical data on-prem that never moved to the cloud. What do we do about it?" Almost every large enterprise we work with still has regulated data sitting in data centers. Historically, the choices were. 1) ignore it, 2) try to move it to the cloud just to scan it, which is usually a non-starter for compliance and operations.

‍

"Our DLP stack isn't working the way it should. Is that a classification problem?" Almost always, yes. Enforcement agents, whether it's Microsoft Purview, Google DLP, SASE, CASB, or endpoint DLP, are only as good as the labels and context they see. If data isn't classified accurately and consistently, policies either never trigger or they trigger constantly and generate noise.

These three problems shaped our Q2 investments directly.

‍

Q2 Update #1: AI Security - Turning AI Chaos Into a Governable Surface

The real risk with enterprise AI isn't the models themselves. It's that no one has a clean answer to three basic questions: What AI assets do we have? What data do they touch? And are they using that data in a way that would pass an audit?

‍

In Q2, we took the first concrete step toward answering all three.

‍

Unified AI Asset Inventory. We now give you a single view of your agents, models, and endpoints - with owners and environments - instead of having them scattered across different consoles. If you're running Copilot in M365, SageMaker models on AWS, and custom agents on Bedrock or Azure, they all show up in one place.

‍

Data Lineage Into AI. For each agent, we map which knowledge bases and data stores it relies on and roll up the sensitive data classes and business context to the AI asset level. This is the part that matters most. Until now, people thought about data security in terms of how employees accessed files and permissions. With GenAI, data flows much faster through agents, so understanding the data at rest, and which AI assets touch it, is the critical control point.

‍

Govern Data Use in AI. Once you have that lineage, you can start making real policy decisions. These are the data classes we're comfortable using for copilots and agents; these are the ones that must never be touched. We flag high-risk agents, those with access to regulated data or broad permissions, before they roll out, not after something leaks.

This is the first step toward our broader 2026 AI readiness vision: treating AI assets the same way we treat any other sensitive data store, with inventory, lineage, posture assessment, and policy enforcement. The goal is that when your organization wants to move faster with GenAI, Sentra gives you the map, the policies, and the evidence you need to say yes - safely.

Q2 Update #2: On-Prem & Hybrid Coverage - Securing the Data That Never Moved to the Cloud

Almost every large enterprise we work with still has critical regulated data on file shares and databases in their own data centers. It's often the riskiest and least visible part of the estate.

‍

In Q2, we introduced local on-premise scanners that run inside your environment, scan file shares and data stores where they live, and send us only the metadata and classifications, not the sensitive data itself. You get the same AI-powered discovery, classification, sensitivity mapping, and posture analytics you're used to in cloud and SaaS. Your data never leaves your data center.

‍

"How realistic is full coverage?" - very realistic. We essentially took the technology we built for our cloud scanners and packaged it for any private data center or on-premise environment. We ship lightweight local scanners, support all types of SMB and NFS file shares, and cover databases including MySQL, Oracle, Postgres, and more. Sentra also connects to your Active Directory to map access levels across identities, file shares, and databases.

‍

All of that feeds into a single map across on-prem, cloud, and SaaS, so security teams can finally reason about all their sensitive data everywhere, instead of managing separate point solutions for each island. And critically, this isn't a POC exercise. We focused on easy, secure deployment; lightweight collectors, quick rollout, and alignment with enterprise network and security requirements. This is something you can actually put into production.

Q2 Update #3: Automatic Labeling & Tagging - Making Your Existing DLP Stack Actually Smart

Most organizations aren't looking to rip and replace their DLP stack. The real pain is that enforcement is flying blind. DLP, SSE, CASB, and endpoint tools are like muscles without a brain. They can be powerful, but only if the underlying classification is accurate and consistent.

‍

Sentra's role is to be the data security and classification brain that makes those existing tools actually smart.

In Q2, we doubled down on cross-platform auto-labeling. Automatically applying Microsoft Purview Information Protection (MPIP) labels in M365 and Google sensitivity labels in Google Drive, based on our high-accuracy discovery and classification. Those labels then become the control plane for everything downstream; email DLP, endpoint and web proxies, SaaS DLP, and even AI and Copilot controls that decide which data can be surfaced in responses.

Instead of authoring hundreds of brittle regex rules, you're keying policies off rich business context; HR compensation documents, customer financial statements, high-sensitivity intellectual property. The result is fewer false positives, better enforcement, and a classification foundation that scales.

‍

Strategically, this is how we move from DSPM-plus-alerts to cloud-native DLP and automated remediation at scale. Sentra discovers and understands the data, stamps it with the right labels, and your existing enforcement stack, plus our own remediation, ensures data is only used, shared, and accessed in ways that match its true sensitivity.

Classification Is Still the Core of Everything

One thing I want to leave you with, because I don't think it gets said enough: classification is the foundation that makes all of this work. It's still where we invest the most at Sentra, and with advances in AI, we're making our capabilities more ambitious and more automatic.

‍

We're building classifiers that are specific to each organization's proprietary data. Sentra learns your specific environment, and for every piece of data found, whether it's a file, a column, or a table, we know what it is and what its business context means. Beyond that, we're evolving our sensitivity scoring engine so security teams can bring their own definitions of what's sensitive, and our engine automatically translates that using AI into rules that ensure every piece of data gets the right label.

‍

The goal is to make the effort of classifying and labeling data as easy as describing it to another human being. And to remove the manual research and validation work that doesn't scale in the AI era.

The Bottom Line

The challenge of enterprise data security in 2026 isn't a lack of tools. It's that the tools organizations have - DLP, CASB, SSE, endpoint controls - are only as effective as the data intelligence feeding them. At the same time, AI is creating an entirely new attack surface that most security teams can't see clearly yet. And on-premise data, the part of the estate that never moved to the cloud, remains the riskiest and least visible.

‍

Sentra is building toward a single platform that addresses all three: a data-first security platform that discovers your critical data, understands its context, and drives the controls in your existing tools and in ours, so data stays safe, compliant, and usable for the business.

‍

We'll see you next quarter with more updates. In the meantime, reach out if you have questions or schedule a demo if you want to go deeper on any of this.

‍

Team Sentra

April 24, 2026

Min Read

AI and ML

Patchwork AI Security vs. Purpose-Built Protection: Thoughts on Cyera’s Ryft Acquisition

Yesterday’s news that Cyera is acquiring Ryft, a two-year-old startup building automated data lakes for AI agents, is the latest sign of how fast the agentic AI security market is moving. It’s also Cyera’s fourth acquisition in five years, on the heels of Trail Security and Otterize, a clear signal that the company is trying to buy its way into new narratives as quickly as they emerge.

‍

For security and data leaders, the question isn’t “Is agentic AI important?” It absolutely is. The question is: What’s the real cost of stitching together yet another acquisition into an already complex platform?

The hidden cost of rapid, piecemeal integrations

On paper, adding Ryft gives Cyera a new story around “agentic AI security.” In practice, it creates a familiar set of integration problems:

‍

Multiple architectures to reconcile
Trail Security, Otterize, and now Ryft were all built as independent products with their own data models, UX patterns, and engineering roadmaps. Four acquisitions in five years means customers are effectively buying an integration project that’s still in progress, not a single, mature platform.

‍

Gaps, overlaps, and inconsistent controls
Every acquired module has its own blind spots and strengths. Until they’re truly unified, you get overlapping coverage in some areas, gaps in others, and policy engines that don’t behave consistently across cloud, SaaS, and on-prem.

‍

Slower time-to-value for AI initiatives
AI programs move quickly; integrations do not. Each acquisition has to be wired into discovery, classification, policy, reporting, access control, and remediation workflows before it delivers real value. That’s measured in quarters and years, not weeks.

‍

Operational drag on security teams
When you tie together multiple acquired engines, you often see scan-based coverage, noisy false positives, and limited self-serve reporting that still depends on the vendor’s team to interpret results. That’s the opposite of what already stretched security teams need as they take on AI data risk.

‍

The Ryft deal fits this pattern. It’s a high-priced bet on an early-stage team with a small set of digital-native customers, not a proven, enterprise-scale AI data security engine. That’s fine as a venture bet. It’s more problematic when packaged as an answer for Fortune 500 AI governance.

Why agentic AI security can’t be bolted on

Agentic AI changes the risk profile of enterprise data:

‍

Agents traverse structured and unstructured data across cloud, SaaS, and on-prem.
They act on behalf of identities, often chaining tools and APIs in ways that are hard to predict.
The blast radius of a misconfiguration or over-permissioned identity grows dramatically once agents are in the loop.

‍

Trying to solve that by bolting an AI data lake acquisition onto a legacy, scan-based DSPM engine is risky. You’re adding another moving part on top of a system that already struggles with:

‍

Point-in-time scans instead of real-time, continuous coverage
High false positives without strong prioritization
Shallow support for hybrid and on-prem environments
Vendor-controlled workflows instead of customer-controlled, self-serve reporting

‍

If the underlying platform can’t continuously understand where sensitive data lives, which identities can touch it, and how that access is used, then adding an “AI data lake” on the side doesn’t fix the fundamentals. It just adds another place for risk to hide.

A different path: Sentra’s purpose-built, real-time platform

At Sentra, we took a different approach from day one: build a single, in-place, real-time data security platform, not a patchwork of stitched-together acquisitions.

‍

A few principles guide the way we think about AI and data security:

‍

Real-time, continuous data security
Sentra monitors data continuously, not in point-in-time scans, so risks are caught as they happen - not at the next scheduled review.

‍

One unified architecture
Sentra is a purpose-built, unified platform, not an assortment of logos held together by integration roadmaps. There’s one architecture, one data model, one roadmap, and one team focused entirely on DSPM and AI data security, rather than a set of acquired point products that still need to be woven together.

‍

Proven for real AI workloads today
Our platform is already securing real AI workloads in production environments, rather than depending on the future maturation of a seed-stage acquisition. AI data security for us is not a sidecar story. It's built into how we discover, classify, govern, and remediate risk across your estate.

‍

Higher-precision signal, not more noise
Sentra delivers higher classification precision (4.9 vs. 4.7 stars on Gartner) and couples that with workflows your team controls, not processes that require vendor intervention every time you need a new report or policy tweak.

‍

Complete coverage for complex environments
Modern enterprises aren’t cloud-only. Sentra provides full coverage across IaaS, PaaS, SaaS, and on-premises from a single platform, built for hybrid and legacy-heavy environments as much as for cloud-native stacks.

‍

In other words, while some vendors are racing to acquire their way into the next AI buzzword, Sentra is focused on delivering trustworthy, real-time, identity-aware data security that you can put in front of a CISO and a data platform owner today.

What to ask your vendors now

If you’re evaluating Cyera (or any vendor riding the latest AI acquisition wave), a few concrete questions can cut through the noise:

‍

How many acquisitions have you done in the last five years, and which parts of my deployment depend on those integrations actually working?
What’s fully integrated and running in production today vs. what’s still on the roadmap?
Are my AI and non-AI data risks handled by the same platform, policies, and reporting, or by separate acquired modules?
Do you provide continuous coverage and identity-aware controls across cloud, SaaS, and on-prem, or am I still relying on periodic scans and partial visibility?

‍

The AI security market doesn’t need more logos; it needs fewer moving parts, better signals, and real-time control over how data is used by humans and agents alike.

‍

That’s the standard Sentra is building for and the lens through which we view every new acquisition announcement in this space.

‍

Ron Reiter

April 24, 2026

Min Read

Data Security

Sentra Now Supports Solidworks 3D CAD Files – Protecting the Digital Blueprint in the Age of AI

Walk into any advanced manufacturing, aerospace, defense, or industrial design shop and you’re just as likely to see Solidworks as you are AutoCAD. The models, assemblies, and drawings built in Solidworks are the digital blueprints for everything from turbine blades and medical devices to satellites and weapons systems.

‍

Earlier this year we announced native support for AutoCAD DWG files, making an entire class of previously opaque CAD data visible to security and compliance teams for the first time. Now we’re extending that same deep visibility to Solidworks 3D CAD files, so you can protect the IP and regulated technical data hiding inside your .sldprt, .sldasm, and related content—without slowing engineering down.

‍

And as AI accelerates design cycles, that visibility is no longer optional.

‍

AI is Supercharging Design – and Expanding the Blast Radius

Design teams are pushing faster than ever:

‍

Generative design tools propose entire families of parts and assemblies.
Copilots summarize requirements, suggest changes, and draft documentation off CAD models.
PLM-integrated agents automatically create downstream artifacts—quotes, NC programs, service manuals—based on 3D designs.
RAG-style internal assistants answer questions using a mix of project docs, CAD files, and simulation outputs.

‍

All of this is powerful. It also multiplies the ways sensitive CAD data can leak:

‍

Entire assemblies uploaded to unmanaged AI tools “just to explore options.”
Export-controlled models referenced in prompts and ending up in long‑lived AI data lakes.
Supplier and customer CAD shared into external copilots with little visibility into who—or what agent—can access it.
Rich metadata from CAD (usernames, project codes, server paths, partner names) silently turned into reconnaissance material.

‍

If you don’t understand what’s inside your CAD, where it lives, and which identities and AI agents can reach it, AI doesn’t just speed up design—it speeds up IP disclosure, compliance failures, and supply‑chain exposure.

‍

CAD Has Been a Blind Spot for Security

Most traditional DSPM and DLP tools still treat specialized engineering formats as a big binary blob: “probably sensitive, treat with caution.” That may have been acceptable when CAD lived on a handful of on‑prem engineering servers.

‍

It’s not acceptable when:

‍

Decades of CAD history have been lifted and shifted into S3, Azure Blob, or SharePoint.
ITAR/EAR “technical data” now lives side‑by‑side with everyday project files in cloud object stores.
Those same repositories feed downstream systems—PLM, MES, AI assistants—where traditional security tools have little or no visibility.

‍

We built native DWG parsing into Sentra to break that stalemate, making CAD content as transparent to security teams as a Word document. Solidworks 3D CAD support is the next logical step.

‍

What’s Really Inside a Solidworks 3D CAD File?

Like DWG, a Solidworks file is far more than geometry. It’s a container for rich metadata, text, and structural context that describes both what you’re building and how it fits into regulated programs and commercial IP. Our Solidworks support is designed to surface that security‑relevant context—without requiring CAD tools, manual exports, or data movement.

‍

Similar to what we do for DWG, Sentra can extract and analyze key elements, including:

‍

Document properties
Authors, “last saved by,” creation and modification timestamps, total editing time, and revision counters—signals that help you understand who is touching sensitive designs and when.

‍

Custom properties and configuration metadata
Project IDs, part and assembly numbers, revision codes, program names, business units, and export‑control or classification markings encoded as custom properties or notes.

‍

Text content and annotations
Notes, callouts, PMI, and embedded text that often contain material specifications, tolerances, customer names, contract IDs, and phrases like “COMPANY CONFIDENTIAL,” “EXPORT CONTROLLED,” or ITAR statements.

‍

Assembly structure and component names
Which parts roll up into which assemblies, and how those components are named—critical when you need to understand which physical systems a given sensitive model belongs to.

‍

File dependencies and paths
References to drawings, configurations, libraries, and external resources that routinely expose server names, share paths, usernames, and department structures—goldmine context for attackers, but also for incident response and insider‑risk investigations.

‍

For organizations operating under ITAR and EAR, this is where truly export‑controlled technical data actually lives—not in the folder name, but in the title blocks, annotations, and metadata attached to models and drawings.

‍

Turning Solidworks Models into Actionable Security Signals

By parsing Solidworks 3D CAD files in place, inside your own cloud accounts or VPCs, Sentra can now treat them as first‑class citizens in your data security program—just like we do for DWG and other specialized formats.

‍

That unlocks concrete use cases, such as:

‍

Finding export‑controlled or highly sensitive designs in cloud storage
Automatically surface Solidworks files whose metadata, annotations, or custom properties contain ITAR statements, ECCN codes, proprietary markings, or customer‑confidential labels—so you can focus remediation on the drawings and models that are actually regulated.

‍

Mapping who (and what) can access critical designs
Combine CAD‑aware classification with Sentra’s DSPM and DAG capabilities to answer:
Where are our most sensitive Solidworks assemblies stored, and which identities, service principals, and AI agents can currently reach them?

‍

Monitoring AI and collaboration workflows for IP exposure
Track when Solidworks files that contain regulated or high‑value IP are moved into AI data lakes, shared via collaboration platforms, or accessed by non‑human identities—so DDR policies can flag, quarantine, or route for review before they turn into public incidents.

‍

Building a defensible audit trail for CAD‑resident technical data
Maintain an inventory of Solidworks files that contain export‑control markings or IP‑critical content, tie each file to its exact storage location and access controls, and surface any out‑of‑policy placements—so when auditors ask “Where is your technical data?”, you can answer with data, not slideware.

‍

Closing the Gap Between “Stored” and “Understood” for 3D CAD

As workloads like EDA, PLM, simulation, and AI‑assisted design move deeper into the cloud, the number of specialized formats in your environment explodes. Most tools still only truly understand emails, office documents, and a narrow slice of structured data.

‍

The reality is simple: you cannot secure data you don’t understand. Understanding means being able to answer, at scale, not just “Where is this file?” but “What is inside this file, how sensitive is it, and how is AI amplifying its risk?”

‍

For organizations whose crown‑jewel IP and export‑controlled technical data live in Solidworks 3D CAD, that’s the gap Sentra is now closing.

‍

If you want to see what’s actually hiding inside your own Solidworks models and assemblies, the easiest next step is to run a focused assessment: pick a few representative buckets or repositories, let Sentra scan those CAD files in place, and review the inventory of regulated and high‑value designs that surfaces.

‍

Chances are, once you’ve seen that map—and how it connects to your AI initiatives—you’ll never look at “just another CAD file” the same way again.

‍

Expert Data Security Insights Straight to Your Inbox

What Should I Do Now:

Get the latest GigaOm DSPM Radar report - see why Sentra was named a Leader and Fast Mover in data security. Download now and stay ahead on securing sensitive data.

Sign up for a demo and learn how Sentra’s data security platform can uncover hidden risks, simplify compliance, and safeguard your sensitive data.

Follow us on LinkedIn, X (Twitter), and YouTube for actionable expert insights on how to strengthen your data security, build a successful DSPM program, and more!

The Blind Spot in Your Data Lake: Why Big Data Format Scanning Is the Next Frontier of Data Security

The Shadow Data Problem in Data Lakes

Why Traditional DLP Falls Short

How Sentra Scans Big Data Formats

Apache Parquet

Apache Avro

Apache ORC

Apache Feather / Arrow IPC

Column-Level Intelligence

The Compliance Imperative

From Blind Spot to Full Visibility

Latest Blog Posts

Sentra Q2 2026 Product Updates: Data Security in the Age of AI

Sentra Q2 2026 Product Updates: Data Security in the Age of AI

The Three Problems We Kept Hearing

Q2 Update #1: AI Security - Turning AI Chaos Into a Governable Surface

Q2 Update #2: On-Prem & Hybrid Coverage - Securing the Data That Never Moved to the Cloud

Q2 Update #3: Automatic Labeling & Tagging - Making Your Existing DLP Stack Actually Smart

Classification Is Still the Core of Everything

The Bottom Line

Patchwork AI Security vs. Purpose-Built Protection: Thoughts on Cyera’s Ryft Acquisition

Patchwork AI Security vs. Purpose-Built Protection: Thoughts on Cyera’s Ryft Acquisition

The hidden cost of rapid, piecemeal integrations

Why agentic AI security can’t be bolted on

A different path: Sentra’s purpose-built, real-time platform

What to ask your vendors now

Sentra Now Supports Solidworks 3D CAD Files – Protecting the Digital Blueprint in the Age of AI

Sentra Now Supports Solidworks 3D CAD Files – Protecting the Digital Blueprint in the Age of AI

AI is Supercharging Design – and Expanding the Blast Radius

CAD Has Been a Blind Spot for Security

What’s Really Inside a Solidworks 3D CAD File?

Turning Solidworks Models into Actionable Security Signals

Closing the Gap Between “Stored” and “Understood” for 3D CAD

Get the Gartner Customers' Choice for DSPM Report