Thoughts on Sentra and the Data Security Landscape After Our Series A

February 1, 2023

By Asaf Kochan, Co-Founder and President, Sentra

Series A announcements are an exciting time for any startup, and we look forward to working with our new investors from Standard Investments, Munich Re Ventures (MRV), Moore Strategic Ventures, and INT3 to grow the cloud data security and data security posture management (DSPM) categories.

I wanted to take a moment to share some of my thoughts around what this round means for Sentra, cloud data security, and the growth of the DSPM category as a whole. 

Seeing is Believing: From Potential Customer to Investor

The most amazing part of this round is that we didn’t originally intend to raise money. We approached Standard Industries as a potential customer, not an investor. It was incredible to see how bought-in the team was to Sentra’s approach to data security. They instantly understood the potential for securing not only Standard’s data, but the data of every cloud-first enterprise. The total addressable market for data security solutions is already large, and it grows every year as more companies are built cloud-native. The global need for solutions like Sentra was obvious to their team after seeing the product, and I’m excited to have a forward-thinking investor like Standard as part of our journey.

It’s a Vote of Confidence in the Sentra Team and Product

Any Series A is first and foremost a vote of confidence. It’s an endorsement of the vision of the company, the approach the product is taking, and the potential of the core team to continue to grow the business. Anyone who has spoken with our talented team understands the level of expertise and perseverance they bring to every task, meeting, and challenge. I’m proud of the team we’ve built, and I’m excited to welcome many new Sentrans to the team in the coming months. 

As I mentioned, the round is also a mark of confidence in the development and direction of the product itself. Working with our existing customers, we’ve regularly added new features, integrations, and capabilities to our solution. As we continue to discover, classify, and secure larger amounts of data in all cloud environments, the benefits of a data-centric approach become clear. We’ve successfully reduced the risk of catastrophic data breaches by reducing the data attack surface, improved relationships between engineering and security teams by breaking down silos, and even helped our customers reduce cloud costs by finding and eliminating duplicate data.

Data Security is a Must, Not a Nice to Have

Raising money in the current economic climate is not to be taken for granted. The significant investment in Sentra’s vision speaks not only to the value provided by Sentra’s product and team, but also to how critical data security has become. Compliance, privacy, and security concerns are present regardless of how the NASDAQ is performing.

Certainly we’re seeing no slowdown in data security regulations. Global enterprises are now responsible for ensuring compliance with a growing number of data regulations from different government and commercial sources across the globe. When it comes to security and IP protection, the threat of a catastrophic data breach is top of mind for all cloud security teams. As the reality sets in that breaches are not a matter of “if” but “when”, the logic of the data-centric approach becomes clear: if you can’t prevent the initial breach, you can make sure your most sensitive data always has the proper security posture. In the future we’re building, not every breach will be newsworthy, because not every breach will involve sensitive data. This funding round demonstrates the growing acceptance that this is the direction cloud security is and should be heading.

DSPM Will Come to Dominate Cloud Security

There’s always some skepticism in the cyber world when a new category is created. 

  • Is the problem it claims to solve really that serious?
  • Can we just use existing paradigms and tools to address it?
  • Is implementing a new tool going to make a real difference for the business? 

These questions are valid, and any cyber company operating in a new space must address them forthrightly and clearly. We have been clear from the beginning: a data-centric approach to security with DSPM is not a small step, but a giant leap forward. Data is the core asset of most companies, and that asset is now stored in the cloud. Old approaches will not be sufficient. This new round is led by investors who recognize this new reality and share our vision that we need to put data at the core of cloud security strategies.

I want to end by again emphasizing how thankful I am for having amazing investors, partners, and team members join us over the last 18 months. So much has been accomplished already, but the industry shift to data-centric security has only just begun. I’m looking forward to continuing to protect the most important business asset in the world - our data.

About the author: For four years, Asaf Kochan was the commander of Israel’s Unit 8200, leading the world's most innovative cyber team. During those years, he saw threat actors continuously exploit sensitive data that was improperly secured, and worked to mitigate the damage this was causing to both the public and private sectors. Reflecting on these experiences, it was clear to him that sensitive data had become the most important asset in the world. In the private sector, enterprises that were leveraging data to generate new insights, develop new products, and provide better experiences were separating themselves from the competition. As data becomes more valuable, it becomes more of a target. And as the amount of sensitive data grows, so does the importance of finding the most effective way to secure it. That’s why he co-founded Sentra, together with accomplished co-founders Yoav Regev, Ron Reiter, and Yair Cohen.

Latest Blog Posts

Adi Voulichman
March 16, 2026

SQLite and SQL Dump Scanning for Data Security: Finding the Database Copies You Forgot

Most organizations are good at locking down production databases. Firewalls, IAM, encryption at rest: the basics are covered. The real problem is everything that isn’t “production” anymore.

If you want a realistic view of your risk, you need SQLite and SQL dump scanning for data security. These forgotten files are complete, queryable replicas of your most sensitive data, and they’re often sitting far outside your governance perimeter.

Database Files Are Everywhere and They Contain Everything

Two patterns show up over and over again.

SQLite: The Invisible Database on Every Device

SQLite is the most widely deployed database engine in the world. It underpins mobile apps on iOS and Android, every major web browser, countless desktop applications, and an increasing number of IoT devices. Because it is embedded directly into software rather than running as a separate server, SQLite databases quietly accumulate everywhere - from device backups to application data directories, often without organizations realizing how much sensitive information they contain.

A single .sqlite or .db file extracted from a mobile app backup can contain full user profiles, private messages, health records and telemetry, location histories, and transaction logs. Unlike unstructured files, this data is already organized into tables and indexed for fast queries, making it trivial to explore once the file is accessed, whether by a legitimate analyst or by an attacker who discovers the database outside the organization’s governance controls.

SQL Dump Files: Production in a File

Files like .sql, .dump, and .pgdump are not simple exports. They contain:

  • Schema definitions
  • Data rows
  • Stored procedures
  • Sometimes even credentials in comments

They’re created for migrations, debugging, staging, analytics, or “just in case” backups. Then they get uploaded to S3, dropped on a shared drive, left on a laptop, or stored in CI/CD artifacts. If you read breach reports over the past decade, a recurring pattern appears: an unprotected SQL dump in cloud storage, found by an attacker before it’s found by the organization.

Why Traditional DLP Can’t Handle SQLite and SQL Dumps

Legacy DLP tools treat files as text streams to scan with regexes. That model breaks down for databases:

  • SQLite databases are binary structured files with internal tables and indexes, not flat text.
  • SQL dumps are scripts that reconstruct a database, not CSV files.

If you scan a .db file as raw bytes, you’ll miss most of what’s inside. If you grep an entire 50 GB SQL dump for patterns, you’ll drown in noise and still struggle to map findings back to specific tables or columns.

Meaningful inspection requires treating these as databases:

  • Enumerating tables
  • Parsing schemas
  • Extracting rows
  • Classifying data at the column level in context

That’s a different problem than scanning PDFs.

How Sentra Approaches SQLite and SQL Dump Scanning

At Sentra, we treat database files as first‑class structured data sources, not opaque blobs.

SQLite, SQLite3, and .db Files

When Sentra encounters a SQLite database, our SQLiteReader:

  1. Enumerates every table and its schema.
  2. Extracts rows in a structured, tabular form.
  3. Preserves column names and types so classifiers can make context‑aware decisions.

A column named ssn with nine‑digit values is clear. A diagnosis_code next to patient_id tells another story entirely.

All processing happens in memory. We never write the database contents to disk on Sentra’s side, which avoids creating new uncontrolled copies of sensitive data.
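To make the three steps above concrete, here is a minimal sketch using Python's standard `sqlite3` module. It is not Sentra's SQLiteReader; the function name `read_tables` and the `sample_rows` parameter are hypothetical, and the demo runs against an in-memory database so nothing is written to disk.

```python
import sqlite3

def read_tables(conn, sample_rows=100):
    """Return {table: {"columns": [(name, declared_type), ...], "rows": [...]}}.

    Hypothetical helper for illustration only.
    """
    result = {}
    # Step 1: enumerate user tables from the SQLite catalog.
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        quoted = '"' + table.replace('"', '""') + '"'
        # Step 3: keep column names and declared types.
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        columns = [
            (row[1], row[2])
            for row in conn.execute(f"PRAGMA table_info({quoted})")
        ]
        # Step 2: pull a bounded sample of rows in tabular form.
        rows = conn.execute(
            f"SELECT * FROM {quoted} LIMIT ?", (sample_rows,)
        ).fetchall()
        result[table] = {"columns": columns, "rows": rows}
    return result

# Demo on an in-memory database with made-up data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patients (patient_id INTEGER, ssn TEXT, diagnosis_code TEXT)"
)
conn.execute("INSERT INTO patients VALUES (1, '123456789', 'E11.9')")
inventory = read_tables(conn)
```

For a file you did not create, opening it read-only (for example `sqlite3.connect("file:app.db?mode=ro", uri=True)`, where `app.db` is a placeholder path) avoids modifying the evidence while you inspect it.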

SQL Dump Files

SQL dumps present a different challenge. They’re scripts, not binary databases.

Sentra’s SQLReader:

  • Parses SQL dump files directly, without executing them
  • Supports common SQL dialects (for example PostgreSQL, MySQL, ANSI SQL)
  • Extracts CREATE TABLE and INSERT statements to reconstruct tabular structures in memory

That allows you to scan, for example, a 50 GB PostgreSQL dump in S3 and identify every table that contains PII, payment data, or PHI - without provisioning a database server, executing untrusted SQL, or moving the file out of your environment.
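As a rough illustration of parsing a dump without executing it, the sketch below pulls `CREATE TABLE` and `INSERT ... VALUES` statements out of a script with regular expressions and rebuilds the tables in memory. It is not Sentra's SQLReader: `parse_dump`, `split_top_level`, and `unquote` are hypothetical helpers, and the approach assumes simple, unqualified table names and single-quoted string literals. Real dumps (PostgreSQL `COPY` blocks, schema-qualified names, dialect-specific escapes) need a proper SQL parser.

```python
import re

CREATE_RE = re.compile(
    r'CREATE\s+TABLE\s+(?:IF\s+NOT\s+EXISTS\s+)?[`"]?(\w+)[`"]?\s*\((.*?)\)\s*;',
    re.IGNORECASE | re.DOTALL,
)
INSERT_RE = re.compile(
    r'INSERT\s+INTO\s+[`"]?(\w+)[`"]?\s*(?:\([^)]*\)\s*)?VALUES\s*(.*?);\s*$',
    re.IGNORECASE | re.DOTALL | re.MULTILINE,
)
CONSTRAINT_WORDS = {"PRIMARY", "FOREIGN", "UNIQUE", "CONSTRAINT", "KEY", "CHECK"}

def split_top_level(text, sep=","):
    """Split on sep, ignoring separators inside parentheses or single quotes."""
    parts, buf, depth, in_quote, i = [], [], 0, False, 0
    while i < len(text):
        ch = text[i]
        if in_quote:
            buf.append(ch)
            if ch == "'":
                if i + 1 < len(text) and text[i + 1] == "'":  # escaped quote ''
                    buf.append("'")
                    i += 1
                else:
                    in_quote = False
        elif ch == "'":
            in_quote = True
            buf.append(ch)
        elif ch == sep and depth == 0:
            parts.append("".join(buf).strip())
            buf = []
        else:
            depth += (ch == "(") - (ch == ")")
            buf.append(ch)
        i += 1
    tail = "".join(buf).strip()
    if tail:
        parts.append(tail)
    return parts

def unquote(value):
    """Strip surrounding single quotes and unescape doubled quotes."""
    if len(value) >= 2 and value[0] == "'" and value[-1] == "'":
        return value[1:-1].replace("''", "'")
    return value

def parse_dump(sql):
    """Rebuild {table: {"columns": [...], "rows": [...]}} without running SQL."""
    tables = {}
    for name, body in CREATE_RE.findall(sql):
        columns = []
        for item in split_top_level(body):
            first = item.split()[0].strip('`"')
            if first.upper() not in CONSTRAINT_WORDS:
                columns.append(first)
        tables[name] = {"columns": columns, "rows": []}
    for name, values in INSERT_RE.findall(sql):
        table = tables.setdefault(name, {"columns": [], "rows": []})
        for tup in split_top_level(values):
            inner = tup.strip()[1:-1]  # drop the surrounding parentheses
            table["rows"].append([unquote(v) for v in split_top_level(inner)])
    return tables

dump = """
CREATE TABLE users (
    id INT PRIMARY KEY,
    name VARCHAR(255),
    balance DECIMAL(10,2)
);
INSERT INTO users (id, name, balance) VALUES (1, 'O''Brien, Pat', 10.50), (2, 'Ann', 3.00);
"""
tables = parse_dump(dump)
```

Because the script is only read as text, nothing in the dump is ever executed. For a multi-gigabyte file you would stream statements rather than load the whole string, which this sketch does not attempt.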

Tabular Extraction and Contextual Classification

Both SQLite and SQL dumps go through Sentra’s tabular extraction mode. Instead of treating them as unstructured text, we preserve relationships between tables, columns, and values.

That yields:

  • Far fewer false positives than blind pattern‑matching
  • The ability to catch issues that only show up in context

For example, a “tokenized” column may only become PCI‑relevant when you notice an adjacent column of unmasked PANs.

Compliance: You Can’t Delete What You Can’t Find

Untracked database copies are not just a security risk; they’re a compliance problem.

  • GDPR (Right to Erasure): If a user’s data lives in an old .sql export on a cloud drive, your deletion is incomplete. You’re still on the hook.
  • PCI DSS: Cardholder data in a database dump outside your CDE expands your audit scope and can cost you certification.
  • HIPAA: PHI in ungoverned backups, exports, or development databases is still PHI. “We didn’t mean to copy it there” is not a defense.

The real issue isn’t that these files exist; it’s that they proliferate silently. Developers and analysts create them to get work done, then forget they exist. Permissions change, buckets get opened, laptops get lost, and nobody ties it back to the forgotten dump or .db file.

Finding Every Copy Before Someone Else Does

Sentra continuously scans your cloud and storage environment - S3, Azure Blob, GCS, file shares, and more - to discover and classify SQLite databases, SQL dumps, and related structured data files wherever they live.

For each one, we:

  • Identify the file type and contents
  • Extract tables and columns
  • Map sensitive data to your classification taxonomy

You get a live inventory of every known database copy, what’s inside, and where it sits. That’s the baseline you need to bring shadow databases back under governance. Because the question is not whether SQLite files and SQL dumps exist in your environment. The question is how many, where, and what’s inside them, and whether you want to find out before an attacker does.

Daniel Suissa
March 15, 2026

The Blind Spot in Your Data Lake: Why Big Data Format Scanning Is the Next Frontier of Data Security

Data lakes were supposed to be the great democratizer of enterprise analytics. Centralized, scalable, and cost-effective, they promised to put data in the hands of every team that needed it. And they delivered -- perhaps too well. Today, petabytes of sensitive data sit in Apache Parquet files, Avro containers, and ORC stores across S3 buckets, Azure Data Lake Storage, and Google Cloud Storage, often with little to no visibility into what those files actually contain.

Traditional Data Loss Prevention (DLP) tools were built for a world of emails, PDFs, and spreadsheets. They have no understanding of columnar storage formats, embedded schemas, or the sheer scale of modern data lake architectures. That gap is where sensitive data hides in plain sight -- and where Sentra's data lake format scanning changes the equation entirely.

The Shadow Data Problem in Data Lakes

Every modern enterprise runs some version of the same playbook: production databases feed into ETL pipelines, which land data in object storage as Parquet, Avro, or ORC files. Data engineers, analysts, and machine learning teams then consume that data downstream.

The security problem is straightforward but pervasive. When data engineering teams copy production data into data lakes for analytics, the PII that was supposed to be masked or anonymized often arrives intact. A full copy of customer records -- Social Security numbers, credit card numbers, health information -- ends up in a Parquet file in a shared S3 bucket, accessible to anyone with the right IAM role.

This is not a hypothetical scenario. It is the default state of most enterprise data lakes. And with data democratization initiatives actively expanding access to these stores, the blast radius of unprotected data lake files grows with every new user who gets read permissions.

Why Traditional DLP Falls Short

Conventional DLP solutions treat files as opaque blobs of text. They can scan a CSV or a Word document, but hand them an Apache Parquet file and they see nothing. This is a fundamental architectural limitation, not a feature gap that can be patched.

Big data formats are structurally different from traditional file types. Parquet and ORC use columnar storage, meaning data is organized by column rather than by row. Avro embeds its schema directly in the file. Arrow IPC (Feather) uses an in-memory format optimized for zero-copy reads. Scanning these formats requires purpose-built readers that understand their internal structure -- readers that traditional DLP simply does not have.

The result is a compliance blind spot that grows larger every quarter as more data moves into lakehouse architectures powered by Databricks, Snowflake external tables, and similar platforms.

How Sentra Scans Big Data Formats

Sentra provides native, schema-aware scanning for the full spectrum of data lake file formats. This is not a bolt-on capability -- it is core to how our platform understands modern data infrastructure.

Apache Parquet

Parquet is the lingua franca of the modern data lake. Sentra's tabular reader processes Parquet files with full awareness of their columnar structure, performing intelligent column-level classification. Rather than brute-forcing through every byte, Sentra leverages the columnar layout to efficiently scan individual columns for sensitive data patterns. Batch processing support means even large Parquet datasets are handled without requiring the entire file to be loaded into memory at once. Sentra also recognizes Spark checkpoint files (the `c000` convention) and processes them via Parquet or JSON fallback, ensuring that intermediate pipeline outputs do not escape scrutiny. Finally, Sentra looks beyond the declared Parquet schema to detect nested structures, such as a JSON document stored in a column typed as “string”, which adds meaningful context for the classification engine.

Apache Avro

Avro files carry their schema with them, and Sentra takes full advantage of that. Our tabular reader parses the embedded schema to understand field names, types, and structure before scanning the data itself. This schema-aware approach enables more accurate classification -- a field named `ssn` containing nine-digit numbers is treated differently than a field named `zip_code` with the same pattern.
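To show what "carrying the schema" means in practice, here is a small standard-library sketch that reads the header of an Avro object container file and returns the embedded schema. It follows the published container layout (a 4-byte magic, a metadata map holding `avro.schema`, then a 16-byte sync marker). It is an illustration rather than Sentra's reader; the function names and the demo `Customer` schema are made up, and a production scanner would more likely use an Avro library.

```python
import io
import json

def read_long(stream):
    """Decode one Avro long (zigzag-encoded variable-length integer)."""
    shift, accum = 0, 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("unexpected end of Avro header")
        accum |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:
            break
        shift += 7
    return (accum >> 1) ^ -(accum & 1)

def read_bytes(stream):
    """Read a length-prefixed byte string."""
    return stream.read(read_long(stream))

def read_avro_schema(stream):
    """Return the schema (as a dict) embedded in an Avro container header."""
    if stream.read(4) != b"Obj\x01":
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:
        count = read_long(stream)
        if count == 0:          # a zero-count block ends the metadata map
            break
        if count < 0:           # negative count: a byte size follows
            read_long(stream)
            count = -count
        for _ in range(count):
            key = read_bytes(stream).decode("utf-8")
            meta[key] = read_bytes(stream)
    return json.loads(meta["avro.schema"])

# --- Demo: build a tiny header by hand (encoder used only for the demo) ---
def write_long(n):
    n = (n << 1) ^ (n >> 63)
    out = bytearray()
    while n & ~0x7F:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)

def write_bytes(data):
    return write_long(len(data)) + data

schema = {
    "type": "record",
    "name": "Customer",
    "fields": [
        {"name": "ssn", "type": "string"},
        {"name": "zip_code", "type": "string"},
    ],
}
header = (
    b"Obj\x01"
    + write_long(2)
    + write_bytes(b"avro.schema") + write_bytes(json.dumps(schema).encode("utf-8"))
    + write_bytes(b"avro.codec") + write_bytes(b"null")
    + write_long(0)
    + bytes(16)  # placeholder sync marker
)
field_names = [f["name"] for f in read_avro_schema(io.BytesIO(header))["fields"]]
```

The field names are available before a single data block is decoded, which is what lets a scanner use names like `ssn` and `zip_code` as classification context.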

Apache ORC

The Optimized Row Columnar format is a staple of Hive-based data warehouses and remains widely used across Hadoop-era data infrastructure. Sentra's tabular reader handles ORC files natively, applying the same column-level classification intelligence used for Parquet and Avro.

Apache Feather / Arrow IPC

Arrow's IPC format (commonly known as Feather) is increasingly used for fast data interchange between Python, R, and other analytics tools. Sentra scans these files through its textual reader, ensuring that even ephemeral interchange formats do not become a vector for untracked sensitive data.

Column-Level Intelligence

Across all of these formats, Sentra performs column-level scanning and classification. This is critical at data lake scale. A single column in a petabyte Parquet dataset could contain millions of Social Security numbers, while every other column holds benign operational metrics. Column-level granularity means Sentra can pinpoint exactly where sensitive data lives, rather than simply flagging an entire file as "contains PII."
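A toy version of column-level classification might look like the sketch below. It assumes the file has already been read into a `{column_name: values}` mapping, and it combines a value pattern (nine digits) with hints from the column name. The labels, hint list, and `min_ratio` threshold are all hypothetical; a real classifier would use far richer signals than this.

```python
import re

NINE_DIGITS = re.compile(r"\d{9}")

# Hypothetical name hints: substring of the column name -> label.
NAME_HINTS = {
    "ssn": "US_SSN",
    "social": "US_SSN",
    "zip": "POSTAL_CODE",
    "postal": "POSTAL_CODE",
}

def classify_column(name, values, min_ratio=0.8):
    """Label one column using both its values and its name."""
    non_null = [str(v) for v in values if v is not None]
    if not non_null:
        return None
    hits = sum(
        1 for v in non_null if NINE_DIGITS.fullmatch(v.replace("-", ""))
    )
    if hits / len(non_null) < min_ratio:
        return None  # values do not look like nine-digit identifiers
    lowered = name.lower()
    for hint, label in NAME_HINTS.items():
        if hint in lowered:
            return label
    # Pattern matched but the name gives no context: flag for review.
    return "NINE_DIGIT_NUMBER_UNCONFIRMED"

def classify_table(columns):
    """Classify every column independently, so findings point at a column."""
    return {name: classify_column(name, vals) for name, vals in columns.items()}

columns = {
    "ssn": ["123456789", "987-65-4321"],
    "zip_code": ["941051234", "100011234"],
    "latency_ms": [12, 15],
    "account_ref": ["555443333", None],
}
findings = classify_table(columns)
```

Here the same nine-digit pattern produces three different outcomes depending on the column name, and the benign `latency_ms` column is not flagged at all.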

The Compliance Imperative

Regulatory frameworks do not carve out exceptions for big data formats. GDPR's right of access and right to erasure apply regardless of whether personal data is stored in a PostgreSQL table or a Parquet file in S3. CCPA's disclosure requirements extend to every copy of consumer data, including the one sitting in your analytics data lake.

Data Subject Access Requests (DSARs) are particularly challenging when sensitive data is spread across thousands of Parquet files in a data lake. Without automated scanning that understands these formats, responding to a DSAR becomes a manual archaeology project -- expensive, slow, and error-prone.

The AI governance dimension adds another layer of urgency. Machine learning training datasets are frequently stored in Parquet format. If those datasets contain PII that was used to train models, organizations face regulatory exposure under emerging AI governance frameworks. Knowing what personal data exists in your ML training pipelines is no longer optional -- it is a compliance requirement that is rapidly taking shape across jurisdictions.

From Blind Spot to Full Visibility

The shift to data lakehouse architectures is accelerating. Databricks, Snowflake, and the broader modern data stack have made it easier than ever to store and process massive volumes of data in open file formats. That is a net positive for analytics and engineering teams. But without security tooling that speaks the same language as the data infrastructure, sensitive data will continue to accumulate in places where no one is looking.

Sentra closes that gap. By providing native, schema-aware scanning for Parquet, Avro, ORC, Feather, and related formats -- combined with intelligent column-level classification and efficient batch processing -- Sentra gives security and compliance teams the visibility they need into the fastest-growing data stores in the enterprise.

Data lakes are not going away. The question is whether your security posture can keep up with the data engineering teams that feed them. With Sentra, the answer is yes.

Sentra is a Data Security Posture Management (DSPM) platform that automatically discovers, classifies, and monitors sensitive data across your entire cloud environment. To learn more about how Sentra handles data lake scanning and 150+ other file formats, book a demo with our data security experts.

Nikki Ralston
David Stuart
March 12, 2026

How to Protect Sensitive Data in AWS

Storing and processing sensitive data in the cloud introduces real risks: misconfigured buckets, over-permissive IAM roles, unencrypted databases, and logs that inadvertently capture PII. As cloud environments grow more complex in 2026, knowing how to protect sensitive data in AWS is a foundational requirement for any organization operating at scale. This guide breaks down the key AWS services, encryption strategies, and operational controls you need to build a layered defense around your most critical data assets.

How to Protect Sensitive Data in AWS (With Practical Examples)

Effective protection requires a layered, lifecycle-aware strategy. Here are the core controls to implement:

Field-Level and End-to-End Encryption

Rather than encrypting all data uniformly, use field-level encryption to target only sensitive fields, such as Social Security numbers and credit card details, while leaving non-sensitive data in plaintext. A practical approach: deploy Amazon CloudFront with a Lambda@Edge function that intercepts origin requests and encrypts designated JSON fields using RSA. AWS KMS manages the underlying keys, ensuring private keys stay secure and decryption is restricted to authorized services.

Encryption at Rest and in Transit

Enable default encryption on all storage assets: S3 buckets, EBS volumes, and RDS databases. Use customer-managed keys (CMKs) in AWS KMS for granular control over key rotation and access policies. Enforce TLS across all service endpoints. Place databases in private subnets and restrict access through security groups, network ACLs, and VPC endpoints.
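One common way to enforce TLS on S3 is a bucket policy that denies any request made over plain HTTP, using the `aws:SecureTransport` condition key. The sketch below only builds and prints the policy document; the bucket name is a placeholder and the helper name `tls_only_bucket_policy` is made up for this example. Attaching the policy to a bucket is left to your usual tooling.

```python
import json

def tls_only_bucket_policy(bucket_name):
    """Build a bucket policy that denies all S3 access over non-TLS connections."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object in it.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }

# "example-sensitive-data" is a placeholder bucket name.
policy = tls_only_bucket_policy("example-sensitive-data")
print(json.dumps(policy, indent=2))
```

An explicit Deny takes precedence over any Allow, so this statement holds even if another policy grants broader access.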

Strict IAM and Access Controls

Apply least privilege across all IAM roles. Use AWS IAM Access Analyzer to audit permissions and identify overly broad access. Where appropriate, integrate the AWS Encryption SDK with KMS for client-side encryption before data reaches any storage service.

Automated Compliance Enforcement

Use CloudFormation or Systems Manager to enforce encryption and access policies consistently. Centralize logging through CloudTrail and route findings to AWS Security Hub. This reduces the risk of shadow data and configuration drift that often leads to exposure.

What Is Amazon Macie and How Does It Help Protect Sensitive Data?

Amazon Macie is a managed security service that uses machine learning and pattern matching to discover, classify, and monitor sensitive data in Amazon S3. It continuously evaluates objects across your S3 inventory, detecting PII, financial data, PHI, and other regulated content without manual configuration per bucket.

Key capabilities:

  • Generates findings with sensitivity scores and contextual labels for risk-based prioritization
  • Integrates with AWS Security Hub and Amazon EventBridge for automated response workflows
  • Can trigger Lambda functions to restrict public access the moment sensitive data is detected
  • Provides continuous, auditable evidence of data discovery for GDPR, HIPAA, and PCI-DSS compliance

Understanding what sensitive data exposure looks like is the first step toward preventing it. Classifying data by sensitivity level lets you apply proportionate controls and limit blast radius if a breach occurs.

Amazon Macie Pricing Breakdown

Macie offers a 30-day free trial covering up to 150 GB of automated discovery and bucket inventory. After that:

| Component | Cost |
| --- | --- |
| S3 bucket monitoring | $0.10 per bucket/month (prorated daily), up to 10,000 buckets |
| Automated discovery | $0.01 per 100,000 S3 objects/month + $1 per GB inspected beyond the first 1 GB |
| Targeted discovery jobs | $1 per GB inspected; standard S3 GET/LIST request costs apply separately |

For large environments, scope automated discovery to your highest-risk buckets first and use targeted jobs for periodic deep scans of lower-priority storage. This balances coverage with cost efficiency.
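To see how these components add up, here is a small estimator that applies the rates exactly as listed in the table above. The function name and the example workload are hypothetical, S3 request costs are ignored, and the rates should be checked against current AWS pricing before you rely on the result.

```python
from decimal import Decimal

# Rates as quoted in the pricing table above (USD).
BUCKET_RATE = Decimal("0.10")         # per monitored bucket per month
OBJECT_RATE = Decimal("0.01")         # per 100,000 objects per month
GB_RATE = Decimal("1.00")             # per GB inspected
FREE_AUTOMATED_GB = Decimal("1")      # first GB of automated discovery

def estimate_monthly_cost(buckets, objects, automated_gb, targeted_gb):
    """Rough monthly estimate; excludes S3 GET/LIST request charges."""
    monitoring = BUCKET_RATE * Decimal(buckets)
    object_fee = OBJECT_RATE * Decimal(objects) / Decimal(100_000)
    billable_gb = max(Decimal(automated_gb) - FREE_AUTOMATED_GB, Decimal(0))
    automated = GB_RATE * billable_gb
    targeted = GB_RATE * Decimal(targeted_gb)
    return {
        "monitoring": monitoring,
        "object_fee": object_fee,
        "automated": automated,
        "targeted": targeted,
        "total": monitoring + object_fee + automated + targeted,
    }

# Hypothetical workload: 50 buckets, 20 million objects,
# 200 GB of automated discovery and 500 GB of targeted jobs.
estimate = estimate_monthly_cost(50, 20_000_000, 200, 500)
```

For this example the components are $5 for monitoring, $2 for object tracking, $199 for automated discovery, and $500 for targeted jobs, or $706 in total. The per-GB inspection charges dominate, which is why scoping discovery to high-risk buckets matters.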

What Is Amazon GuardDuty and How Does It Enhance Data Protection?

Amazon GuardDuty is a managed threat detection service that continuously monitors CloudTrail events, VPC flow logs, and DNS logs. It uses machine learning, anomaly detection, and integrated threat intelligence to surface indicators of compromise.

What GuardDuty detects:

  • Unusual API calls and atypical S3 access patterns
  • Abnormal data exfiltration attempts
  • Compromised credentials
  • Multi-stage attack sequences correlated from isolated events

Findings and underlying log data are encrypted at rest using KMS and in transit via HTTPS. GuardDuty findings route to Security Hub or EventBridge for automated remediation, making it a key component of real-time data protection.

Using CloudWatch Data Protection Policies to Safeguard Sensitive Information

Applications frequently log more than intended: request payloads, error messages, and debug output can all contain sensitive data. CloudWatch Logs data protection policies automatically detect and mask sensitive information as log events are ingested, before storage.

How to Configure a Policy

  • Create a JSON-formatted data protection policy for a specific log group or at the account level
  • Specify data types to protect using over 100 managed data identifiers (SSNs, credit cards, emails, PHI)
  • The policy applies pattern matching and ML in real time to audit or mask detected data
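As a sketch of what such a policy document can look like, the snippet below builds one with an audit statement and a matching mask statement. The structure and the managed identifier names used here (`EmailAddress`, `CreditCardNumber`, `Ssn-US`) reflect our understanding of the CloudWatch Logs policy format and should be verified against the AWS documentation; the policy name and the helper `build_data_protection_policy` are made up for this example.

```python
import json

# Assumed ARN prefix for AWS managed data identifiers.
IDENTIFIER_PREFIX = "arn:aws:dataprotection::aws:data-identifier/"

def build_data_protection_policy(name, identifiers):
    """Build a policy that both audits and masks the given data identifiers."""
    arns = [IDENTIFIER_PREFIX + identifier for identifier in identifiers]
    return {
        "Name": name,
        "Version": "2021-06-01",
        "Statement": [
            {
                "Sid": "audit",
                "DataIdentifier": arns,
                # An empty destination audits without shipping findings anywhere.
                "Operation": {"Audit": {"FindingsDestination": {}}},
            },
            {
                "Sid": "redact",
                "DataIdentifier": arns,
                "Operation": {"Deidentify": {"MaskConfig": {}}},
            },
        ],
    }

policy = build_data_protection_policy(
    "example-log-masking",  # placeholder policy name
    ["EmailAddress", "CreditCardNumber", "Ssn-US"],
)
policy_document = json.dumps(policy, indent=2)
print(policy_document)
```

The resulting JSON is what you would supply as the policy document when attaching a policy to a log group, for example through the console or the `put-data-protection-policy` operation.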

Important Operational Considerations

  • Only users with the logs:Unmask IAM permission can view unmasked data
  • Encrypt log groups containing sensitive data using AWS KMS for an additional layer
  • Masking only applies to data ingested after a policy is active; existing log data remains unmasked
  • Set up alarms on the LogEventsWithFindings metric and route findings to S3 or Kinesis Data Firehose for audit trails

Implement data protection policies at the point of log group creation rather than retroactively. Adding them after sensitive data has already been logged is the single most common mistake teams make with CloudWatch masking.

How Sentra Extends AWS Data Protection with Full Visibility

Native AWS tools like Macie, GuardDuty, and CloudWatch provide strong point-in-time controls, but they don't give you a unified view of how sensitive data moves across accounts, services, and regions. This is where minimizing your data attack surface requires a purpose-built platform.

What Sentra adds:

  • Discovers and governs sensitive data at petabyte scale inside your own environment; data never leaves your control
  • Maps how sensitive data moves across AWS services and identifies shadow and redundant/obsolete/trivial (ROT) data
  • Enforces data-driven guardrails to prevent unauthorized AI access
  • Typically reduces cloud storage costs by ~20% by eliminating data sprawl

Knowing how to protect sensitive data in AWS means combining the right services (KMS for key management, Macie for S3 discovery, GuardDuty for threat detection, CloudWatch policies for log masking) with consistent access controls, encryption at every layer, and continuous monitoring. No single tool is sufficient. The organizations that get this right treat data protection as an ongoing operational discipline: audit IAM policies regularly, enforce encryption by default, classify data before it proliferates, and ensure your logging pipeline never exposes what it was meant to record.
