
Use Redshift Data Scrambling for Additional Data Protection

May 3, 2023
8 Min Read

According to IBM, a data breach in the United States cost companies an average of $9.44 million in 2022. Protecting confidential information has never been more important for organizations. Data scrambling, which adds an extra layer of security to data, is one way to do it.

In this post, we'll look at why data protection matters, the potential financial consequences of data breaches, and how Redshift data scrambling can help protect private information.

The Importance of Data Protection

Data protection is essential to safeguard sensitive data from unauthorized access. A data breach can lead to identity theft, financial fraud, and other serious consequences. Data protection also matters for compliance: in several sectors, including government, banking, and healthcare, the law requires sensitive data to be protected, and failing to comply can bring heavy fines, legal problems, and lost business.

Attackers use many techniques to get access to confidential information, including phishing, malware, insider threats, and direct intrusion into systems. For example, a phishing attack may lead to stolen login credentials, and malware may infect a system, opening the door to further attacks and data theft.

So how can you protect yourself against these attacks and minimize your data attack surface?

What is Redshift Data Masking?

Redshift data masking is a technique for protecting sensitive data in Amazon Redshift, a cloud-based data warehousing and analytics service. It replaces sensitive data with fictitious but realistic values so that the real values are not exposed to unauthorized users. Combined with other security measures such as access control and encryption, Redshift data masking can be part of a comprehensive data protection plan.


What is Redshift Data Scrambling?

Redshift data scrambling protects confidential information in a Redshift database by altering the original data values with algorithms or formulas, producing data sets in which the original values are unrecognizable. It is useful when sharing sensitive data with third parties or using it for testing, development, or analysis, because it preserves privacy and security while keeping the data usable.

The technique is highly customizable, so organizations can choose the level of protection they need while keeping the data usable. Because it is implemented with SQL and user-defined functions inside an existing cluster, Redshift data scrambling requires no additional hardware or software, which makes it a low-cost way to improve cloud data security (the scrambling jobs do consume cluster compute).

Data Masking vs. Data Scrambling

Data masking replaces sensitive data with a fictitious but realistic value. Data scrambling, on the other hand, changes the original data values with an algorithm or formula to generate a new set of values.

In some cases, data scrambling can be used as part of data masking techniques. For instance, sensitive data such as credit card numbers can be scrambled before being masked to enhance data protection further.
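To make the distinction concrete, here is a minimal sketch in plain Python (not a Redshift feature). mask_card is a simple partial-masking stand-in that hides all but the last four digits, and scramble_digits is a hypothetical one-way scrambler that shifts each digit by an amount derived from SHA-256 of a key plus the value. Running the mask over the scrambled value mirrors the "scramble, then mask" combination described above: even the visible digits are no longer the real ones.

```python
import hashlib
import string


def mask_card(card_number: str) -> str:
    """Masking: keep the layout, replace all but the last four digits with 'X'."""
    total_digits = sum(ch in string.digits for ch in card_number)
    seen = 0
    out = []
    for ch in card_number:
        if ch in string.digits:
            seen += 1
            out.append(ch if seen > total_digits - 4 else "X")
        else:
            out.append(ch)  # keep separators such as '-'
    return "".join(out)


def scramble_digits(value: str, key: str) -> str:
    """Scrambling: shift each digit by an amount derived from SHA-256(key + value)."""
    digest = hashlib.sha256((key + value).encode("utf-8")).digest()
    out = []
    for i, ch in enumerate(value):
        if ch in string.digits:
            shift = digest[i % len(digest)] % 10
            out.append(str((int(ch) + shift) % 10))
        else:
            out.append(ch)
    return "".join(out)


card = "1234-5678-9012-3456"
print(mask_card(card))                                  # XXXX-XXXX-XXXX-3456
print(mask_card(scramble_digits(card, "example-key")))  # visible digits are scrambled ones
```

Because the shift depends on the value itself, this scrambler is deterministic (same input and key, same output) but not reversible.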

Setting up Redshift Data Scrambling

With an understanding of Redshift and data scrambling in place, let's set it up. Enabling data scrambling in Redshift takes several steps.

Data scrambling in Redshift is done with SQL queries that call built-in or user-defined functions. These functions can combine cryptographic techniques, such as hashing, with randomization to scramble the data.

The steps below use illustrative example code to show how the setup fits together:

Step 1: Create a new Redshift cluster

Create a new Redshift cluster or use an existing cluster if available. 


Step 2: Define a scrambling key

Define a scrambling key that will be used to scramble the sensitive data.

 
SET session my_scrambling_key = 'MyScramblingKey';

In this snippet, we define a scrambling key by setting a session-level parameter named my_scrambling_key to the value MyScramblingKey. Note that Redshift documents the SET command for server configuration parameters, so treat this line as illustrative: in practice the key is usually kept in a secrets store and embedded in, or passed to, the user-defined function that scrambles the sensitive data.
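A literal such as 'MyScramblingKey' is easy to guess. As a minimal sketch, assuming you generate the key outside Redshift and then store it in a secrets manager, Python's standard secrets module can produce a high-entropy key (generate_scrambling_key is a hypothetical helper name):

```python
import secrets


def generate_scrambling_key(num_bytes: int = 32) -> str:
    """Return a cryptographically random key as a hex string (2 hex characters per byte)."""
    return secrets.token_hex(num_bytes)


key = generate_scrambling_key()
print(len(key))  # 64
```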

Step 3: Create a user-defined function (UDF)

Create a user-defined function in Redshift that will be used to scramble the sensitive data. 


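-- Placeholder logic: a salted SHA-256 hash of the input ($1).
-- Replace it with your own scrambling logic and avoid hard-coding production keys.
CREATE OR REPLACE FUNCTION scramble(VARCHAR)
RETURNS VARCHAR
IMMUTABLE
AS $$
  SELECT SHA2($1 || 'MyScramblingKey', 256)
$$ LANGUAGE sql;

Here, we create a SQL UDF named scramble that takes a string and returns its scrambled form. (Redshift scalar UDFs are written in SQL or Python; PL/pgSQL is only available for stored procedures.) The function is declared IMMUTABLE, which tells Redshift that it always returns the same result for the same input; that determinism keeps scrambled values consistent across tables and joins. STABLE, by contrast, only guarantees the same result within a single statement. The salted hash is just a placeholder: it is one-way and returns a 64-character hex string, so the target column must be wide enough to hold it. You will need to supply your own scrambling logic.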

Step 4: Apply the UDF to sensitive columns

Apply the UDF to the sensitive columns in the database that need to be scrambled.


UPDATE employee SET ssn = scramble(ssn);

For example, this statement applies the scramble UDF to the ssn column of a table named employee: the UPDATE calls scramble for each row and overwrites the ssn values with their scrambled versions. Because the update is destructive, take a snapshot or copy of the table first.
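To try the scramble-and-update flow before touching a cluster, the sketch below reproduces it locally with Python's built-in sqlite3 module. SQLite stands in for Redshift here, and scramble is registered as a Python function that mirrors the salted SHA-256 placeholder from step 3; the table rows and key are made-up sample values.

```python
import hashlib
import sqlite3

SCRAMBLING_KEY = "MyScramblingKey"  # illustrative only; load real keys from a secrets store


def scramble(value):
    """Salted SHA-256 placeholder: the same input and key always give the same output."""
    if value is None:
        return None
    return hashlib.sha256((value + SCRAMBLING_KEY).encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.create_function("scramble", 1, scramble)  # make scramble() callable from SQL
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, ssn TEXT)")
conn.executemany(
    "INSERT INTO employee (id, ssn) VALUES (?, ?)",
    [(1, "123-45-6789"), (2, "987-65-4321"), (3, None)],
)

# Step 4, locally: overwrite the column with scrambled values.
conn.execute("UPDATE employee SET ssn = scramble(ssn)")
conn.commit()

for row in conn.execute("SELECT id, ssn FROM employee ORDER BY id"):
    print(row)
```

NULLs stay NULL, and each non-NULL SSN becomes a 64-character hex digest.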

Step 5: Test and validate the scrambled data

Test and validate the scrambled data to ensure that it is unreadable and unusable by unauthorized parties.


SELECT ssn, scramble(ssn) AS scrambled_ssn
FROM employee;

This SELECT retrieves the ssn column alongside the value returned by the scramble UDF, so you can compare the original and scrambled values and confirm that scrambling works as expected. Run it before the UPDATE in step 4 (or against a copy of the original table): once the column has been updated, ssn already holds scrambled values.
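Beyond eyeballing the output, a few automatic checks help. This sketch assumes you have (original, scrambled) pairs, for example from the query above, and flags values left unchanged and originals that map to more than one scrambled value (which would break joins if you rely on deterministic scrambling). validate_scrambling is a hypothetical helper; it reports row numbers rather than raw values so the check itself does not leak data.

```python
from collections import defaultdict


def validate_scrambling(pairs):
    """Return a list of problems found in (original, scrambled) pairs."""
    problems = []
    outputs_by_original = defaultdict(set)
    for row_number, (original, scrambled) in enumerate(pairs, start=1):
        if original is None:
            continue  # NULLs carry no sensitive data
        if scrambled == original:
            problems.append(f"row {row_number}: value left unscrambled")
        outputs_by_original[original].add(scrambled)
    inconsistent = sum(1 for outs in outputs_by_original.values() if len(outs) > 1)
    if inconsistent:
        problems.append(f"{inconsistent} original value(s) map to several scrambled values")
    return problems


print(validate_scrambling([("123-45-6789", "a1b2c3"), ("555-55-5555", "555-55-5555")]))
# ['row 2: value left unscrambled']
```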

Step 6: Monitor and maintain the scrambled data

To monitor and maintain the scrambled data, regularly check the sensitive columns to confirm that they are still scrambled (for example, that newly loaded rows do not contain unscrambled values) and that there are no vulnerabilities or breaches. Keep the scrambling key and UDF up to date and effective as well.
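One simple recurring check, sketched below under the assumption that real SSNs are stored as NNN-NN-NNNN or as nine bare digits, is to count values that still match the plain-SSN format. This suits scrambles that change the format (such as the hash placeholder from step 3); format-preserving scrambles would need a different test. count_unscrambled is a hypothetical helper name.

```python
import re

# Assumption: unscrambled SSNs look like 123-45-6789 or 123456789.
SSN_PATTERN = re.compile(r"^\d{3}-?\d{2}-?\d{4}$")


def count_unscrambled(values):
    """Count values that still match the plain-SSN format."""
    return sum(1 for v in values if v is not None and SSN_PATTERN.match(v))


print(count_unscrambled(["123-45-6789", "9f86d081884c7d65", None]))  # 1
```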

Different Options for Scrambling Data in Redshift

Choosing a data scrambling technique means balancing security level, data sensitivity, and application requirements. Several general-purpose algorithms exist, each with its own pros and cons. The Python samples below use the psycopg2 library to read from and write to your Redshift cluster. Note that they pull the data to the client, scramble it there, and write it back, which suits modest tables better than very large ones. Install psycopg2 before running them:


pip install psycopg2

Random

The Random option replaces each character with one chosen by a random number generator. It is fast, and because the output depends only on the length of the input, it cannot be reversed; the trade-off is that the original values are destroyed and scrambled values can no longer be joined or compared. A non-cryptographic generator such as Python's random module can in principle be predicted, so this sample uses the secrets module instead.

import secrets
import string
import psycopg2

ALPHABET = string.ascii_letters + string.digits

def random_scramble(data):
    # Replace every character with a random letter or digit (only the length is preserved).
    return "".join(secrets.choice(ALPHABET) for _ in data)

# Connect to your Redshift cluster
conn = psycopg2.connect(host='your_host', port='your_port', dbname='your_dbname', user='your_user', password='your_password')
cursor = conn.cursor()

# Fetch data from your table; "id" is assumed to be a unique key column
cursor.execute("SELECT id, sensitive_column FROM your_table WHERE sensitive_column IS NOT NULL;")
rows = cursor.fetchall()

# Scramble the data, pairing each new value with its row's id
updates = [(random_scramble(value), row_id) for row_id, value in rows]

# Update the data in the table, one row per id
cursor.executemany("UPDATE your_table SET sensitive_column = %s WHERE id = %s;", updates)
conn.commit()

# Close the connection
cursor.close()
conn.close()

Shuffle

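The Shuffle option rearranges the characters within each value, which hides the original at a glance. However, every original character is still present, so it remains prone to brute-force attacks: a 9-digit value has at most 9! = 362,880 possible orderings to try.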


import random
import psycopg2

def shuffle_scramble(data):
    # Rearrange the characters of the value; the set of characters is unchanged.
    data_list = list(data)
    random.shuffle(data_list)
    return ''.join(data_list)

conn = psycopg2.connect(host='your_host', port='your_port', dbname='your_dbname', user='your_user', password='your_password')
cursor = conn.cursor()

# "id" is assumed to be a unique key column
cursor.execute("SELECT id, sensitive_column FROM your_table WHERE sensitive_column IS NOT NULL;")
rows = cursor.fetchall()

updates = [(shuffle_scramble(value), row_id) for row_id, value in rows]

cursor.executemany("UPDATE your_table SET sensitive_column = %s WHERE id = %s;", updates)
conn.commit()

cursor.close()
conn.close()
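To see why shuffling is weak, count how many distinct orderings an attacker would have to try: for a value of n characters, that is n! divided by k! for each character that appears k times. A short sketch (count_orderings is a hypothetical helper):

```python
from collections import Counter
from math import factorial


def count_orderings(value: str) -> int:
    """Number of distinct permutations of the characters in value."""
    total = factorial(len(value))
    for repeats in Counter(value).values():
        total //= factorial(repeats)  # identical characters don't create new orderings
    return total


print(count_orderings("123456789"))    # 362880
print(count_orderings("123-45-6789"))  # 19958400
```

Both counts are small enough to enumerate quickly on ordinary hardware, which is why shuffling alone offers limited protection.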

Reversible

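The Reversible option transforms characters with a key so that anyone holding the key can restore the original values. This poses a greater challenge to attackers than a plain shuffle, but it is still vulnerable to brute-force attacks when the key space is small. We'll use the Caesar cipher as an example: it has only 26 possible shifts, and the version below shifts letters only, so digits (for example in an SSN) are left unchanged.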


import psycopg2

def caesar_cipher(data, key):
    # Shift ASCII letters by `key` positions; digits and punctuation pass through unchanged.
    encrypted = ""
    shift = key % 26
    for char in data:
        if char.isascii() and char.isalpha():
            base = 97 if char.islower() else 65
            encrypted += chr((ord(char) - base + shift) % 26 + base)
        else:
            encrypted += char
    return encrypted

conn = psycopg2.connect(host='your_host', port='your_port', dbname='your_dbname', user='your_user', password='your_password')
cursor = conn.cursor()

# "id" is assumed to be a unique key column
cursor.execute("SELECT id, sensitive_column FROM your_table WHERE sensitive_column IS NOT NULL;")
rows = cursor.fetchall()

key = 5
updates = [(caesar_cipher(value, key), row_id) for row_id, value in rows]
cursor.executemany("UPDATE your_table SET sensitive_column = %s WHERE id = %s;", updates)
conn.commit()

cursor.close()
conn.close()
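The sketch below shows both sides of the Reversible option locally: decrypting is simply shifting by the negative key, and the same trick lets an attacker try all 26 shifts in a fraction of a second. The helper names are hypothetical, and caesar_cipher is redefined here so the snippet runs on its own.

```python
def caesar_cipher(data, key):
    """Shift ASCII letters by key positions; other characters pass through unchanged."""
    result = []
    for char in data:
        if char.isascii() and char.isalpha():
            base = ord("a") if char.islower() else ord("A")
            result.append(chr((ord(char) - base + key) % 26 + base))
        else:
            result.append(char)
    return "".join(result)


def caesar_decrypt(data, key):
    """Undo the shift by applying the opposite one."""
    return caesar_cipher(data, -key)


def brute_force(ciphertext):
    """Every candidate plaintext, keyed by the shift that produces it."""
    return {k: caesar_decrypt(ciphertext, k) for k in range(26)}


ciphertext = caesar_cipher("Jane Doe", 5)
print(ciphertext)                       # Ofsj Itj
print(caesar_decrypt(ciphertext, 5))    # Jane Doe
print(caesar_cipher("123-45-6789", 5))  # 123-45-6789 (digits are not shifted)
```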

Custom

The Custom option enables users to create tailor-made algorithms to resist specific attack types, potentially offering superior security. However, the development and implementation of custom algorithms demand greater time and expertise.

Best Practices for Using Redshift Data Scrambling

There are several best practices that should be followed when using Redshift Data Scrambling to ensure maximum protection:

Use Unique Keys for Each Table

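To make sure a single compromised key does not expose every table, give each table its own scrambling key rather than reusing one key everywhere. This is a key-management practice, not an indexing one: a unique index does not provide this protection, and Amazon Redshift does not support CREATE INDEX in any case (it relies on sort keys and distribution keys instead).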
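One way to give each table its own key without managing dozens of unrelated secrets is to derive them from a master key. A minimal sketch, assuming an HMAC-SHA256 derivation (a common pattern, not a Redshift feature) and hypothetical table names: knowing one table key does not reveal the master key or the other tables' keys, but the master key itself must be guarded carefully.

```python
import hashlib
import hmac
import secrets


def derive_table_key(master_key: bytes, table_name: str) -> str:
    """Derive a per-table key from one master key with HMAC-SHA256."""
    label = f"redshift-scramble:{table_name}".encode("utf-8")
    return hmac.new(master_key, label, hashlib.sha256).hexdigest()


master = secrets.token_bytes(32)  # in practice, keep this in AWS KMS or Secrets Manager
employee_key = derive_table_key(master, "employee")
payroll_key = derive_table_key(master, "payroll")
```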

Encrypt Sensitive Data Fields 

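Sensitive data fields such as credit card numbers and Social Security numbers should be encrypted to provide an additional layer of security. Amazon Redshift does not provide a built-in SQL function for field-level encryption, so this is typically done either on the client before loading or through a Lambda user-defined function that calls AWS KMS. Here's a sketch of registering and calling such a function; the function name, Lambda name, and role ARN are placeholders, and the Lambda itself is something you would write:

CREATE EXTERNAL FUNCTION f_kms_encrypt(VARCHAR)
RETURNS VARCHAR
VOLATILE
LAMBDA 'your-kms-encrypt-lambda'
IAM_ROLE 'arn:aws:iam::123456789012:role/your-redshift-lambda-role';

SELECT f_kms_encrypt('1234-5678-9012-3456');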

Use Strong Encryption Algorithms

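Strong encryption algorithms such as AES-256 should be used to provide the strongest protection. In Redshift, encryption at rest is configured for the whole cluster rather than per column: an encrypted cluster protects its data blocks, system metadata, and snapshots with AES-256 keys, and you can require SSL/TLS for data in transit by setting the require_ssl parameter. For example, to encrypt an existing provisioned cluster with a KMS key using the AWS CLI:

aws redshift modify-cluster \
    --cluster-identifier your_cluster_id \
    --encrypted \
    --kms-key-id your_key_id_here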

Control Access to Encryption Keys 

Access to encryption keys should be limited to authorized personnel to prevent unauthorized access to sensitive data. AWS Key Management Service (KMS) lets you manage your encryption keys and control who can use them; for example, instead of giving broad permissions on a key, you can issue a grant that allows one specific IAM principal to use it for decryption only. Here's how to create such a grant in Python with boto3:


import boto3

kms = boto3.client('kms')

key_id = 'your_key_id_here'
grantee_principal = 'arn:aws:iam::123456789012:user/jane'

response = kms.create_grant(
    KeyId=key_id,
    GranteePrincipal=grantee_principal,
    Operations=['Decrypt']
)

print(response)

Regularly Rotate Encryption Keys 

Regularly rotating encryption keys limits the impact of a compromised key. In AWS KMS, rotation is not configured through the key policy; instead, you turn on automatic rotation for a customer managed key, after which KMS generates new key material every year by default while keeping the older material available to decrypt existing data. Here's how to enable and then verify automatic rotation with the AWS CLI:

aws kms enable-key-rotation --key-id your_key_id_here

aws kms get-key-rotation-status --key-id your_key_id_here

Turn on logging 

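To track user access to sensitive data and spot unwanted access, enable logging. When audit logging and user activity logging are turned on in Amazon Redshift, the SQL statements run on your cluster are logged, including queries that touch sensitive data and the data-scrambling operations themselves. You can then examine these logs for unusual access patterns or suspicious activity.

User activity logging is controlled by the enable_user_activity_logging parameter in a custom cluster parameter group (it cannot be set with ALTER DATABASE), and it also requires audit logging to be enabled for the cluster. For example, with the AWS CLI:

aws redshift modify-cluster-parameter-group \
    --parameter-group-name your_parameter_group \
    --parameters ParameterName=enable_user_activity_logging,ParameterValue=true

Independently of audit logging, Redshift records recent queries in the STL_QUERY system table, which keeps only a few days of history. For instance, the following SQL lists recent queries whose text references a given table, here employee:

SELECT query, userid, starttime, TRIM(querytxt) AS sql_text
FROM stl_query
WHERE querytxt ILIKE '%employee%'
ORDER BY starttime DESC;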
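Once you have exported log rows, even a simple filter helps surface unexpected access. The sketch below is hypothetical: it assumes each row is a dict with username and querytxt fields (for example, STL_QUERY joined with user names) and flags queries that reference a sensitive table from users outside an allowlist.

```python
import re


def flag_suspicious_access(log_rows, table_name, allowed_users):
    """Return log rows whose SQL text references table_name and whose user is not allowlisted."""
    # Match the table name as a whole word, case-insensitively.
    pattern = re.compile(r"\b" + re.escape(table_name) + r"\b", re.IGNORECASE)
    return [
        row
        for row in log_rows
        if pattern.search(row["querytxt"]) and row["username"] not in allowed_users
    ]


# Hypothetical exported rows.
logs = [
    {"username": "etl_user", "querytxt": "SELECT ssn FROM employee"},
    {"username": "intern", "querytxt": "select * from EMPLOYEE limit 10"},
    {"username": "intern", "querytxt": "SELECT * FROM employee_archive"},
]
for row in flag_suspicious_access(logs, "employee", allowed_users={"etl_user"}):
    print(row["username"], row["querytxt"])  # intern select * from EMPLOYEE limit 10
```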

Monitor Performance 

Data scrambling can be resource-intensive, so monitor CPU usage, memory usage, and disk I/O to make sure your cluster isn't overloaded. In Redshift, the svl_query_summary and svl_query_report system views show per-query performance details, and Amazon CloudWatch tracks cluster metrics such as CPU utilization and disk space usage.


Establishing Backup and Disaster Recovery

To prevent data loss in the event of a disaster, put backup and disaster recovery mechanisms in place. Amazon Redshift offers both automated and manual snapshots. By default, automated snapshots are taken about every eight hours or after every 5 GB per node of data changes, whichever comes first.

You can also take a manual snapshot of your cluster at any time, and restore your cluster from these snapshots after a failure or disaster. Snapshots are managed through the console, API, or AWS CLI rather than SQL. To take a manual snapshot with the AWS CLI:

aws redshift create-cluster-snapshot \
    --cluster-identifier your_cluster_id \
    --snapshot-identifier your-snapshot-name

To restore a snapshot into a new cluster:

aws redshift restore-from-cluster-snapshot \
    --cluster-identifier new-cluster-name \
    --snapshot-identifier your-snapshot-name

Frequent Review and Updates

To ensure that data scrambling procedures remain effective and up-to-date with the latest security requirements, it is crucial to consistently review and update them. This process should include examining backup and recovery procedures, encryption techniques, and access controls.

In Amazon Redshift, you can review access controls by inspecting roles, users, and their permissions in system views such as SVV_ROLES and SVV_RELATION_PRIVILEGES. Confirm that only authorized individuals have access to sensitive information.

To review encryption, check the cluster-level settings: Redshift encryption applies to the whole cluster rather than to individual columns, so the pg_catalog.pg_attribute catalog, which describes columns, will not show it. Instead, check the Encrypted and KmsKeyId fields returned by aws redshift describe-clusters, and confirm that sensitive data is protected with strong encryption such as AES-256.

For backup and recovery, the AWS CLI commands aws backup list-backup-plans and aws backup list-backup-vaults let you review your AWS Backup plans and vaults, and aws redshift describe-cluster-snapshots lists your Redshift snapshots. Make sure your backup and recovery procedures are properly configured and up to date.

Decrypting Data in Redshift

How you decrypt data depends on the encryption method and the tools available. Only reversible techniques can be undone; random replacement, shuffling, and one-way hashing cannot. Decryption usually mirrors encryption, typically through a custom UDF that applies the inverse transformation. Let's look at one example: decrypting data that was scrambled with a substitution cipher.

Step 1: Create a UDF with decryption logic for substitution


CREATE FUNCTION decrypt_substitution(ciphertext varchar) RETURNS varchar
IMMUTABLE AS $$
    # Encryption mapped alphabet[i] -> substitution[i], so decryption maps it back.
    if ciphertext is None:
        return None
    alphabet = 'abcdefghijklmnopqrstuvwxyz'
    substitution = 'ijklmnopqrstuvwxyzabcdefgh'
    plaintext = ''
    for char in ciphertext:
        index = substitution.find(char)
        if index == -1:
            # Characters outside the table (digits, punctuation, uppercase) pass through.
            plaintext += char
        else:
            plaintext += alphabet[index]
    return plaintext
$$ LANGUAGE plpythonu;
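Before deploying the UDF, it is worth checking the cipher logic locally. This sketch uses the same substitution table as the UDF and Python's str.maketrans to build both directions; encrypt_substitution is a hypothetical counterpart written only for the round-trip test.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
SUBSTITUTION = "ijklmnopqrstuvwxyzabcdefgh"  # same table as the UDF above

ENCRYPT_TABLE = str.maketrans(ALPHABET, SUBSTITUTION)
DECRYPT_TABLE = str.maketrans(SUBSTITUTION, ALPHABET)


def encrypt_substitution(plaintext):
    """Map each lowercase letter to its substitute; other characters are unchanged."""
    return plaintext.translate(ENCRYPT_TABLE)


def decrypt_substitution(ciphertext):
    """Inverse mapping: substitute back to the original letter."""
    return ciphertext.translate(DECRYPT_TABLE)


print(encrypt_substitution("jane doe"))  # rivm lwm
print(decrypt_substitution("rivm lwm"))  # jane doe
```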

Step 2: Move the data back after truncating and applying the decryption function


TRUNCATE original_table;
INSERT INTO original_table (column1, column2, column3)
SELECT column1, decrypt_substitution(encrypted_column2), column3
FROM temp_table;

In this example, temp_table holds a copy of the data in which encrypted_column2 is the encrypted version of column2. The decrypt_substitution function is applied to encrypted_column2, and the result is inserted into column2 of original_table. Replace column1, column2, and column3 with your actual column names, and adjust the INSERT INTO statement if your table has more or fewer columns. Be aware that TRUNCATE in Redshift commits the current transaction and cannot be rolled back, so confirm that temp_table holds a complete copy of the data before you run it.

Conclusion

Redshift data scrambling is an effective tool for additional data protection and should be part of an organization's overall data security strategy. In this blog post, we looked at the importance of data protection and how it can be built into the data warehouse. We then covered the difference between data scrambling and data masking before walking through how to set up Redshift data scrambling.

Once you are comfortable with Redshift data scrambling, you can strengthen your approach with other scrambling techniques and with best practices such as encryption, logging, and performance monitoring. By following these recommendations and taking a deliberate approach, organizations can improve their data security posture management (DSPM) and reduce the risk of breaches.


Veronica is the security researcher at Sentra. She brings a wealth of knowledge and experience as a cybersecurity researcher. Her main focuses are researching the main cloud provider services and AI infrastructures for Data related threats and techniques.


Latest Blog Posts

Nikki Ralston
February 25, 2026
3 Min Read

SOC 2 Without the Spreadsheet Chaos: Automating Evidence for Regulated Data Controls


SOC 2 has become table stakes for cloud‑native and SaaS organizations. But for many security and GRC teams, each SOC 2 cycle still feels like starting from scratch: hunting for the latest access reviews, exporting encryption settings from multiple consoles, and proving that backups and logs exist, per data set and per environment. If your SOC 2 evidence process is still a patchwork of spreadsheets and screenshots, you’re not alone. The missing piece is a data‑centric view of your controls, especially around regulated data.

Why SOC 2 Evidence Is So Hard in Cloud and SaaS Environments

Under SOC 2, trust service criteria like Security, Availability, and Confidentiality translate into specific expectations around data:

Is sensitive or regulated data discovered and classified consistently?

Are core controls (encryption, backup, access, logging) actually in place where that data lives?

Can you show continuous monitoring instead of point‑in‑time screenshots?

In a typical multi‑cloud/SaaS environment:

  • Sensitive data is scattered across S3, databases, Snowflake, M365/Google Workspace, Salesforce, and more.
  • Different teams own pieces of the puzzle (infra, security, data, app owners).
  • Legacy tools are siloed by layer (CSPM for infra, DLP for traffic, privacy catalog for RoPA).

So when SOC 2 comes around, you spend weeks assembling a story instead of being able to show a trusted, provable compliance posture at the data layer.

The Data‑First Approach to SOC 2 Evidence

Instead of treating SOC 2 as a separate project, leading teams are aligning it with their data security posture management (DSPM) strategy:

  1. Start from the data, not from the infrastructure
  • Build a unified inventory of sensitive and regulated data across IaaS, PaaS, SaaS, and on‑prem.
  • Enrich each store with sensitivity, residency, and business context.

  2. Attach control posture to each data store
  • For each regulated data store, track encryption status, backup configuration, access model, and logging/monitoring coverage as posture attributes.

  3. Generate SOC‑aligned evidence from the same system
  • Use the regulated‑data inventory plus posture engine to produce SOC 2‑friendly reports and CSVs, rather than collecting evidence manually for each audit cycle.
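The pattern above can be sketched as a tiny data model: each regulated store carries its control posture as attributes, and evidence becomes a query over that model. The field and store names below are hypothetical, chosen for illustration, and are not Sentra's schema.

```python
from dataclasses import dataclass

REQUIRED_CONTROLS = ("encrypted", "backed_up", "access_reviewed", "logging_enabled")


@dataclass
class DataStore:
    name: str
    regulated: bool
    encrypted: bool
    backed_up: bool
    access_reviewed: bool
    logging_enabled: bool


def control_gaps(stores):
    """Map each regulated store to the list of required controls it is missing."""
    gaps = {}
    for store in stores:
        if not store.regulated:
            continue  # out of SOC 2 scope in this sketch
        missing = [c for c in REQUIRED_CONTROLS if not getattr(store, c)]
        if missing:
            gaps[store.name] = missing
    return gaps


stores = [
    DataStore("s3-payments", True, True, False, True, True),
    DataStore("public-docs", False, False, False, False, False),
]
print(control_gaps(stores))  # {'s3-payments': ['backed_up']}
```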

This is exactly the pattern that modern data security platforms like Sentra are implementing.

How Sentra Helps Security and GRC Teams Automate SOC 2 Evidence

Sentra sits across your data estate and focuses on regulated data, with capabilities that map directly onto SOC 2 evidence needs:

Comprehensive data‑store discovery and classification
Agentless discovery of data stores (managed and unmanaged) across multi‑cloud and on‑prem, combined with high‑accuracy classification for regulated and business‑critical data.

Data‑centric security posture
For each store, Sentra tracks security properties (including encryption, backup, logging, and access configuration) and surfaces gaps where sensitive data is insufficiently protected.

Framework‑aligned reporting
SOC 2 and other frameworks can be represented as report templates that pull directly from Sentra’s inventory and posture attributes, giving GRC teams “audit‑ready” exports without rebuilding evidence from scratch.

The result is you can prove control over regulated data, for SOC 2 and beyond, with far less manual overhead.

Mapping SOC 2 Criteria to Data‑Level Evidence

Here’s how a data‑first posture shows up in SOC 2:

CC6.x (Logical and Physical Access Controls)

Evidence: Identity‑to‑data mapping showing which users/roles can access which sensitive datasets across cloud and SaaS.

CC7.x (System Operations / Monitoring)

Evidence: Data Detection & Response (DDR) signals and anomaly analytics around access to crown‑jewel data; logs that tie back to sensitive data stores.

CC9.x (Risk Mitigation)

Evidence: Risk‑prioritized view of data stores based on sensitivity and missing controls, plus remediation workflows or automatic labeling/tagging to tighten upstream policies.

When this data‑level view is in place, SOC 2 becomes evidence selection rather than evidence construction.

A Repeatable SOC 2 Playbook for Security, GRC, and Privacy

To operationalize this approach, many teams follow a recurring pattern:

  1. Define a “regulated data perimeter” for SOC 2: Identify which clouds, SaaS platforms, and on‑prem stores contain in‑scope data (PII, PHI, PCI, financial records).

  2. Instrument with DSPM: Deploy a data security platform like Sentra to discover, classify, and map access to that data perimeter.

  3. Connect GRC to the same source of truth: Have GRC and privacy teams pull their SOC 2 evidence from the same inventory and posture views Security uses for day‑to‑day risk management.

  4. Continuously refine controls: Use posture and DDR insights to reduce exposure, close misconfigurations, and improve your next SOC 2 cycle before it starts.

The more you lean on a shared, data‑centric foundation, the easier it becomes to maintain a trusted, provable compliance posture across frameworks, not just SOC 2.

Turning SOC 2 From a Project Into a Capability

Ultimately, the goal is to stop treating SOC 2 as a once-a-year project and start treating it as an ongoing capability embedded into how your organization operates. Security, GRC, and privacy teams should work from a single, unified view of regulated data and controls. Evidence should always be a few clicks away - not the result of a month-long scramble. And every audit should strengthen your data security posture, not distract from it. If you’re still managing compliance in spreadsheets, it’s worth asking what it would take to make your SOC 2 posture something you can prove on demand.

Ready to end the fire drills and move to continuous compliance? Book a Demo 


Adi Voulichman
February 23, 2026
4 Min Read

How to Discover Sensitive Data in the Cloud


As cloud environments grow more complex in 2026, knowing how to discover sensitive data in the cloud has become one of the most pressing challenges for security and compliance teams. Data sprawls across IaaS, PaaS, SaaS platforms, and on-premise file shares, often duplicating, moving between environments, and landing in places no one intended. Without a systematic approach to discovery, organizations risk regulatory exposure, unauthorized AI access, and costly breaches. This article breaks down the key methods, tools, and architectural considerations that make cloud sensitive data discovery both effective and scalable.

Why Sensitive Data Discovery in the Cloud Is So Difficult

The core problem is visibility. Sensitive data (PII, financial records, health information, intellectual property) doesn't stay in one place. It gets copied from production to development environments, ingested into AI pipelines, backed up across regions, and shared through SaaS applications. Each transition creates a new exposure surface.

  • Toxic combinations: High-sensitivity data behind overly permissive access configurations creates dangerous scenarios that require continuous, context-aware monitoring, not just point-in-time scans.
  • Shadow and ROT data: Redundant, obsolete, or trivial data inflates cloud storage costs and expands the attack surface without adding business value.
  • Multi-environment sprawl: Data moves across cloud providers, regions, and service tiers, making a single unified view extremely difficult to maintain.

What Are Cloud DLP Solutions and How Do They Work?

Cloud Data Loss Prevention (DLP) solutions discover, classify, and protect sensitive information across cloud storage, applications, and databases. They operate through several interconnected mechanisms:

  • Scan and classify: Pattern matching, machine learning, and custom detectors identify sensitive content and assign classification labels (e.g., public, confidential, restricted).
  • Enforce automated policies: Context-aware rules trigger encryption, masking, or access restrictions based on classification results.
  • Monitor data movement: Continuous tracking of transfers and user behaviors detects anomalies like unusual download patterns or overly broad sharing.
  • Integrate with broader controls: Many DLP tools work alongside CASBs and Zero Trust frameworks for end-to-end protection.
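The scan-and-classify step can be illustrated with a toy pattern-matching classifier. This is a hypothetical sketch with two regex detectors and a made-up labeling rule; real DLP tools combine many more detectors with machine learning and context.

```python
import re

# Hypothetical detectors for illustration only.
DETECTORS = {
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def classify(text):
    """Return the detected infotypes and a coarse classification label."""
    found = sorted(name for name, rx in DETECTORS.items() if rx.search(text))
    if "US_SSN" in found:
        label = "restricted"
    elif found:
        label = "confidential"
    else:
        label = "public"
    return found, label


print(classify("Contact jane@example.com, SSN 123-45-6789"))
# (['EMAIL', 'US_SSN'], 'restricted')
```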

The result is enhanced visibility into where sensitive data lives and a proactive enforcement layer that reduces breach risk while supporting regulatory compliance.

What Is Google Cloud Sensitive Data Protection?

Google Cloud Sensitive Data Protection is a cloud-native service that automatically discovers, classifies, and protects sensitive information across Cloud Storage buckets, BigQuery tables, and other Google Cloud data assets.

Core Capabilities

  • Automated discovery and profiling: Scans projects, folders, or entire organizations to generate data profiles summarizing sensitivity levels and risk indicators, enabling continuous monitoring at scale.
  • Detailed data inspection: Performs granular analysis using hundreds of built-in detectors alongside custom infoTypes defined through dictionaries, regular expressions, or contextual rules.
  • De-identification techniques: Supports redaction, masking, and tokenization, making it a strong foundation for data governance within the Google Cloud ecosystem.

How Sensitive Data Protection’s Data Profiler Finds Sensitive Information

Sensitive Data Protection’s data profiler automates scanning across BigQuery, Cloud SQL, Cloud Storage, Vertex AI datasets, and even external sources like Amazon S3 or Azure Blob Storage (for eligible Security Command Center customers). The process starts with a scan configuration defining scope and an inspection template specifying which sensitive data types to detect.

| Profile dimension | Details |
| --- | --- |
| Granularity levels | Project, table, column (structured); bucket or container (file stores) |
| Statistical insights | Null value percentages, data distributions, predicted infoTypes, sensitivity and risk scores |
| Scan frequency | On a schedule you define, and automatically when data is added or modified |
| Integrations | Security Command Center; Dataplex Universal Catalog for IAM refinement and data quality enforcement |

These profiles give security and governance teams an always-current view of where sensitive data resides and how risky each asset is.

Understanding Sensitive Data Protection Pricing

Sensitive Data Protection primarily uses per-GB profiling charges, billed based on the amount of input data scanned, with minimums and caps per dataset or table. Certain tiers of Security Command Center include organization-level discovery as part of the subscription, but for most workloads several factors directly influence total cost:

| Cost factor | Impact | Optimization strategy |
| --- | --- | --- |
| Data volume | Larger datasets and full scans cost more | Scope discovery to high-risk data stores first |
| Scan frequency | Recurring scans accumulate costs quickly | Scan only new or modified data |
| Scan complexity | Multiple or custom detectors require more processing | Filter irrelevant file types before scanning |
| Integration overhead | Compute, network egress, and encryption keys add cost | Minimize cross-region data movement during scans |

For organizations operating at petabyte scale, these factors make it essential to design discovery workflows carefully rather than running broad, undifferentiated scans.

Tracking Data Movement Beyond Static Location

Static discovery, knowing where sensitive data sits right now, is necessary but insufficient. The real risk often emerges when data moves: from production to development, across regions, into AI training pipelines, or through ETL processes.

  • Data lineage tracking: Captures transitions in real time, not just periodic snapshots.
  • Boundary crossing detection: Flags when sensitive assets cross environment boundaries or land in unexpected locations.
  • Practical example: Detecting when PII flows from a production database into a dev environment is a critical control, and requires active movement monitoring.
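The prod-to-dev example above can be sketched as a rule over movement events. The Transfer record and environment names are hypothetical; in practice these events would come from lineage or platform logs rather than a hand-built list.

```python
from dataclasses import dataclass


@dataclass
class Transfer:
    dataset: str
    source_env: str
    target_env: str
    contains_pii: bool


def pii_leaving_prod(transfers):
    """Return transfers that move PII from production into any other environment."""
    return [
        t for t in transfers
        if t.contains_pii and t.source_env == "prod" and t.target_env != "prod"
    ]


events = [
    Transfer("customers", "prod", "dev", True),
    Transfer("customers", "prod", "prod", True),
    Transfer("web_logs", "prod", "dev", False),
]
for t in pii_leaving_prod(events):
    print(f"ALERT: {t.dataset} moved from {t.source_env} to {t.target_env}")
```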

This is where platforms differ significantly. Some tools focus on cataloging data at rest, while more advanced solutions continuously monitor flows and surface risks as they emerge.

How Sentra Approaches Sensitive Data Discovery at Scale

Sentra is built specifically for the challenges described throughout this article. Its agentless architecture connects directly to cloud provider APIs without inline components on your data path and operates entirely in-environment, so sensitive data never leaves your control for processing. This design is critical for organizations with strict data residency requirements or preparing for regulatory audits.

Key Capabilities

  • Unified multi-environment coverage: Spans IaaS, PaaS, SaaS, and on-premises file shares with AI-powered classification that distinguishes real sensitive data from mock or test data.
  • DataTreks™ mapping: Creates an interactive map of the entire data estate, tracking active data movement including ETL processes, migrations, backups, and AI pipeline flows.
  • Toxic combination detection: Surfaces sensitive data behind overly broad access controls with remediation guidance.
  • Microsoft Purview integration: Supports automated sensitivity labeling across environments, feeding high-accuracy labels into Purview DLP and broader Microsoft 365 controls.
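A "toxic combination" pairs high sensitivity with broad access. The Python sketch below is not Sentra's implementation; it is a hypothetical illustration that flags assets whose sensitivity level and widest access grant together cross a threshold. The sensitivity levels, access scopes, and default thresholds are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from enum import IntEnum


class Sensitivity(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


class AccessScope(IntEnum):
    OWNER_ONLY = 0
    TEAM = 1
    ORGANIZATION = 2
    EXTERNAL = 3
    ANYONE_WITH_LINK = 4


@dataclass
class Asset:
    name: str
    sensitivity: Sensitivity
    widest_scope: AccessScope  # broadest grant found on the asset


def toxic_combinations(
    assets,
    min_sensitivity=Sensitivity.CONFIDENTIAL,
    max_scope=AccessScope.TEAM,
):
    """Flag assets at or above `min_sensitivity` that are shared beyond `max_scope`."""
    flagged = [
        a for a in assets
        if a.sensitivity >= min_sensitivity and a.widest_scope > max_scope
    ]
    # Worst first: highest sensitivity, then widest exposure.
    return sorted(flagged, key=lambda a: (a.sensitivity, a.widest_scope), reverse=True)


if __name__ == "__main__":
    assets = [
        Asset("hr_salaries.xlsx", Sensitivity.RESTRICTED, AccessScope.ANYONE_WITH_LINK),
        Asset("roadmap.pdf", Sensitivity.CONFIDENTIAL, AccessScope.ORGANIZATION),
        Asset("team_notes.doc", Sensitivity.CONFIDENTIAL, AccessScope.TEAM),
        Asset("press_kit.zip", Sensitivity.PUBLIC, AccessScope.ANYONE_WITH_LINK),
    ]
    print([a.name for a in toxic_combinations(assets)])
```

Neither dimension alone triggers a finding: the public press kit is widely shared and the team notes are confidential, but only assets that are both sensitive and broadly exposed are returned, worst first.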

What Users Say (Early 2026)

Strengths:

  • Classification accuracy: Reviewers note it is “fast and most accurate” compared to alternatives.
  • Shadow data discovery: “Brought visibility to unstructured data like chat messages, images, and call transcripts” that other tools missed.
  • Compliance facilitation: Teams report audit preparation has become significantly more manageable.

Considerations:

  • Initial learning curve with the dashboard configuration.
  • On-premises capabilities are less mature than cloud coverage, which matters for organizations with significant legacy infrastructure.

Beyond security, eliminating shadow and ROT (redundant, obsolete, or trivial) data with Sentra typically reduces cloud storage costs by approximately 20%, extending the business case well beyond compliance.

For teams looking to understand how to discover sensitive data in the cloud at enterprise scale, Sentra's Data Discovery and Classification offers a comprehensive starting point, and its in-environment architecture ensures the discovery process itself doesn't introduce new risk.

Yair Cohen
Jonathan Kreiner
February 20, 2026
4
Min Read

Thinking Beyond Policies: AI‑Ready Data Protection

AI assistants, SaaS, and hybrid work have made data easier than ever to discover, share, and reuse. Tools like Gemini for Google Workspace and Microsoft 365 Copilot can search across drives, mailboxes, chats, and documents in seconds - surfacing information that used to be buried in obscure folders and old snapshots.

That’s great for productivity, but dangerous for data security.

Traditional, policy‑based data loss prevention (DLP) wasn’t designed to handle this level of complexity. At the same time, many organizations now use data security posture management (DSPM) tools to understand where their sensitive data lives, but still lack real‑time control over how that data moves on endpoints, in browsers, and across SaaS.

Together, Sentra and Orion close this gap: Sentra brings next‑gen, context-driven DSPM; Orion brings next‑gen, behavior‑driven DLP. The result is end‑to‑end, AI‑ready data protection from data store to last‑mile usage, creating a learning, self‑improving posture rather than a static set of controls.

Why DSPM or DLP Alone Isn’t Enough

Modern data environments require two distinct capabilities: deep data intelligence and real-time enforcement informed by business context.

DSPM solutions provide a data-centric view of risk. They continuously discover and classify sensitive data across cloud, SaaS, and on-prem environments. They map exposure, detect shadow data, and surface over-permissioned access. This gives security teams a clear understanding of what sensitive data exists, where it resides, who can access it, and how exposed it is.

DLP solutions operate where data moves - on endpoints, in browsers, across SaaS, and in email. They enforce policies and prevent exfiltration as it happens. 

Without rich data context like accurate sensitivity classification, exposure mapping, and identity-to-data relationships, DLP solutions often rely on predefined rules or limited signals to decide what to block, allow, or escalate.

DLP can still enforce policy without that context, but its precision depends on the quality of the data intelligence behind it.

In AI-enabled, multi-cloud environments, visibility without enforcement is insufficient - and enforcement without deep data understanding lacks precision. To protect sensitive data from discovery by AI assistants, misuse across SaaS, or exfiltration from endpoints, organizations need three things: accurate, continuously updated data intelligence; real-time, context-aware enforcement; and feedback between the two layers.

That is where Sentra and Orion complement each other.

Sentra: Data‑Centric Intelligence for AI and SaaS

Sentra provides the data foundation: a continuous, accurate understanding of what you’re protecting and how exposed it is.

Deep Discovery and Classification

Sentra continuously discovers and classifies sensitive data across cloud‑native platforms, SaaS, and on‑prem data stores, including Google Workspace, Microsoft 365, databases, and object storage. Under the hood, Sentra uses AI/ML, OCR, and transcription to analyze both structured and unstructured data, and leverages rich data class libraries to identify PII, PHI, PCI, IP, credentials, HR data, legal content, and more, with configurable sensitivity levels.

This creates a live, contextual map of sensitive data: what it is, where it resides, and how important it is.

Reducing Shadow Data and Exposure

Sentra helps teams clean up the environment before AI and users can misuse it. 

It uncovers shadow data and obsolete assets that still carry sensitive content, highlights redundant or orphaned data that increases exposure without adding business value, and supports collaborative remediation workflows for security, data, and app owners.

Access Governance and Labeling for AI and DLP

Sentra turns visibility into governance signals. It maps which identities have access to which sensitive data classes and data stores, exposing overpermissioning and risky external access, and driving least‑privilege by aligning access rights with sensitivity and business needs.

To achieve this, Sentra automatically applies and enforces:

  • Google Labels across Google Drive, powering Gemini controls and DLP for Drive.
  • Microsoft Purview Information Protection (MPIP) labels across Microsoft 365, powering Copilot and DLP policies.

These labels become the policy fabric downstream AI and DLP engines use to decide what can be searched, summarized, or shared.
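To illustrate how a sensitivity label can serve as the input to such decisions, here is a hypothetical Python sketch that maps a label to the actions an AI assistant or DLP engine might permit. The label names and the action table are illustrative assumptions, not a configuration exported from Google Drive labels or Microsoft Purview; real enforcement happens inside Gemini, Copilot, and the DLP products themselves.

```python
from enum import Enum


class Action(Enum):
    SEARCH = "search"
    SUMMARIZE = "summarize"
    SHARE_EXTERNALLY = "share_externally"


# Hypothetical policy table: which actions each label permits.
POLICY = {
    "Public": {Action.SEARCH, Action.SUMMARIZE, Action.SHARE_EXTERNALLY},
    "Internal": {Action.SEARCH, Action.SUMMARIZE},
    "Confidential": {Action.SEARCH},
    "Highly Confidential": set(),
}


def is_allowed(label, action):
    """Return True if `action` is permitted for content carrying `label`.

    Unlabeled (None) or unknown labels fall back to the most restrictive
    entry, a fail-closed default chosen for this sketch.
    """
    permitted = POLICY.get(label, POLICY["Highly Confidential"])
    return action in permitted


if __name__ == "__main__":
    print(is_allowed("Internal", Action.SUMMARIZE))
    print(is_allowed("Confidential", Action.SHARE_EXTERNALLY))
    print(is_allowed(None, Action.SEARCH))
```

The fail-closed default is a design choice: it makes unlabeled content invisible to the assistant, which is why broad, accurate labeling coverage matters before such controls are switched on.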

Orion: Behavior‑Driven DLP That Thinks Beyond Policies

Orion replaces reliance on static policies with a set of proprietary, context-aware AI agents.

AI Agents That Understand Context

Orion’s agents collect rich context about data, identity, environment, and business relationships.

This context includes data lineage and movement patterns from source to destination, identity attributes (role, department, tenure, and more), environmental signals (geography, network zone, working hours), external business relationships (vendor/customer status), and Sentra’s data classification.

Based on this rich, business-aware context, Orion’s agents detect indicators of data loss and stop potential exfiltration before it becomes an incident. The result is DLP that aligns with how your business actually operates, rather than with how static policies imagined it would.
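Orion's agents are proprietary and AI-driven, so the following is not their logic. It is a deliberately simple Python sketch of the underlying idea: the same action, uploading a classified file, can warrant different outcomes depending on identity, environment, and business-relationship context. Every field, reference set, weight, and threshold here is a hypothetical assumption; a production system would learn such baselines rather than hard-code them.

```python
from dataclasses import dataclass


@dataclass
class TransferContext:
    data_class: str          # e.g. supplied by a DSPM classification
    user_department: str
    destination_domain: str
    hour_local: int          # 0-23, in the user's local time
    country: str


# Hypothetical reference data standing in for learned baselines.
KNOWN_PARTNERS = {"acme-supplier.com"}
EXPECTED_ACCESS = {"Finance": {"PCI", "Financial"}, "HR": {"PII", "HR"}}
USUAL_COUNTRIES = {"US"}
SENSITIVE = {"PII", "PCI", "PHI", "HR", "Financial"}


def assess(ctx: TransferContext) -> str:
    """Return 'allow', 'review', or 'block' from a toy additive risk score."""
    if ctx.data_class not in SENSITIVE:
        return "allow"
    score = 0
    if ctx.destination_domain not in KNOWN_PARTNERS:
        score += 2  # recipient is not a known business relationship
    if ctx.data_class not in EXPECTED_ACCESS.get(ctx.user_department, set()):
        score += 2  # data class is unusual for this role
    if not 8 <= ctx.hour_local < 19:
        score += 1  # outside typical working hours
    if ctx.country not in USUAL_COUNTRIES:
        score += 1  # unusual geography
    if score >= 4:
        return "block"
    return "review" if score >= 2 else "allow"


if __name__ == "__main__":
    print(assess(TransferContext("PCI", "Finance", "acme-supplier.com", 10, "US")))
    print(assess(TransferContext("PCI", "Finance", "example.net", 23, "US")))
    print(assess(TransferContext("PCI", "HR", "example.net", 23, "US")))
```

The three calls illustrate the contrast: an identical PCI upload is allowed for a Finance user sending to a known supplier during working hours, sent for review when the recipient and timing are unusual, and blocked when the sender's role does not fit the data either.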

Unified Coverage Where Data Moves

Orion is designed as a unified DLP solution, covering: 

  • Endpoints
  • SaaS applications
  • Web and cloud
  • Email
  • On‑prem and storage, including channels like print

From initial deployment, Orion provides meaningful detections grounded in real behavior rather than pattern hits, so security teams receive trusted, high‑quality alerts.

Better Together: End‑to‑End, AI‑Ready Protection

Individually, Sentra and Orion address critical yet distinct challenges. Together, they create a closed loop:

Sentra → Orion: Smarter Detections

Sentra gives Orion high‑quality context:

  • Which assets are truly sensitive, and at what level.
  • Where they live, how widely they’re exposed, and which identities can reach them.
  • Which documents and stores carry labels or policies that demand stricter treatment.

Orion uses this information to prioritize and enrich detections, focusing on events involving genuinely high‑risk data. It can then adapt behavior models to each user and data class, improving precision over time.

Orion → Sentra: Real‑World Feedback

Orion’s view into actual data movement feeds back into Sentra, exposing data stores that repeatedly appear in risky behaviors and serve as prime candidates for cleanup or stricter access governance. It also highlights identities whose actions don’t align with their expected access profile, feeding Sentra’s least‑privilege workflows. This turns data protection into a self‑improving system instead of a set of static controls.
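Both directions of the loop can be sketched in a few lines of Python. The functions below are hypothetical: the event fields, the store-sensitivity mapping standing in for DSPM output, and the repeat threshold are assumptions for illustration, not Sentra or Orion APIs.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DlpEvent:
    user: str
    store: str    # data store the moved content came from
    risky: bool   # whether the DLP layer judged the movement risky


def prioritize(events, store_sensitivity):
    """DSPM -> DLP: order events so those touching the most sensitive stores come first.

    `store_sensitivity` maps store name -> integer level; unknown stores default to 0.
    Python's sort is stable, so ties keep their original order.
    """
    return sorted(events, key=lambda e: store_sensitivity.get(e.store, 0), reverse=True)


def cleanup_candidates(events, min_repeats=2):
    """DLP -> DSPM: stores that appear in at least `min_repeats` risky events."""
    counts = Counter(e.store for e in events if e.risky)
    return sorted(store for store, n in counts.items() if n >= min_repeats)


if __name__ == "__main__":
    events = [
        DlpEvent("ana", "wiki", True),
        DlpEvent("bo", "hr_share", True),
        DlpEvent("cy", "hr_share", True),
        DlpEvent("dee", "crm", False),
    ]
    print([e.store for e in prioritize(events, {"hr_share": 3, "crm": 2})])
    print(cleanup_candidates(events))
```

The first function uses data intelligence to rank detections; the second turns repeated risky behavior back into a shortlist of stores for cleanup or tighter access governance.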

What This Means for Security and Risk Teams

With Sentra and Orion together, organizations can:

  • Securely adopt AI assistants like Gemini and Copilot, with Sentra controlling what they can see and Orion controlling how data is actually used on endpoints and SaaS.
  • Eliminate shadow data as an exfiltration path by first mapping and reducing it with Sentra, then guarding remaining high‑risk assets with Orion until they’re remediated.
  • Make least‑privilege real, with Sentra defining who should have access to what and Orion enforcing that principle in everyday behavior.
  • Provide auditors and boards with evidence that sensitive data is discovered, governed, and protected from exfiltration across both data platforms and endpoints.

Instead of choosing between “see everything but act slowly” (DSPM‑only) and “act without deep context” (DLP‑only), Sentra and Orion let you do both well - with one data‑centric brain and one behavior‑aware nervous system.

Ready to See Sentra + Orion in Action?

If you’re looking to secure AI adoption, reduce data loss risk, and retire legacy DLP noise, the combination of Sentra DSPM and Orion DLP offers a practical, modern path forward.

See how a unified, AI‑ready data protection architecture can look in your environment by mapping your most critical data and exposures with Sentra, and letting Orion protect that data as it moves across endpoints, SaaS, and web in real time.

Request a joint demo to explore how Sentra and Orion together can help you think beyond policies and build a data protection program designed for the AI era.
