
Use Redshift Data Scrambling for Additional Data Protection

May 3, 2023
8
Min Read

According to IBM, a data breach in the United States cost companies an average of 9.44 million dollars in 2022. It is now more important than ever for organizations to place high importance on protecting confidential information. Data scrambling, which can add an extra layer of security to data, is one approach to accomplish this. 

In this post, we'll analyze the value of data protection, look at the potential financial consequences of data breaches, and talk about how Redshift Data Scrambling may help protect private information.

The Importance of Data Protection

Data protection is essential to safeguard sensitive data from unauthorized access. Identity theft, financial fraud, and other serious consequences are all possible as a result of a data breach. Data protection is also crucial for compliance reasons. Sensitive data must be protected by law in several sectors, including government, banking, and healthcare. Heavy fines, legal problems, and business loss may result from failure to abide by these regulations.

Attackers use many techniques, including phishing, malware, and insider threats, to get access to confidential information. For example, a phishing attack may lead to the theft of login credentials, and malware may infect a system, opening the door for additional attacks and data theft.

So how can you protect yourself against these attacks and minimize your data attack surface?

What is Redshift Data Masking?

Redshift data masking is a technique used to protect sensitive data in Amazon Redshift, a cloud-based data warehousing and analytics service. It involves replacing sensitive data with fictitious but realistic values to protect it from unauthorized access or exposure. Combining Redshift data masking with other security measures, such as access control and encryption, helps create a comprehensive data protection plan.


What is Redshift Data Scrambling?

Redshift data scrambling protects confidential information in a Redshift database by altering original data values using algorithms or formulas, creating unrecognizable data sets. This method is beneficial when sharing sensitive data with third parties or using it for testing, development, or analysis, ensuring privacy and security while enhancing usability. 

The technique is highly customizable, allowing organizations to select the desired level of protection while maintaining data usability. Redshift data scrambling is cost-effective, requiring no additional hardware or software investments, providing an attractive, low-cost solution for organizations aiming to improve cloud data security.

Data Masking vs. Data Scrambling

Data masking involves replacing sensitive data with a fictitious but realistic value. Data scrambling, on the other hand, involves changing the original data values using an algorithm or a formula to generate a new set of values.

In some cases, data scrambling can be used as part of data masking techniques. For instance, sensitive data such as credit card numbers can be scrambled before being masked to enhance data protection further.
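
The distinction can be illustrated with a minimal Python sketch. It is not tied to Redshift: the function names `mask_value` and `scramble_value` and the key `demo-key` are hypothetical, and a keyed SHA-256 hash is assumed here as one possible scrambling algorithm.

```python
import hashlib
import hmac
import secrets

def mask_value(value: str) -> str:
    """Masking: swap each digit for a random one so the result still looks real."""
    return "".join(
        secrets.choice("0123456789") if ch.isdigit() else ch for ch in value
    )

def scramble_value(value: str, key: bytes) -> str:
    """Scrambling: transform the original with an algorithm (here a keyed hash)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()

card = "1234-5678-9012-3456"
masked = mask_value(card)                      # fictitious, card-shaped value
scrambled = scramble_value(card, b"demo-key")  # 64 hex characters
```

The masked value keeps the look of a card number but has no relationship to the original, while the scrambled value is derived from the original and is the same every time the same key is used.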

Setting up Redshift Data Scrambling

Having gained an understanding of Redshift and data scrambling, we can now proceed to learn how to set it up for implementation. Enabling data scrambling in Redshift requires several steps.

To achieve data scrambling in Redshift, SQL queries are utilized to invoke built-in or user-defined functions. These functions utilize a blend of cryptographic techniques and randomization to scramble the data.

The following steps use example code to illustrate the setup:

Step 1: Create a new Redshift cluster

Create a new Redshift cluster or use an existing cluster if available. 


Step 2: Define a scrambling key

Define a scrambling key that will be used to scramble the sensitive data.

 
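SET app_context.scrambling_key TO 'MyScramblingKey';

In this code snippet, we are defining a scrambling key by setting a session context variable named <inlineCode>app_context.scrambling_key<inlineCode> to the value <inlineCode>MyScramblingKey<inlineCode>. Custom session variables in Redshift are written with a prefix such as <inlineCode>app_context.<inlineCode>, and the value can be read back with <inlineCode>current_setting('app_context.scrambling_key')<inlineCode>. This key is the secret that the scrambling logic depends on. The value shown is a placeholder, so avoid committing real keys to scripts.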

Step 3: Create a user-defined function (UDF)

Create a user-defined function in Redshift that will be used to scramble the sensitive data. 


-- Redshift scalar UDFs are written in SQL or Python, not PL/pgSQL.
-- SQL UDFs take unnamed arguments, referenced in the body as $1, $2, ...
CREATE OR REPLACE FUNCTION scramble(VARCHAR)
RETURNS VARCHAR
STABLE
AS $$
  SELECT SHA2('MyScramblingKey' || $1, 256)
$$ LANGUAGE sql;

Here, we are creating a UDF named <inlineCode>scramble<inlineCode> that takes a string input and returns the scrambled output. The function is declared <inlineCode>STABLE<inlineCode>, which tells Redshift that it returns the same result for the same input within a single statement. A function that always returns the same result for the same input can be declared <inlineCode>IMMUTABLE<inlineCode> instead. The keyed <inlineCode>SHA2<inlineCode> hash is only a placeholder for your own scrambling logic. It returns a 64-character hex string, so the target column must be wide enough to hold it.

Step 4: Apply the UDF to sensitive columns

Apply the UDF to the sensitive columns in the database that need to be scrambled.


UPDATE employee SET ssn = scramble(ssn);

For example, this statement applies the <inlineCode>scramble<inlineCode> UDF to the <inlineCode>ssn<inlineCode> column of a table named <inlineCode>employee<inlineCode>. The <inlineCode>UPDATE<inlineCode> statement calls the <inlineCode>scramble<inlineCode> UDF and overwrites the values in the <inlineCode>ssn<inlineCode> column with the scrambled values.

Step 5: Test and validate the scrambled data

Test and validate the scrambled data to ensure that it is unreadable and unusable by unauthorized parties.


SELECT ssn, scramble(ssn) AS scrambled_ssn
FROM employee;

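In this snippet, we are running a <inlineCode>SELECT<inlineCode> statement to retrieve the <inlineCode>ssn<inlineCode> column and the corresponding scrambled value using the <inlineCode>scramble<inlineCode> UDF. We can compare the original and scrambled values to ensure that the scrambling is working as expected. Run this check before the <inlineCode>UPDATE<inlineCode> in Step 4, or against a copy of the table, because afterwards the <inlineCode>ssn<inlineCode> column already holds scrambled values.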

Step 6: Monitor and maintain the scrambled data

To monitor and maintain the scrambled data, we can regularly check the sensitive columns to ensure that they are still scrambled and that there are no vulnerabilities or breaches. We should also maintain the scrambling key and UDF to ensure that they are up-to-date and effective.
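
A periodic check of this kind could be sketched in Python as follows. The function `validate_scrambling` is a hypothetical helper, and it assumes you can supply pairs of (original, scrambled) values, for example from a secured pre-scramble copy of the table.

```python
def validate_scrambling(pairs):
    """Return a list of problems found in (original, scrambled) value pairs."""
    problems = []
    seen = {}
    for row_number, (original, scrambled) in enumerate(pairs):
        if original is None:
            continue  # NULL values carry nothing to protect
        if scrambled == original:
            problems.append(f"row {row_number}: value was not changed")
        # Deterministic scrambling must map equal inputs to equal outputs
        if seen.setdefault(original, scrambled) != scrambled:
            problems.append(f"row {row_number}: inconsistent output")
    return problems

sample = [("123-45-6789", "f3a9"), ("123-45-6789", "f3a9"), ("987-65-4321", "0b7c")]
report = validate_scrambling(sample)
```

An empty report means every sampled value was changed and repeated inputs were scrambled consistently. The messages refer to row numbers rather than values so that the report itself does not leak sensitive data.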

Different Options for Scrambling Data in Redshift

Selecting a data scrambling technique involves balancing security levels, data sensitivity, and application requirements. Various general algorithms exist, each with unique pros and cons. To scramble data in Amazon Redshift, you can use the following Python code samples in conjunction with a library like psycopg2 to interact with your Redshift cluster. Before executing the code samples, you will need to install the psycopg2 library:


pip install psycopg2

Random

Utilizing a random number generator, the Random option quickly secures data, although its susceptibility to reverse engineering limits its robustness for long-term protection.


import random
import string
import psycopg2

def random_scramble(data):
    scrambled = ""
    for char in data:
        scrambled += random.choice(string.ascii_letters + string.digits)
    return scrambled

# Connect to your Redshift cluster
conn = psycopg2.connect(host='your_host', port='your_port', dbname='your_dbname', user='your_user', password='your_password')
cursor = conn.cursor()
# Fetch data from your table
cursor.execute("SELECT sensitive_column FROM your_table;")
rows = cursor.fetchall()

# Scramble the data, pairing each new value with the original it replaces
params = [(random_scramble(row[0]), row[0]) for row in rows if row[0] is not None]

# Update the data in the table
cursor.executemany("UPDATE your_table SET sensitive_column = %s WHERE sensitive_column = %s;", params)
conn.commit()

# Close the connection
cursor.close()
conn.close()

Shuffle

The Shuffle option enhances security by rearranging data characters. However, it remains prone to brute-force attacks, despite being harder to reverse-engineer.


import random
import psycopg2

def shuffle_scramble(data):
    data_list = list(data)
    random.shuffle(data_list)
    return ''.join(data_list)

conn = psycopg2.connect(host='your_host', port='your_port', dbname='your_dbname', user='your_user', password='your_password')
cursor = conn.cursor()

cursor.execute("SELECT sensitive_column FROM your_table;")
rows = cursor.fetchall()

# Pair each shuffled value with the original it replaces
params = [(shuffle_scramble(row[0]), row[0]) for row in rows if row[0] is not None]

cursor.executemany("UPDATE your_table SET sensitive_column = %s WHERE sensitive_column = %s;", params)
conn.commit()

cursor.close()
conn.close()

Reversible

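The Reversible method scrambles characters in a way that can be undone with a key, so authorized users can recover the original values. Its strength depends entirely on the algorithm: the Caesar cipher used as an example below has only 25 distinct shifts and is trivial to brute-force, so treat it as an illustration rather than real protection.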


import psycopg2

def caesar_cipher(data, key):
    encrypted = ""
    for char in data:
        if char.isalpha():
            shift = key % 26
            if char.islower():
                encrypted += chr((ord(char) - 97 + shift) % 26 + 97)
            else:
                encrypted += chr((ord(char) - 65 + shift) % 26 + 65)
        else:
            encrypted += char
    return encrypted

conn = psycopg2.connect(host='your_host', port='your_port', dbname='your_dbname', user='your_user', password='your_password')
cursor = conn.cursor()

cursor.execute("SELECT sensitive_column FROM your_table;")
rows = cursor.fetchall()

key = 5
# Pair each encrypted value with the original it replaces
params = [(caesar_cipher(row[0], key), row[0]) for row in rows if row[0] is not None]
cursor.executemany("UPDATE your_table SET sensitive_column = %s WHERE sensitive_column = %s;", params)
conn.commit()

cursor.close()
conn.close()
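
To show what reversible means in practice, here is a small self-contained sketch with no database connection. It restates the Caesar cipher in compact form, undoes the shift by applying the negative key, and then tries every possible shift, which is why this cipher should only be seen as a teaching example. The sample value is made up.

```python
def caesar_cipher(data, key):
    """Shift ASCII letters by `key` positions; leave other characters alone."""
    shifted = []
    for char in data:
        if char.isascii() and char.isalpha():
            base = ord('a') if char.islower() else ord('A')
            shifted.append(chr((ord(char) - base + key) % 26 + base))
        else:
            shifted.append(char)
    return ''.join(shifted)

key = 5
scrambled = caesar_cipher("Jane Doe", key)   # "Ofsj Itj"
restored = caesar_cipher(scrambled, -key)    # "Jane Doe"

# Only 25 useful shifts exist, so an attacker can simply try them all
candidates = [caesar_cipher(scrambled, -k) for k in range(1, 26)]
```

The original value appears among the 25 candidates, so anyone holding the scrambled data can recover it without knowing the key.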

Custom

The Custom option enables users to create tailor-made algorithms to resist specific attack types, potentially offering superior security. However, the development and implementation of custom algorithms demand greater time and expertise.
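
As one illustration of a custom scheme, the sketch below replaces each digit with a digit derived from a keyed hash while leaving separators in place, so a scrambled SSN still looks like an SSN. The function name `custom_scramble` and the key are hypothetical, and this is an assumption about what a custom algorithm might look like rather than a recommended design. It is deterministic and not reversible.

```python
import hashlib
import hmac

def custom_scramble(value: str, key: bytes) -> str:
    """Replace each digit with a key-derived digit, keeping the format intact."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    out = []
    position = 0
    for ch in value:
        if ch.isdigit():
            # Take the next digest byte and reduce it to a single digit
            out.append(str(digest[position % len(digest)] % 10))
            position += 1
        else:
            out.append(ch)  # separators such as '-' pass through unchanged
    return "".join(out)

scrambled_ssn = custom_scramble("123-45-6789", b"example-key")
```

Because the output keeps the original length and separators, it can be stored in the same column and still passes format checks in test or development environments.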

Best Practices for Using Redshift Data Scrambling

There are several best practices that should be followed when using Redshift Data Scrambling to ensure maximum protection:

Use Unique Keys for Each Table

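To ensure that the data is not compromised if one key is compromised, each table should be scrambled with its own key. This means a separate scrambling key per table, not a database index: Redshift does not support <inlineCode>CREATE INDEX<inlineCode>, and a unique index would not isolate keys in any case. Store each table's key separately, for example in a secrets manager, and record which key belongs to which table.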
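
One way to give every table its own scrambling key, sketched here under the assumption that a single master secret is kept in a key store, is to derive each table's key from that secret with HMAC-SHA256. The function name `derive_table_key`, the table names, and the master key value are placeholders.

```python
import hashlib
import hmac

def derive_table_key(master_key: bytes, table_name: str) -> bytes:
    """Derive a table-specific key from a master secret with HMAC-SHA256."""
    return hmac.new(master_key, table_name.encode("utf-8"), hashlib.sha256).digest()

master_key = b"replace-with-a-secret-from-your-key-store"  # placeholder
employee_key = derive_table_key(master_key, "employee")
customer_key = derive_table_key(master_key, "customer")
```

With this design a leaked table key does not reveal the keys of other tables, but the master secret becomes the single item that must be protected most carefully.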

Encrypt Sensitive Data Fields 

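Sensitive data fields such as credit card numbers and social security numbers should be encrypted to provide an additional layer of security. Redshift does not ship a built-in <inlineCode>ENCRYPT<inlineCode> SQL function, so column-level encryption is usually implemented in a user-defined function (for example, a Lambda UDF that calls AWS KMS) or applied before the data is loaded. The example below assumes you have created such a UDF yourself and named it <inlineCode>ENCRYPT<inlineCode>: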


SELECT ENCRYPT('1234-5678-9012-3456', 'your_encryption_key_here');

Use Strong Encryption Algorithms

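Strong encryption algorithms such as AES-256 should be used to provide the strongest protection. Redshift uses AES-256 to encrypt data at rest when cluster encryption is enabled, and supports SSL/TLS for data in transit. Encryption at rest is configured on the cluster, not per column in the table definition:


-- Encryption at rest is enabled on the cluster, not in the table DDL;
-- ENCODE only selects the column compression encoding.
CREATE TABLE encrypted_table (sensitive_data VARCHAR(255) ENCODE ZSTD);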

Control Access to Encryption Keys 

Access to encryption keys should be restricted to authorized personnel to prevent unauthorized access to sensitive data. You can achieve this by using AWS KMS (Key Management Service) to manage your encryption keys. Here's an example in Python that uses a KMS grant so that a single principal is allowed only the Decrypt operation on a key:


import boto3

kms = boto3.client('kms')

key_id = 'your_key_id_here'
grantee_principal = 'arn:aws:iam::123456789012:user/jane'

response = kms.create_grant(
    KeyId=key_id,
    GranteePrincipal=grantee_principal,
    Operations=['Decrypt']
)

print(response)

Regularly Rotate Encryption Keys 

Regular rotation of encryption keys limits how long a compromised key can be used to access sensitive data. AWS KMS supports automatic rotation of customer managed keys, which rotates the key material once a year by default. Here's an example of how to enable automatic key rotation and check its status using the AWS CLI:

 
aws kms enable-key-rotation --key-id your_key_id_here
aws kms get-key-rotation-status --key-id your_key_id_here

Turn on logging 

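To track user access to sensitive data and identify any unwanted access, logging must be enabled. When audit logging is turned on and the <inlineCode>enable_user_activity_logging<inlineCode> parameter is set to true, Amazon Redshift records the SQL statements executed on your cluster in the user activity log. This applies to queries that access sensitive data as well as data-scrambling operations. Afterwards, you may examine these logs to look for any strange access patterns or suspect activities.

This parameter lives in the cluster's parameter group rather than in a database setting, so it is changed through the console, API, or AWS CLI instead of a SQL statement. For example, with the AWS CLI (the parameter group name is a placeholder):

aws redshift modify-cluster-parameter-group --parameter-group-name your_parameter_group --parameters ParameterName=enable_user_activity_logging,ParameterValue=true

Recent queries can also be inspected through the <inlineCode>stl_query<inlineCode> system table. For instance, filtering it on the <inlineCode>querytxt<inlineCode> column lists the queries that referenced a certain table.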
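
A query of this kind could be wrapped in Python as in the sketch below. The helper names `audit_params` and `queries_touching` are hypothetical. It assumes an open DB-API cursor, such as one from psycopg2, and uses the `query`, `userid`, `starttime`, and `querytxt` columns of `stl_query`. Matching on query text is only a rough filter, since it also catches comments and similarly named tables.

```python
AUDIT_QUERY = """
    SELECT query, userid, starttime, TRIM(querytxt) AS sql_text
    FROM stl_query
    WHERE querytxt ILIKE %s
    ORDER BY starttime DESC
    LIMIT %s;
"""

def audit_params(table_name, limit=100):
    """Build the bound parameters: a substring pattern and a row limit."""
    return (f"%{table_name}%", limit)

def queries_touching(cursor, table_name, limit=100):
    """Run the audit query on an open DB-API cursor and return the rows."""
    cursor.execute(AUDIT_QUERY, audit_params(table_name, limit))
    return cursor.fetchall()
```

Calling `queries_touching(cursor, "employee")` would return the most recent logged statements whose text mentions the employee table, which can then be reviewed for unexpected users or times.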

Monitor Performance 

Data scrambling is often a resource-intensive practice, so it’s good to monitor CPU usage, memory usage, and disk I/O to ensure your cluster isn’t being overloaded. In Redshift, you can use the <inlineCode>svl_query_summary<inlineCode> and <inlineCode>svl_query_report<inlineCode> system views to monitor query performance. You can also use Amazon CloudWatch to monitor metrics such as CPU usage and disk space.


Establishing Backup and Disaster Recovery

In order to prevent data loss in the case of a disaster, backup and disaster recovery mechanisms should be put in place. Automated backups and manual snapshots are only two of the backup and recovery methods offered by Amazon Redshift. Automatic backups are taken once every eight hours by default. 

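Moreover, you may always manually take a snapshot of your cluster. In the case of a breakdown or disaster, your cluster may be restored using these backups and snapshots. Snapshots are managed through the console, API, or AWS CLI rather than SQL. Use this AWS CLI command to manually take a snapshot of your cluster (the identifiers are placeholders):

aws redshift create-cluster-snapshot --cluster-identifier your_cluster_name --snapshot-identifier snapshot_name

To restore a snapshot into a new cluster, you can use the <inlineCode>restore-from-cluster-snapshot<inlineCode> command. For example:


aws redshift restore-from-cluster-snapshot --cluster-identifier new_cluster_name --snapshot-identifier snapshot_name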

Frequent Review and Updates

To ensure that data scrambling procedures remain effective and up-to-date with the latest security requirements, it is crucial to consistently review and update them. This process should include examining backup and recovery procedures, encryption techniques, and access controls.

In Amazon Redshift, you can assess access controls by inspecting roles and their associated permissions in system views such as <inlineCode>svv_roles<inlineCode> and <inlineCode>svv_relation_privileges<inlineCode>. It is essential to confirm that only authorized individuals have access to sensitive information.

To review column definitions, use the <inlineCode>pg_table_def<inlineCode> system catalog view, which shows the data type and compression encoding of each column in your tables. Encryption at rest is a cluster-level setting, which you can confirm with <inlineCode>aws redshift describe-clusters<inlineCode>. Ensure that clusters holding sensitive data are protected with robust encryption methods, such as AES-256.

If you manage backups with AWS Backup, the AWS CLI commands <inlineCode>aws backup list-backup-plans<inlineCode> and <inlineCode>aws backup list-backup-vaults<inlineCode> let you review your backup plans and vaults as part of evaluating backup and recovery procedures. Make sure your backup and recovery procedures are properly configured and up-to-date.

Decrypting Data in Redshift

There are different options for decrypting data, depending on the encryption method used and the tools available. The decryption process mirrors encryption: usually a custom UDF is used to decrypt the data. Let's look at one example of reversing data that was scrambled with a substitution cipher.

Step 1: Create a UDF with decryption logic for substitution


CREATE FUNCTION decrypt_substitution(ciphertext varchar) RETURNS varchar
IMMUTABLE AS $$
    if ciphertext is None:
        return None
    alphabet = 'abcdefghijklmnopqrstuvwxyz'
    substitution = 'ijklmnopqrstuvwxyzabcdefgh'
    plaintext = ''
    for ch in ciphertext:
        # Position of the cipher character in the substitution alphabet
        index = substitution.find(ch)
        # Map back to the plain alphabet; other characters pass through unchanged
        plaintext += ch if index == -1 else alphabet[index]
    return plaintext
$$ LANGUAGE plpythonu;

Step 2: Move the data back after truncating and applying the decryption function


TRUNCATE original_table;
INSERT INTO original_table (column1, decrypted_column2, column3)
SELECT column1, decrypt_substitution(encrypted_column2), column3
FROM temp_table;

In this example, encrypted_column2 is the encrypted version of column2 in the temp_table. The decrypt_substitution function is applied to encrypted_column2, and the result is inserted into the decrypted_column2 in the original_table. Make sure to replace column1, column2, and column3 with the appropriate column names, and adjust the INSERT INTO statement accordingly if you have more or fewer columns in your table.
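
Before running the UDF against real data, the substitution logic can be checked outside Redshift. The following standalone Python sketch uses the same two alphabets as the UDF and confirms that decryption undoes encryption. The `encrypt_substitution` counterpart and the sample value are assumptions added for the round trip.

```python
ALPHABET = 'abcdefghijklmnopqrstuvwxyz'
SUBSTITUTION = 'ijklmnopqrstuvwxyzabcdefgh'

# Translation tables map each character to its counterpart in one pass
ENCRYPT_TABLE = str.maketrans(ALPHABET, SUBSTITUTION)
DECRYPT_TABLE = str.maketrans(SUBSTITUTION, ALPHABET)

def encrypt_substitution(plaintext):
    """Replace each lowercase letter with its substitution-alphabet letter."""
    return plaintext.translate(ENCRYPT_TABLE)

def decrypt_substitution(ciphertext):
    """Reverse the substitution; characters outside the alphabet are unchanged."""
    return ciphertext.translate(DECRYPT_TABLE)

secret = encrypt_substitution("jane doe")    # "rivm lwm"
recovered = decrypt_substitution(secret)     # "jane doe"
```

If the recovered value ever differs from the input, the two alphabets are out of sync and the UDF would corrupt data when applied to the table.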

Conclusion

Redshift data scrambling is an effective tool for additional data protection and should be considered as part of an organization's overall data security strategy. In this blog post, we looked into the importance of data protection and how it can be integrated effectively into the data warehouse. Then, we covered the difference between data scrambling and data masking before diving into how one can set up Redshift data scrambling.

Once you are accustomed to Redshift data scrambling, you can strengthen your security with different scrambling techniques and best practices, including encryption, logging, and performance monitoring. Organizations may improve their data security posture management (DSPM) and reduce the risk of possible breaches by adhering to these recommendations and using an efficient strategy.


Veronica is the security researcher at Sentra. She brings a wealth of knowledge and experience as a cybersecurity researcher. Her main focuses are researching the main cloud provider services and AI infrastructures for Data related threats and techniques.


Latest Blog Posts

Ward Balcerzak
January 14, 2026
4
Min Read

The Real Business Value of DSPM: Why True ROI Goes Beyond Cost Savings


As enterprises scale cloud usage and adopt AI, the value of Data Security Posture Management (DSPM) is no longer just about checking a tool category box. It’s about protecting what matters most: sensitive data that fuels modern business and AI workflows.

Traditional content on DSPM often focuses on cost components and deployment considerations. That’s useful, but incomplete. To truly justify DSPM to executives and boards, security leaders need a holistic, outcome-focused view that ties data risk reduction to measurable business impact.

In this blog, we unpack the real, measurable benefits of DSPM, beyond just cost savings, and explain how modern DSPM strategies deliver rapid value far beyond what most legacy tools promise. 

1. Visibility Isn’t Enough - You Need Context

A common theme in DSPM discussions is that tools help you see where sensitive data lives. That’s important, but it’s only the first step. Real value comes from understanding context: who can access the data, how it’s being used, and where risk exists in the wider security posture. Organizations that stop at discovery often struggle to prioritize risk and justify spend.

Modern DSPM solutions go further by:

  • Correlating data locations with access rights and usage patterns
  • Mapping sensitive data flows across cloud, SaaS, and hybrid environments
  • Detecting shadow data stores and unmanaged copies that silently increase exposure
  • Linking findings to business risk and compliance frameworks

This contextual intelligence drives better decisions and higher ROI because teams aren’t just counting sensitive data, they’re continuously governing it.

2. DSPM Saves Time and Shrinks Attack Surface Fast

One way DSPM delivers measurable business value is by streamlining functions that used to be manual, siloed, and slow:

  • Automated classification reduces manual tagging and human error
  • Continuous discovery eliminates periodic, snapshot-alone inventories
  • Policy enforcement reduces time spent reacting to audit requests

This translates into:

  • Faster compliance reporting
  • Shorter audit cycles
  • Rapid identification and remediation of critical risks

For security leaders, the speed of insight becomes a competitive advantage, especially in environments where data volumes grow daily and AI models can touch every corner of the enterprise.

3. Cost Benefits That Matter, but with Context

Lately I’m hearing many DSPM discussions break down cost components like scanning compute, licensing, operational expenses, and potential cloud savings. That’s a good start because DSPM can reduce cloud waste by identifying stale or redundant data, but it’s not the whole story.

 

Here’s where truly strategic DSPM differs:

Operational Efficiency

When DSPM tools automate discovery, classification, and risk scoring:

  • Teams spend less time on manual reports
  • Alert fatigue drops as noise is filtered
  • Engineers can focus on higher-value work

Breach Avoidance

Data breaches are expensive. According to industry studies, the average cost of a data breach runs into millions, far outweighing the cost of DSPM itself. A DSPM solution that prevents even one breach or major compliance failure pays for itself tenfold.

Compliance as a Value Center

Rather than treating compliance as a cost center, consider that:

  • DSPM reduces audit overhead
  • Provides automated evidence for frameworks like GDPR, HIPAA, PCI DSS
  • Improves confidence in reporting accuracy

That’s a measurable business benefit CFOs can appreciate and boards expect.

4. DSPM Reduces Risk Vector Multipliers Like AI

One benefit that’s often under-emphasized is how DSPM reduces risk vector multipliers, the factors that amplify risk exponentially beyond simple exposure counts.

In 2026 and beyond, AI systems are increasingly part of the risk profile. Modern DSPM solutions help reduce the heightened risk from AI by:

  • Identifying where sensitive data intersects with AI training or inference pipelines
  • Governing how AI tools and assistants can access sensitive content
  • Providing risk context so teams can prevent data leakage into LLMs

This kind of data-centric, contextual, and continuous governance should be considered a requirement for secure AI adoption, no compromise.

5. Telling the DSPM ROI Story

The most convincing DSPM ROI stories aren’t spreadsheets, they’re narratives that align with business outcomes. The key to building a credible ROI case is connecting metrics, security impact, and business outcomes:

| Metric | Security Impact | Business Outcome |
| --- | --- | --- |
| Faster discovery & classification | Fewer blind spots | Reduced breach likelihood |
| Consistent governance enforcement | Fewer compliance issues | Lower audit cost |
| Contextual risk scoring | Better prioritization | Efficient resource allocation |
| AI governance | Controlled AI exposure | Safe innovation |

By telling the story this way, security leaders can speak in terms the board and executives care about: risk reduction, compliance assurance, operational alignment, and controlled growth.

How to Evaluate DSPM for Real ROI

To capture tangible return, don’t evaluate DSPM solely on cost or feature checklists. Instead, test for:

1. Scalability Under Real Load

Can the tool discover and classify petabytes of data, including unstructured content, without degrading performance?

2. Accuracy That Holds Up

Poor classification undermines automation. True ROI requires consistent, top-performing accuracy rates.

3. Operational Cost Predictability

Beware of DSPM solutions that drive unexpected cloud expenses due to inefficient scanning or redundant data reads.

4. Integration With Enforcement Workflows

Visibility without action isn’t ROI. Your DSPM should feed DLP, IAM/CIEM, SIEM/SOAR, and compliance pipelines (ticketing, policy automation, alerts).

ROI Is a Journey, Not a Number

Costs matter, but value lives in context. DSPM is not just a cost center, it’s a force multiplier for secure cloud operations, AI readiness, compliance, and risk reduction. Instead of seeing DSPM as another tool, forward-looking teams view it as a fundamental decision support engine that changes how risk is measured, prioritized, and controlled.

Ready to See Real DSPM Value in Your Environment?

Download Sentra’s “DSPM Dirty Little Secrets” guide, a practical roadmap for evaluating DSPM with clarity, confidence, and production reality in mind.

👉 Download the DSPM Dirty Little Secrets guide now

Want a personalized walkthrough of how Sentra delivers measurable DSPM value?
👉 Request a demo

Ofir Yehoshua
January 13, 2026
3
Min Read

Why Infrastructure Security Is Not Enough to Protect Sensitive Data


For years, security programs have focused on protecting infrastructure: networks, servers, endpoints, and applications. That approach made sense when systems were static and data rarely moved. It’s no longer enough.

Recent breach data shows a consistent pattern. Organizations detect incidents, restore systems, and close tickets, yet remain unable to answer the most important questions regulators and customers often ask:

Where does my sensitive data reside?

Who or what has access to this data and are they authorized?

Which specific sensitive datasets were accessed or exfiltrated?

Infrastructure security alone cannot answer those questions.

Infrastructure Alerts Detect Events, Not Impact

Most security tooling is infrastructure-centric by design. SIEMs, EDRs, NDRs, and CSPM tools monitor hosts, processes, IPs, and configurations. When something abnormal happens, they generate alerts.

What they do not tell you is:

  • Which specific datasets were accessed
  • Whether those datasets contained PHI or PII
  • Whether sensitive data was copied, moved, or exfiltrated

Traditional tools monitor the "plumbing" (network traffic, server logs, etc.). While they can flag that a database was accessed by an unauthorized IP, they often cannot distinguish between an attacker downloading a public template and an attacker downloading a table containing 50,000 Social Security numbers. An alert about a system is not the same as understanding the exposure of the data stored inside it. Without that context, incident response teams are forced to infer impact rather than determine it.

The “Did They Access the Data?” Problem

This gap becomes pronounced during ransomware and extortion incidents.

In many cases:

  • Operations are restored from backups
  • Infrastructure is rebuilt
  • Access is reduced
  • (Hopefully!) attackers are removed from the environment

Yet organizations still cannot confirm whether sensitive data was accessed or exfiltrated during the dwell time.

Without data-level visibility:

  • Legal and compliance teams must assume worst-case exposure
  • Breach notifications expand unnecessarily
  • Regulatory penalties increase due to uncertainty, not necessarily damage

The inability to scope an incident accurately is not a tooling failure during the breach, it is a visibility failure that existed long before the breach occurred. Under regulations like GDPR or CCPA/CPRA, if an organization cannot prove that sensitive data wasn’t accessed during a breach, they are often legally required to notify all potentially affected parties. This ‘over-notification’ is costly and damaging to reputation.

Data Movement Is the Real Attack Vulnerability

Modern environments are defined by constant data movement:

  • Cloud migrations
  • SaaS integrations
  • App dev lifecycles
  • Analytics and ETL pipelines
  • AI and ML workflows

Each transition creates blind spots.

Legacy platforms awaiting migration often exist in a “wait state” with reduced monitoring. Data copied into cloud storage or fed into AI pipelines frequently loses lineage and classification context. Posture may vary and traditional controls no longer apply consistently. From an attacker’s perspective, these environments are ideal. From a defender’s perspective, they are blind spots.

Policies Are Not Proof

Most organizations can produce policies stating that sensitive data is encrypted, access-controlled, and monitored. Increasingly, regulators are moving from point-in-time audits to requiring continuous evidence of control.  

Regulators are asking for evidence:

  • Where does PHI live right now?
  • Who or what can access it?
  • How do you know this hasn’t changed since the last audit?

Point-in-time audits cannot answer those questions. Neither can static documentation. Exposure and access drift continuously, especially in cloud and AI-driven environments.

Compliance depends on continuous control, not periodic attestation.

What Data-Centric Security Actually Requires

Accurately proving compliance and scoping breach impact requires security visibility that is anchored to the data itself, not the infrastructure surrounding it.

At a minimum, this means:

  • Continuous discovery and classification of sensitive data
  • Consistent compliance reporting and controls across cloud, SaaS, On-Prem, and migration states
  • Clear visibility into which identities, services, and AI tools can access specific datasets
  • Detection and response signals tied directly to sensitive data exposure and movement

This is the operational foundation of Data Security Posture Management (DSPM) and Data Detection and Response (DDR). These capabilities do not replace infrastructure security controls; they close the gap those controls leave behind by connecting security events to actual data impact.

This is the problem space Sentra was built to address.

Sentra provides continuous visibility into where sensitive data lives, how it moves, and who or what can access it, and ties security and compliance outcomes to that visibility. Without this layer, organizations are forced to infer breach impact and compliance posture instead of proving it.

Why Data-Centric Security Is Required for Today's Compliance and Breach Response

Infrastructure security can detect that an incident occurred, but it cannot determine which sensitive data was accessed, copied, or exfiltrated. Without data-level evidence, organizations cannot accurately scope breaches, contain risk, or prove compliance, regardless of how many alerts or controls are in place. Modern breach response and regulatory compliance require continuous visibility into sensitive data, its lineage, and its access paths. Infrastructure-only security models are no longer sufficient.

Want to see how Sentra provides complete visibility and control of sensitive data?

Schedule a Demo

Yair Cohen
January 9, 2026
3
Min Read
Data Security

How to Prevent Data Breaches in Healthcare and Protect PHI


Preventing data breaches in healthcare is no longer just about stopping cyberattacks. In 2026, the greater challenge is maintaining continuous visibility into where protected health information (PHI) lives, how it is accessed, and how it is reused across modern healthcare environments governed by HIPAA compliance requirements.

PHI no longer resides in a single system or under the control of one team. It moves constantly between cloud platforms, electronic health record (EHR) systems, business associates, analytics environments, and AI tools used throughout healthcare operations. While this data sharing enables better patient care and operational efficiency, it also introduces new healthcare cybersecurity risks that traditional, perimeter-based security controls were never designed to manage.

From Perimeter Security to Data-Centric PHI Protection

Many of the most damaging healthcare data breaches in recent years have shared a common root cause: limited visibility into sensitive data and unclear ownership across shared environments.

Over-permissioned identities, long-lived third-party access, and AI systems interacting with regulated data without proper governance can silently expand exposure until an incident forces disruptive containment measures. Protecting PHI in 2026 requires a data-centric approach to healthcare data security. Instead of focusing only on where data is stored, organizations must continuously understand what sensitive data exists, who can access it, and how that access changes over time. This shift is foundational to effective HIPAA compliance, resilient incident response, and the safe adoption of AI in healthcare.

The Importance of Data Security in Healthcare

Healthcare organizations continue to face disproportionate risk from data breaches, with incidents carrying significant financial, operational, and reputational consequences. Recent industry analyses show that healthcare remains the costliest industry for data breaches, with the average breach costing approximately $7.4 million globally in 2025 and exceeding $10 million per incident in the U.S., driven by regulatory penalties and prolonged recovery efforts.

The scale and complexity of healthcare breaches have also increased. As of late 2025, hundreds of large healthcare data breaches affecting tens of millions of individuals had already been reported in the U.S. alone, including incidents tied to shared infrastructure and third-party service providers. These events highlight how a single exposure can rapidly expand across interconnected healthcare ecosystems.

Importantly, many recent breaches are no longer caused solely by external attacks. Instead, they stem from internal access issues such as over-permissioned identities, misdirected data sharing, and long-lived third-party access, risks now amplified by analytics platforms and AI tools interacting directly with regulated data. As healthcare organizations continue to adopt new technologies, protecting PHI increasingly depends on controlling how sensitive data is accessed, shared, and reused over time, not just where it is stored.

Healthcare Cybersecurity Regulations & Standards

For healthcare organizations, it is especially crucial to protect patient data and follow industry rules. Transitioning to the cloud shouldn't disrupt compliance efforts, but staying on top of strict data privacy regulations adds another layer of complexity to managing healthcare data.

Below are some of the main cybersecurity regulations and standards relevant to healthcare.


Health Insurance Portability and Accountability Act of 1996 (HIPAA)

HIPAA is pivotal in healthcare cybersecurity, mandating compliance for covered entities and business associates. It requires regular risk assessments and adherence to administrative, physical, and technical safeguards for electronic Protected Health Information (ePHI).

HIPAA, at its core, establishes national standards to protect sensitive patient health information from being disclosed without the patient's consent or knowledge. For leaders in healthcare data management, understanding the nuances of HIPAA's Titles and amendments is essential. Particularly relevant are Title II (HIPAA Administrative Simplification) and the Privacy Rule and Security Rule issued under it.

HHS 405(d)

The HHS 405(d) program, established under Section 405(d) of the Cybersecurity Act of 2015, provides voluntary guidelines for healthcare cybersecurity, embodied in the Health Industry Cybersecurity Practices (HICP) framework. This framework covers email protection, endpoint protection, access management, and more.

Health Information Technology for Economic and Clinical Health (HITECH) Act

The HITECH Act, enacted in 2009, strengthens HIPAA requirements, promotes the adoption of health information technology, and imposes stricter penalties for HIPAA violations. It directs HHS to conduct periodic compliance audits and extends HIPAA obligations directly to business associates.

Payment Card Industry Data Security Standard (PCI DSS)

PCI DSS applies to healthcare organizations processing credit cards, ensuring the protection of cardholder data. Compliance is necessary for handling patient card information.

Quality System Regulation (QSR)

The Quality System Regulation (QSR), enforced by the FDA, focuses on securing medical devices, requiring measures like access prevention, risk management, and firmware updates. Proposed changes aim to align QSR with ISO 13485 standards.

Health Information Trust Alliance (HITRUST)

HITRUST, a global cybersecurity framework, aids healthcare organizations in aligning with HIPAA guidelines, offering guidance on various aspects including endpoint security, risk management, and physical security. Though not mandatory, HITRUST serves as a valuable resource for bolstering compliance efforts.

Preventing Data Breaches in Healthcare with Sentra

Sentra’s Data Security Posture Management (DSPM) automatically discovers and accurately classifies your sensitive patient data. By seamlessly building a well-organized data catalog, Sentra ensures all your patient data is secure, stored correctly and in compliance. The best part is, your data never leaves your environment.

Discover and Accurately Classify your High Risk Patient Data

Discover and accurately classify your high-risk patient data with ease using Sentra. Within minutes, Sentra empowers you to uncover and comprehend your Protected Health Information (PHI), spanning patient medical history, treatment plans, lab tests, radiology images, physician notes, and more. 

Seamlessly build a well-organized data catalog, ensuring that all your high-risk patient data is securely stored and compliant. As a cloud-native solution, Sentra enables you to scale security across your entire data estate. Your cloud data remains within your environment, putting you in complete control of your sensitive data at all times.

Sentra Reduces Data Risks by Controlling Posture and Access

Sentra helps you reduce data risks and prevent data breaches by controlling posture and access. With Sentra, you can enforce security policies for sensitive data and receive prompt alerts on violations. It detects which users have access to sensitive Protected Health Information (PHI), ensuring transparency and accountability. Additionally, Sentra helps you manage third-party access risks by offering varying levels of access to different providers. Achieve least privilege access by leveraging Sentra's continuous monitoring and tracking capabilities, which keep tabs on access keys and user identities. This helps ensure that each user has only the access permissions they need, minimizing the risk of unauthorized data exposure.
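To make the least-privilege idea concrete, here is a minimal sketch of the kind of check that such monitoring implies: compare what each identity has been granted with what it has actually used, and flag access keys older than a rotation threshold. This is an illustration under stated assumptions, not Sentra's implementation. The record layout, the 90-day threshold (`MAX_KEY_AGE_DAYS`), and the identity and dataset names are all hypothetical.

```python
from datetime import date

MAX_KEY_AGE_DAYS = 90  # hypothetical rotation policy, not a regulatory requirement


def access_findings(identities, today):
    """Return (identity, finding_type, detail) tuples for risky access.

    Each identity is a dict with:
      name:        identity name
      granted:     set of datasets the identity may access
      used:        set of datasets it actually accessed in the review window
      key_created: date its current access key was issued
    """
    findings = []
    for identity in identities:
        unused = identity["granted"] - identity["used"]
        if unused:
            # Granted but never used: a candidate for revocation.
            findings.append((identity["name"], "unused_access", sorted(unused)))
        key_age = (today - identity["key_created"]).days
        if key_age > MAX_KEY_AGE_DAYS:
            findings.append((identity["name"], "stale_key", key_age))
    return findings


# Assumed sample data for two identities.
identities = [
    {
        "name": "billing-vendor",
        "granted": {"claims", "lab_results"},
        "used": {"claims"},
        "key_created": date(2025, 1, 15),
    },
    {
        "name": "ehr-sync",
        "granted": {"patient_records"},
        "used": {"patient_records"},
        "key_created": date(2025, 12, 1),
    },
]

findings = access_findings(identities, today=date(2026, 1, 9))
for finding in findings:
    print(finding)
```

Passing `today` in explicitly, rather than reading the clock inside the function, keeps the check repeatable, which matters when the same findings need to be reproduced for an audit.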

Stay on Top of Healthcare Data Regulations with Sentra

Sentra’s Data Security Posture Management (DSPM) solution streamlines and automates the management of your regulated patient data, preparing you for significant security audits. Gain a comprehensive view of all sensitive patient data, allowing our platform to automatically identify compliance gaps for proactive and swift resolution.

Sentra dashboard showing issues grouped by compliance framework, such as HIPAA, along with the compliance posture for each.

Easily translate your compliance requirements for HIPAA, GDPR, and HITECH into actionable rules and policies, receiving notifications when data is copied or moved between regions. With Sentra, running compliance reports becomes a breeze, providing you with all the necessary evidence, including sensitive data types, regulatory controls, and compliance status for relevant regulatory frameworks.
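A rule such as "notify me when regulated data is copied to another region" can be pictured as a small policy check over copy events. The sketch below is hypothetical throughout: the `CopyEvent` shape, the `ALLOWED_REGIONS` policy, and the region names are assumptions made for illustration and do not describe Sentra's rule format or any regulation's actual residency requirements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CopyEvent:
    """A hypothetical record of a dataset being copied between regions."""
    dataset: str
    data_types: frozenset
    source_region: str
    dest_region: str


# Hypothetical policy: regions where each regulated data type may reside.
ALLOWED_REGIONS = {
    "PHI": {"us-east-1", "us-west-2"},
}


def violations(event):
    """Return one message per regulated data type copied to a disallowed region."""
    messages = []
    for data_type in sorted(event.data_types):
        allowed = ALLOWED_REGIONS.get(data_type)
        if allowed is not None and event.dest_region not in allowed:
            messages.append(
                f"{data_type} in {event.dataset} copied from "
                f"{event.source_region} to {event.dest_region}"
            )
    return messages


# Assumed sample events: one stays inside the allowed regions, one does not.
in_policy = CopyEvent("lab_results", frozenset({"PHI"}), "us-east-1", "us-west-2")
out_of_policy = CopyEvent("lab_results", frozenset({"PHI"}), "us-east-1", "eu-central-1")

print(violations(in_policy))
print(violations(out_of_policy))
```

Data types with no entry in the policy are ignored rather than flagged, so the rule only fires for data that has been classified as regulated.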

Conclusion: From Perimeter Security to Continuous Data Governance

Healthcare organizations can no longer rely on perimeter-based controls or periodic audits to prevent data breaches. As PHI spreads across cloud platforms, business associates, and AI-driven workflows, the risk is no longer confined to a single system; it’s embedded in how data is accessed, shared, and reused.

Protecting PHI in 2026 requires continuous visibility into sensitive data and the ability to govern it throughout its lifecycle. This means understanding what regulated data exists, who has access to it, and how that access changes over time across internal teams, third parties, and AI systems. Without this level of insight, compliance with HIPAA and other healthcare regulations becomes reactive, and incident response becomes disruptive by default.

A data-centric security model allows healthcare organizations to reduce their breach impact, limit regulatory exposure, and adopt AI safely without compromising patient trust. By shifting from static controls to continuous data governance, security and compliance teams can move from guessing where PHI lives to managing it with confidence.

To learn more about how you can enhance your data security posture, schedule a demo with one of our data security experts.
