Use Redshift Data Scrambling for Additional Data Protection

May 3, 2023
8
Min Read

According to IBM, a data breach in the United States cost companies an average of $9.44 million in 2022, so protecting confidential information deserves high priority. Data scrambling, which adds an extra layer of security to data, is one way to do this.

In this post, we'll analyze the value of data protection, look at the potential financial consequences of data breaches, and talk about how Redshift Data Scrambling may help protect private information.

The Importance of Data Protection

Data protection is essential to safeguard sensitive data from unauthorized access. Identity theft, financial fraud, and other serious consequences are all possible as a result of a data breach. Data protection is also crucial for compliance reasons. Sensitive data must be protected by law in several sectors, including government, banking, and healthcare. Heavy fines, legal problems, and business loss may result from failure to abide by these regulations.

Attackers use many techniques, including phishing, malware, and insider threats, to gain access to confidential information. For example, a phishing attack may lead to the theft of login credentials, and malware may infect a system, opening the door to further attacks and data theft.

So how can you protect yourself against these attacks and minimize your data attack surface?

What is Redshift Data Masking?

Redshift data masking is a technique used to protect sensitive data in Amazon Redshift, a cloud-based data warehousing and analytics service. It involves replacing sensitive data with fictitious, realistic values to protect it from unauthorized access or exposure. Combining Redshift data masking with other security measures, such as access control and encryption, helps create a comprehensive data protection plan.

What is Redshift Data Scrambling?

Redshift data scrambling protects confidential information in a Redshift database by altering original data values using algorithms or formulas, creating unrecognizable data sets. This method is beneficial when sharing sensitive data with third parties or using it for testing, development, or analysis, ensuring privacy and security while enhancing usability. 

The technique is highly customizable, allowing organizations to select the desired level of protection while maintaining data usability. Redshift data scrambling is cost-effective, requiring no additional hardware or software investments, providing an attractive, low-cost solution for organizations aiming to improve cloud data security.

Data Masking vs. Data Scrambling

Data masking involves replacing sensitive data with a fictitious but realistic value. Data scrambling, on the other hand, involves changing the original data values using an algorithm or a formula to generate a new set of values.

In some cases, data scrambling can be used as part of data masking techniques. For instance, sensitive data such as credit card numbers can be scrambled before being masked to enhance data protection further.
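
The distinction can be illustrated with a minimal Python sketch. The function names <inlineCode>mask_value<inlineCode> and <inlineCode>scramble_value<inlineCode> and the key <inlineCode>demo-key<inlineCode> are hypothetical and chosen only for illustration; the keyed hash stands in for whatever scrambling formula you actually use.

```python
import hashlib
import hmac
import random

def mask_value(value, rng):
    """Masking: swap in a fictitious value that keeps the original layout."""
    return "".join(str(rng.randrange(10)) if ch in "0123456789" else ch for ch in value)

def scramble_value(value, key):
    """Scrambling: derive the new value from the original with a keyed formula."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()

card = "1234-5678-9012-3456"
masked = mask_value(card, random.Random())
scrambled = scramble_value(card, b"demo-key")

print(masked)     # same layout as the original, with random digits
print(scrambled)  # 64 hex characters derived from the original and the key
```

The masked value has no relationship to the input, while the scrambled value is always the same for a given input and key.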

Setting up Redshift Data Scrambling

With an understanding of Redshift and data scrambling in place, we can now look at how to set it up. Enabling data scrambling in Redshift requires several steps.

To achieve data scrambling in Redshift, SQL queries are utilized to invoke built-in or user-defined functions. These functions utilize a blend of cryptographic techniques and randomization to scramble the data.

The following steps use example code to illustrate the setup:

Step 1: Create a new Redshift cluster

Create a new Redshift cluster or use an existing cluster if available. 

Step 2: Define a scrambling key

Define a scrambling key that will be used to scramble the sensitive data.

 
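SELECT set_config('app_context.scrambling_key', 'MyScramblingKey', false);

In this code snippet, we are defining a scrambling key by setting a session context variable named <inlineCode>app_context.scrambling_key<inlineCode> to the value <inlineCode>MyScramblingKey<inlineCode>. The value can be read back later in the same session with <inlineCode>current_setting('app_context.scrambling_key')<inlineCode>. Treat <inlineCode>MyScramblingKey<inlineCode> as a placeholder: statements like this can appear in query logs, so real keys are better kept in a secrets store.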

Step 3: Create a user-defined function (UDF)

Create a user-defined function in Redshift that will be used to scramble the sensitive data. 


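-- Placeholder logic: hash the input together with the key.
-- Replace the SELECT expression with your own scrambling logic.
CREATE OR REPLACE FUNCTION scramble(VARCHAR)
RETURNS VARCHAR
IMMUTABLE
AS $$
  SELECT SHA2('MyScramblingKey' || $1, 256)
$$ LANGUAGE sql;

Here, we are creating a scalar UDF named <inlineCode>scramble<inlineCode> that takes a string input and returns the scrambled output. Redshift scalar UDFs are written in SQL or Python (<inlineCode>plpgsql<inlineCode> is only available for stored procedures), and a SQL UDF refers to its arguments by position, so the input is <inlineCode>$1<inlineCode>. The function is defined as <inlineCode>IMMUTABLE<inlineCode>, which means that it will always return the same result for the same input, which is important for data scrambling. The <inlineCode>SHA2<inlineCode> expression is only a placeholder that returns a 64-character hexadecimal digest. You will need to input your own scrambling logic and make sure the target column is wide enough to hold its output.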
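
As one possible shape for that scrambling logic, here is a minimal Python sketch of a deterministic, keyed scramble. It assumes an HMAC-SHA256 digest is acceptable as the scrambled output, which is an illustrative choice rather than a Redshift feature, and it reuses the placeholder key from Step 2.

```python
import hashlib
import hmac

SCRAMBLE_KEY = b"MyScramblingKey"  # placeholder key from the example above

def scramble(input_string, key=SCRAMBLE_KEY):
    """Return a deterministic, keyed digest of input_string (None stays None)."""
    if input_string is None:
        return None
    digest = hmac.new(key, input_string.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()

# The same input and key always give the same output
print(scramble("123-45-6789") == scramble("123-45-6789"))
# A different key gives a different output for the same input
print(scramble("123-45-6789") == scramble("123-45-6789", key=b"another-key"))
```

Because the output depends on both the value and the key, equal inputs still scramble to equal outputs, which keeps joins and grouping on the scrambled column usable.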

Step 4: Apply the UDF to sensitive columns

Apply the UDF to the sensitive columns in the database that need to be scrambled.


UPDATE employee SET ssn = scramble(ssn);

For example, we can apply the <inlineCode>scramble<inlineCode> UDF to a column named <inlineCode>ssn<inlineCode> in a table named <inlineCode>employee<inlineCode>. The <inlineCode>UPDATE<inlineCode> statement calls the <inlineCode>scramble<inlineCode> UDF and replaces the values in the <inlineCode>ssn<inlineCode> column with the scrambled values.

Step 5: Test and validate the scrambled data

Test and validate the scrambled data to ensure that it is unreadable and unusable by unauthorized parties.


SELECT ssn, scramble(ssn) AS scrambled_ssn
FROM employee;

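In this snippet, we are running a <inlineCode>SELECT<inlineCode> statement to retrieve the <inlineCode>ssn<inlineCode> column and the corresponding scrambled value using the <inlineCode>scramble<inlineCode> UDF. We can compare the original and scrambled values to ensure that the scrambling is working as expected. Run this comparison before the <inlineCode>UPDATE<inlineCode> in Step 4, or against a copy of the table, because once the column has been updated, <inlineCode>ssn<inlineCode> already holds scrambled values.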
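
The comparison can also be automated. The sketch below is one way to do it in Python, assuming the query results have been fetched as (original, scrambled) pairs; the sample rows are made up for illustration, and the two checks are examples rather than a complete test suite.

```python
def validate_scrambling(pairs):
    """Return a description of every (original, scrambled) pair that fails a check."""
    problems = []
    for row_number, (original, scrambled) in enumerate(pairs, start=1):
        digits = "".join(ch for ch in original if ch.isdigit())
        if scrambled == original:
            problems.append(f"row {row_number}: value was not changed")
        elif digits and digits in scrambled.replace("-", ""):
            problems.append(f"row {row_number}: original digits are still visible")
    return problems

# Hypothetical sample rows, shaped like the output of the SELECT statement
sample_rows = [
    ("123-45-6789", "9f2c1ab0"),
    ("987-65-4321", "987-65-4321"),
    ("555-12-3456", "id-555123456-x"),
]

for problem in validate_scrambling(sample_rows):
    print(problem)
```

The messages refer to row numbers rather than values so that the validation output does not itself leak sensitive data.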

Step 6: Monitor and maintain the scrambled data

To monitor and maintain the scrambled data, we can regularly check the sensitive columns to ensure that they are still scrambled and that there are no vulnerabilities or breaches. We should also maintain the scrambling key and UDF to ensure that they are up-to-date and effective.

Different Options for Scrambling Data in Redshift

Selecting a data scrambling technique involves balancing security levels, data sensitivity, and application requirements. Various general algorithms exist, each with unique pros and cons. To scramble data in Amazon Redshift, you can use the following Python code samples in conjunction with a library like psycopg2 to interact with your Redshift cluster. Before executing the code samples, you will need to install the psycopg2 library:


pip install psycopg2

Random

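The Random option replaces each character with one chosen by a random number generator. It secures data quickly and the original values cannot be recovered from the output, but Python's <inlineCode>random<inlineCode> module is not cryptographically secure, which limits its robustness for long-term protection.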


import random
import string
import psycopg2

ALPHABET = string.ascii_letters + string.digits

def random_scramble(data):
    # Replace every character with a random letter or digit.
    # The random module is not cryptographically secure; the secrets
    # module is a stronger choice for real data.
    return "".join(random.choice(ALPHABET) for _ in data)

# Connect to your Redshift cluster
conn = psycopg2.connect(host='your_host', port='your_port', dbname='your_dbname', user='your_user', password='your_password')
cursor = conn.cursor()

# Fetch the distinct non-null values from your table
cursor.execute("SELECT DISTINCT sensitive_column FROM your_table WHERE sensitive_column IS NOT NULL;")
rows = cursor.fetchall()

# Pair each scrambled value with the original value it replaces
params = [(random_scramble(original), original) for (original,) in rows]

# Update the data in the table. Matching on a primary key instead of
# the old value is more robust if your table has one.
cursor.executemany("UPDATE your_table SET sensitive_column = %s WHERE sensitive_column = %s;", params)
conn.commit()

# Close the connection
cursor.close()
conn.close()

Shuffle

The Shuffle option rearranges the characters of each value. Because the output keeps the same characters and the same length as the input, it remains prone to brute-force attacks that try every ordering, especially for short values.


import random
import psycopg2

def shuffle_scramble(data):
    # Rearrange the characters of the value in a random order
    data_list = list(data)
    random.shuffle(data_list)
    return ''.join(data_list)

# Connect to your Redshift cluster
conn = psycopg2.connect(host='your_host', port='your_port', dbname='your_dbname', user='your_user', password='your_password')
cursor = conn.cursor()

# Fetch the distinct non-null values from your table
cursor.execute("SELECT DISTINCT sensitive_column FROM your_table WHERE sensitive_column IS NOT NULL;")
rows = cursor.fetchall()

# Pair each shuffled value with the original value it replaces
params = [(shuffle_scramble(original), original) for (original,) in rows]

# Matching on a primary key instead of the old value is more robust
cursor.executemany("UPDATE your_table SET sensitive_column = %s WHERE sensitive_column = %s;", params)
conn.commit()

cursor.close()
conn.close()
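
To see why shuffling alone is weak, the short sketch below checks what a shuffled value still reveals. It restates <inlineCode>shuffle_scramble<inlineCode> so that it runs on its own, and the sample value is made up.

```python
import random

def shuffle_scramble(data):
    # Same logic as the example above: rearrange the characters randomly
    data_list = list(data)
    random.shuffle(data_list)
    return "".join(data_list)

original = "123-45-6789"
shuffled = shuffle_scramble(original)

# The shuffled value contains exactly the same characters as the original
print(sorted(shuffled) == sorted(original))  # True
# The length is unchanged as well
print(len(shuffled) == len(original))        # True
```

An attacker who obtains the shuffled value therefore knows every character of the original and only has to recover their order.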

Reversible

The Reversible method scrambles characters in a way that can be undone with a key. This lets authorized users recover the original values and poses a greater challenge to attackers, but simple ciphers are still vulnerable to brute-force attacks. We’ll use the Caesar cipher as an example.


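import psycopg2

def caesar_cipher(data, key):
    # Shift each ASCII letter by key positions; pass -key to reverse the shift.
    # Digits and punctuation are left unchanged, so purely numeric values
    # are not protected by this example cipher.
    shift = key % 26
    encrypted = ""
    for char in data:
        if 'a' <= char <= 'z':
            encrypted += chr((ord(char) - 97 + shift) % 26 + 97)
        elif 'A' <= char <= 'Z':
            encrypted += chr((ord(char) - 65 + shift) % 26 + 65)
        else:
            encrypted += char
    return encrypted

# Connect to your Redshift cluster
conn = psycopg2.connect(host='your_host', port='your_port', dbname='your_dbname', user='your_user', password='your_password')
cursor = conn.cursor()

# Fetch the distinct non-null values from your table
cursor.execute("SELECT DISTINCT sensitive_column FROM your_table WHERE sensitive_column IS NOT NULL;")
rows = cursor.fetchall()

key = 5
# Pair each encrypted value with the original value it replaces
params = [(caesar_cipher(original, key), original) for (original,) in rows]

# Matching on a primary key instead of the old value is more robust
cursor.executemany("UPDATE your_table SET sensitive_column = %s WHERE sensitive_column = %s;", params)
conn.commit()

cursor.close()
conn.close()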
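
The sketch below shows both properties of the Caesar cipher mentioned above: the shift can be undone with the key, and an attacker without the key only has to try 26 shifts. It restates the cipher so that it runs on its own, and the sample text is made up.

```python
def caesar_cipher(data, key):
    """Shift ASCII letters by key positions; other characters are unchanged."""
    shift = key % 26
    out = []
    for ch in data:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - 97 + shift) % 26 + 97))
        elif "A" <= ch <= "Z":
            out.append(chr((ord(ch) - 65 + shift) % 26 + 65))
        else:
            out.append(ch)
    return "".join(out)

secret = "Jane Doe"
encrypted = caesar_cipher(secret, 5)
print(encrypted)                     # Ofsj Itj
print(caesar_cipher(encrypted, -5))  # Jane Doe

# Brute force: without the key, an attacker can simply try every shift
candidates = [caesar_cipher(encrypted, -shift) for shift in range(26)]
print(secret in candidates)          # True
```

This is why the Caesar cipher is useful for demonstrating reversibility but should not be relied on to protect real data.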

Custom

The Custom option enables users to create tailor-made algorithms to resist specific attack types, potentially offering superior security. However, the development and implementation of custom algorithms demand greater time and expertise.
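
As a rough illustration of what a custom algorithm might look like, the following Python sketch scrambles only the digits of a value and keeps its layout, so a scrambled SSN still fits the original column. Everything here, including the function names and the key, is hypothetical, and this toy scheme reuses the same offsets for every value, so it is not suitable for production use.

```python
import hashlib
import hmac

DIGITS = "0123456789"

def digit_offsets(key, count):
    """Derive `count` offsets in the range 0-9 from the key."""
    stream = b""
    counter = 0
    while len(stream) < count:
        stream += hmac.new(key, str(counter).encode("ascii"), hashlib.sha256).digest()
        counter += 1
    return [byte % 10 for byte in stream[:count]]

def custom_scramble(value, key, reverse=False):
    """Shift each digit by a key-derived offset; keep all other characters."""
    digit_count = sum(ch in DIGITS for ch in value)
    offsets = iter(digit_offsets(key, digit_count))
    sign = -1 if reverse else 1
    out = []
    for ch in value:
        if ch in DIGITS:
            out.append(str((int(ch) + sign * next(offsets)) % 10))
        else:
            out.append(ch)
    return "".join(out)

key = b"table-specific-demo-key"
scrambled = custom_scramble("123-45-6789", key)
print(scrambled)                                      # digits changed, dashes kept
print(custom_scramble(scrambled, key, reverse=True))  # 123-45-6789
```

Keeping the format makes the scrambled data easier to use in testing, at the cost of revealing the structure of the original values.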

Best Practices for Using Redshift Data Scrambling

There are several best practices that should be followed when using Redshift Data Scrambling to ensure maximum protection:

Use Unique Keys for Each Table

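To limit the damage if one key is compromised, each table should be scrambled with its own unique key, so that a leaked key exposes only one table. Note that a scrambling key is a secret value, not a database index: a unique index does not provide this separation, and Amazon Redshift does not support <inlineCode>CREATE INDEX<inlineCode>. Keep each table's key outside the database, for example in a secrets manager, and record which key belongs to which table.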
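
One way to give every table its own scrambling key is to derive it from a single master secret. The sketch below assumes an HMAC-based derivation; the function name <inlineCode>derive_table_key<inlineCode>, the table names, and the master secret are all placeholders.

```python
import hashlib
import hmac

def derive_table_key(master_key, table_name):
    """Derive a distinct 32-byte key for a table from one master secret."""
    return hmac.new(master_key, table_name.encode("utf-8"), hashlib.sha256).digest()

# Placeholder secret; in practice, load it from your key store
master = b"replace-with-a-secret-from-your-key-store"

employee_key = derive_table_key(master, "employee")
customer_key = derive_table_key(master, "customer")

print(employee_key != customer_key)  # True
print(len(employee_key))             # 32
```

A leaked per-table key does not reveal the keys of other tables, although the master secret itself still has to be protected carefully.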

Encrypt Sensitive Data Fields 

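Sensitive data fields such as credit card numbers and social security numbers should be encrypted to provide an additional layer of security. Amazon Redshift does not ship a built-in <inlineCode>ENCRYPT<inlineCode> SQL function, so column-level encryption is usually implemented in a user-defined function or applied before the data is loaded. Assuming you have created such a UDF, here given the placeholder name <inlineCode>encrypt_value<inlineCode>, encrypting a credit card number looks like this:


SELECT encrypt_value('1234-5678-9012-3456', 'your_encryption_key_here');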

Use Strong Encryption Algorithms

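Strong encryption algorithms such as AES-256 should be used to provide the strongest protection. Redshift uses AES-256 to encrypt data at rest and SSL/TLS to protect data in transit. Encryption at rest is configured for the whole cluster rather than per table or per column, for example with the AWS CLI:


aws redshift modify-cluster --cluster-identifier your_cluster_id --encrypted --kms-key-id your_key_id_here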

Control Access to Encryption Keys 

Access to encryption keys should be restricted to authorized personnel to prevent unauthorized access to sensitive data. You can achieve this by using AWS KMS (Key Management Service) to manage your encryption keys and granting each principal only the operations it needs. Here's an example of how to grant a single IAM user decrypt-only access to an encryption key using KMS in Python:


import boto3

kms = boto3.client('kms')

key_id = 'your_key_id_here'
grantee_principal = 'arn:aws:iam::123456789012:user/jane'

response = kms.create_grant(
    KeyId=key_id,
    GranteePrincipal=grantee_principal,
    Operations=['Decrypt']
)

print(response)

Regularly Rotate Encryption Keys 

Regular rotation of encryption keys limits how long a compromised key remains useful. AWS KMS can rotate a customer managed key automatically, by default once a year, after rotation has been enabled on the key. Here's an example of how to enable automatic key rotation in KMS using the AWS CLI:

 
aws kms enable-key-rotation --key-id your_key_id_here

You can confirm the setting with:

aws kms get-key-rotation-status --key-id your_key_id_here

Turn on Logging

To track user access to sensitive data and identify any unwanted access, logging must be enabled. When user activity logging is active, Amazon Redshift logs each SQL statement that is executed on your cluster. This applies to queries that access sensitive data as well as data-scrambling operations. Afterwards, you can examine these logs to look for unusual access patterns or suspicious activity.

User activity logging is not a database-level setting. It is controlled by the <inlineCode>enable_user_activity_logging<inlineCode> parameter in the cluster's parameter group and requires audit logging to be enabled on the cluster. You can set the parameter with the AWS CLI:

aws redshift modify-cluster-parameter-group --parameter-group-name your_parameter_group --parameters ParameterName=enable_user_activity_logging,ParameterValue=true

Recent queries can also be reviewed in the <inlineCode>stl_query<inlineCode> system table, for example by filtering its <inlineCode>querytxt<inlineCode> column for the name of a sensitive table.

Monitor Performance 

Data scrambling is often a resource-intensive practice, so it’s good to monitor CPU usage, memory usage, and disk I/O to ensure your cluster isn’t being overloaded. In Redshift, you can use the <inlineCode>svl_query_summary<inlineCode> and <inlineCode>svl_query_report<inlineCode> system views to monitor query performance. You can also use Amazon CloudWatch to monitor metrics such as CPU usage and disk space.

Establishing Backup and Disaster Recovery

To prevent data loss in the case of a disaster, backup and disaster recovery mechanisms should be put in place. Automated and manual snapshots are two of the backup and recovery methods offered by Amazon Redshift. By default, automated snapshots are taken about every eight hours or after every 5 GB per node of data changes, whichever comes first.

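Moreover, you can take a manual snapshot of your cluster at any time. In the case of a breakdown or disaster, your cluster can be restored from these backups and snapshots. Snapshots are managed through the console, the API, or the AWS CLI rather than with SQL. Use this AWS CLI command to manually take a snapshot of your cluster in Amazon Redshift:

aws redshift create-cluster-snapshot --cluster-identifier your_cluster_id --snapshot-identifier snapshot_name

To restore a snapshot into a new cluster, you can use the <inlineCode>restore-from-cluster-snapshot<inlineCode> command. For example:


aws redshift restore-from-cluster-snapshot --cluster-identifier new_cluster_name --snapshot-identifier snapshot_name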

Frequent Review and Updates

To ensure that data scrambling procedures remain effective and up-to-date with the latest security requirements, it is crucial to consistently review and update them. This process should include examining backup and recovery procedures, encryption techniques, and access controls.

In Amazon Redshift, you can assess access controls by inspecting roles and their associated permissions in system views such as <inlineCode>svv_roles<inlineCode>, <inlineCode>svv_user_grants<inlineCode>, and <inlineCode>svv_relation_privileges<inlineCode>. It is essential to confirm that only authorized individuals have access to sensitive information.

To review encryption, check the cluster configuration rather than column metadata: <inlineCode>aws redshift describe-clusters<inlineCode> reports whether each cluster is encrypted and which KMS key it uses. The <inlineCode>pg_catalog.pg_attribute<inlineCode> system catalog table lets you inspect the column definitions of your tables, which helps you locate sensitive data fields, but it does not hold encryption settings. Ensure that the clusters storing sensitive data are protected with robust encryption methods, such as AES-256.

The AWS CLI commands <inlineCode>aws redshift describe-cluster-snapshots<inlineCode>, <inlineCode>aws backup list-backup-plans<inlineCode>, and <inlineCode>aws backup list-backup-vaults<inlineCode> enable you to review your snapshots, backup plans, and vaults, as well as evaluate backup and recovery procedures. Make sure your backup and recovery procedures are properly configured and up-to-date.

Decrypting Data in Redshift

There are different options for decrypting data, depending on the encryption method used and the tools available. The decryption process mirrors encryption, and usually a custom UDF is used to decrypt the data. Let’s look at one example: reversing data that was scrambled with a substitution cipher.

Step 1: Create a UDF with decryption logic for substitution


CREATE FUNCTION decrypt_substitution(ciphertext varchar) RETURNS varchar
IMMUTABLE AS $$
    # Plain alphabet and the substitution alphabet used for encryption
    alphabet = 'abcdefghijklmnopqrstuvwxyz'
    substitution = 'ijklmnopqrstuvwxyzabcdefgh'
    if ciphertext is None:
        return None
    plaintext = ''
    for char in ciphertext:
        # Find the character's position in the substitution alphabet
        index = substitution.find(char)
        if index == -1:
            # Not a lowercase letter: keep it unchanged
            plaintext += char
        else:
            # Map back to the plain letter at the same position
            plaintext += alphabet[index]
    return plaintext
$$ LANGUAGE plpythonu;

Step 2: Move the data back after truncating and applying the decryption function


TRUNCATE original_table;
INSERT INTO original_table (column1, decrypted_column2, column3)
SELECT column1, decrypt_substitution(encrypted_column2), column3
FROM temp_table;

In this example, encrypted_column2 is the encrypted version of column2 in the temp_table. The decrypt_substitution function is applied to encrypted_column2, and the result is inserted into the decrypted_column2 in the original_table. Make sure to replace column1, column2, and column3 with the appropriate column names, and adjust the INSERT INTO statement accordingly if you have more or fewer columns in your table.
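
Before running the decryption against real data, it can help to confirm outside the database that the two alphabets really are inverses of each other. The following Python sketch performs a round trip with the same alphabets as the UDF; the helper <inlineCode>encrypt_substitution<inlineCode> and the sample message are added here only for the test.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
SUBSTITUTION = "ijklmnopqrstuvwxyzabcdefgh"

# Translation tables for both directions of the substitution
ENCRYPT_TABLE = str.maketrans(ALPHABET, SUBSTITUTION)
DECRYPT_TABLE = str.maketrans(SUBSTITUTION, ALPHABET)

def encrypt_substitution(plaintext):
    return plaintext.translate(ENCRYPT_TABLE)

def decrypt_substitution(ciphertext):
    return ciphertext.translate(DECRYPT_TABLE)

message = "jane doe 123"
ciphertext = encrypt_substitution(message)
print(ciphertext)                        # rivm lwm 123
print(decrypt_substitution(ciphertext))  # jane doe 123
```

Characters outside the lowercase alphabet, such as digits and spaces, pass through unchanged in both directions.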

Conclusion

Redshift data scrambling is an effective tool for additional data protection and should be considered as part of an organization's overall data security strategy. In this blog post, we looked into the importance of data protection and how it can be integrated effectively into the data warehouse. Then, we covered the difference between data scrambling and data masking before diving into how one can set up Redshift data scrambling.

Once you become accustomed to Redshift data scrambling, you can strengthen your security with different scrambling techniques and best practices, including encryption, logging, and performance monitoring. Organizations may improve their data security posture management (DSPM) and reduce the risk of possible breaches by adhering to these recommendations and using an efficient strategy.

Veronica is the security researcher at Sentra. She brings a wealth of knowledge and experience as a cybersecurity researcher. Her main focuses are researching the main cloud provider services and AI infrastructures for Data related threats and techniques.

Latest Blog Posts

Ward Balcerzak
November 12, 2025
4
Min Read
Data Security

Best DSPM Tools: Top 9 Vendors Compared

Enhanced DSPM Adoption Is the Most Important Data Security Trend of 2026

Over the past few years, organizations have realized that traditional security tools can’t keep pace with how data moves and grows today. Exploding volumes of sensitive data now flourish across multi-cloud environments, SaaS platforms, and AI systems, often without full visibility by the teams responsible for securing it. Unstructured data presents the greatest risk - representing over 80% of corporate data.

That’s why Data Security Posture Management (DSPM) has become a critical part of the modern security stack. DSPM tools help organizations automatically discover, classify, monitor, and protect sensitive data - no matter where it lives or travels.

But in 2026, the data security game is changing. Many DSPMs can tell you what your data is, but more is needed. Leading DSPM platforms are going beyond visibility. They’re delivering real-time AI-enhanced contextual business insights, automated remediation, and AI-aware accurate protection that scales with your dynamic data.

AI-enhanced DSPM Capabilities in 2026

Not all DSPM tools are built the same. The top platforms share a few key traits that define the next generation of data security posture management:

Capability | Why It Matters
Continuous discovery and classification at scale | Real-time visibility into all sensitive data across cloud, SaaS, and on-prem systems. Efficiency, at petabyte scale, to allow for scanning frequency commensurate with business risk.
Contextual risk analysis | Understanding what data is sensitive, who can access it, and how it’s being used. Understanding the business context around data so that appropriate actions can be taken.
Automated remediation | Native capabilities and integration with systems that correct risky configurations or excessive access automatically.
Integration and scalability | Seamless connections to CSPM, SIEM, IAM, ITSM, and SOAR tools to unify data risk management and streamline workflows.
AI and model governance | Capabilities to secure data used in GenAI agents, copilot assistants, and pipelines.

Top DSPM Tools to Watch in 2026

Based on recent analyst coverage, market growth, and innovation across the industry, here are the top DSPM platforms to watch this year, each contributing to how data security is evolving.

1. Sentra

As a cloud-native DSPM platform, Sentra focuses on continuous data protection, not just visibility. It discovers and accurately classifies sensitive data in real time across all cloud environments, while automatically remediating risks through policy-driven automation.

What sets Sentra apart:

  • Continuous, automated discovery and classification across your entire data estate - cloud, SaaS, and on-premises.
  • Business Contextual insights that understand the purpose of data, accurately linking data, identity, and risk.
  • Automatic learning to discern customer unique data types and continuously improve labeling over time.
  • Petabyte scaling and low compute consumption for 10X cost efficiency.
  • Automated remediation workflows and integrations to fix issues instantly.
  • Built-in coverage for data flowing through AI and SaaS ecosystems.

Ideal for: Security teams looking for a cloud-native DSPM platform built for scalability in the AI era with automation at its core.

2. BigID

A pioneer in data discovery and classification, BigID bridges DSPM and privacy governance, making it a good choice for compliance-heavy sectors.


Ideal for: Organizations prioritizing data privacy, governance, and audit readiness.

3. Prisma Cloud (Palo Alto Networks)

Prisma’s DSPM offering integrates closely with CSPM and CNAPP components, giving security teams a single pane of glass for infrastructure and data risk.


Ideal for: Enterprises with hybrid or multi-cloud infrastructures already using Palo Alto tools.

4. Microsoft Purview / Defender DSPM

Microsoft continues to invest heavily in DSPM through Purview, offering rich integration with Microsoft 365 and Azure ecosystems. Note: Sentra integrates with Microsoft Purview Information Protection (MPIP) labeling and DLP policies.

Ideal for: Microsoft-centric organizations seeking native data visibility and compliance automation.

5. Securiti.ai

Positioned as a “Data Command Center,” Securiti unifies DSPM, privacy, and governance. Its strength lies in automation and compliance visibility and SaaS coverage.


Ideal for: Enterprises looking for an all-in-one governance and DSPM solution.

6. Cyera

Cyera has gained attention for serving the SMB segment with its DSPM approach. It uses LLMs for data context, supplementing other classification methods, and provides integrations to IAM and other workflow tools.


Ideal for: Small/medium growing companies that need basic DSPM functionality.

7. Wiz

Wiz continues to lead in cloud security, having added DSPM capabilities into its CNAPP platform. They’re known for deep multi-cloud visibility and infrastructure misconfiguration detection.

Ideal for: Enterprises running complex cloud environments looking for infrastructure vulnerability and misconfiguration management.

8. Varonis

Varonis remains a strong player for hybrid and on-prem data security, with deep expertise in permissions and access analytics and focus on SaaS/unstructured data.


Ideal for: Enterprises with legacy file systems or mixed cloud/on-prem architectures.

9. Netwrix

Netwrix’s platform incorporates DSPM-related features into its auditing and access control suite.

Ideal for: Mid-sized organizations seeking DSPM as part of a broader compliance solution.

Emerging DSPM Trends to Watch in 2026

  1. AI Data Security: As enterprises adopt GenAI, DSPM tools are evolving to secure data used in training and inference.

  2. Identity-Centric Risk: Understanding and controlling both human and machine identities is now central to data posture.

  3. Automation-Driven Security: Remediation workflows are becoming the differentiator between “good” and “great.”

  4. Market Consolidation: Expect to see CNAPP, legacy security, and cloud vendors acquiring DSPM startups to strengthen their coverage.

How to Choose the Right DSPM Tool

When evaluating a DSPM solution, align your choice with your data landscape and goals:

  • Cloud-Native Company: Choose tools designed for cloud-first environments (like Sentra, Securiti, Wiz).
  • Compliance Priority: Platforms like Sentra, BigID or Securiti excel in privacy and governance.
  • Microsoft-Heavy Stack: Purview and Sentra DSPM offer native integration.
  • Hybrid Environment: Consider Varonis, Prisma Cloud, or Sentra for extended visibility.
  • Enterprise Scalability: Evaluate deployment ease, petabyte scalability, cloud resource consumption, scanning efficiency, etc. (Sentra excels here)

Pro Tip: Run a proof of concept (POC) across multiple environments to test scalability, accuracy, and operational cost effectiveness before full deployment.

Final Thoughts: DSPM Is About Action

The best DSPM tools in 2026 share one core principle: they help organizations move from visibility to action.

At Sentra, we believe that the future of DSPM lies in continuous, automated data protection:

  • Real-time discovery of sensitive data at scale
  • Context-aware prioritization for business insight
  • Automated remediation that reduces risk instantly

As data continues to power AI, analytics, and innovation, DSPM ensures that innovation never comes at the cost of security. See how Sentra helps leading enterprises protect data across multi-cloud and SaaS environments.

Gilad Golani
November 6, 2025
4
Min Read

How SLMs (Small Language Models) Make Sentra’s AI Faster and More Accurate

The LLM Hype, and What’s Missing

Over the past few years, large language models (LLMs) have dominated the AI conversation. From writing essays to generating code, LLMs like GPT-4 and Claude have proven that massive models can produce human-like language and reasoning at scale.

But here's the catch: not every task needs a 70-billion-parameter model. Parameters are computationally expensive - they require both memory and processing time.

At Sentra, we discovered early on that the work our customers rely on, accurate and scalable classification of massive data flows, isn’t about writing essays or generating text. It’s about making decisions fast, reliably, and cost-effectively across dynamic, real-world data environments. While large language models (LLMs) are excellent at solving general problems, they create a lot of unnecessary computational overhead.

That’s why we’ve shifted our focus toward Small Language Models (SLMs) - compact, specialized models purpose-built for a single task: understanding and classifying data efficiently. By running hundreds of SLMs in parallel on regular CPUs, Sentra can deliver faster insights, stronger data privacy, and a dramatically lower total cost of AI-based classification that scales with our customers' business, not their cloud bill.

What Is an SLM?

An SLM is a smaller, domain-specific version of a language model. Instead of trying to understand and generate any kind of text, an SLM is trained to excel at a particular task, such as identifying the topic of a document (what the document is about or what type of document it is), or detecting sensitive entities within documents, such as passwords, social security numbers, or other forms of PII.

In other words: If an LLM is a generalist, an SLM is a specialist. At Sentra, we use SLMs that are tuned and optimized for security data classification, allowing them to process high volumes of content with remarkable speed, consistency, and precision. These SLMs are based on standard open source models, but trained with data that was curated by Sentra, to achieve the level of accuracy that only Sentra can guarantee.

From LLMs to SLMs: A Strategic Evolution

Like many in the industry, we started by testing LLMs to see how well they could classify and label data. They were powerful, but also slow, expensive, and difficult to scale. Over time, it became clear: LLMs are too big and too expensive to run on customer data for Sentra to be a viable, cost effective solution for data classification.

Each SLM handles a focused part of the process: initial categorization, text extraction from documents and images, and sensitive entity classification. The SLMs are not only accurate (even more accurate than LLMs classifying using prompts) - they can run on standard CPUs efficiently, and they run inside the customer’s environment, as part of Sentra’s scanners.

The Benefits of SLMs for Customers

a. Speed and Efficiency

SLMs process data faster because they’re lean by design. They don’t waste cycles generating full sentences or reasoning across irrelevant contexts. This means real-time or near-real-time classification, even across millions of data points.

b. Accuracy and Adaptability

SLMs are pre-trained “zero-shot” language models that can categorize and classify generically, without the need to pre-train on a specific task in advance. This is the meaning of “zero shot” - it means that regardless of the data it was trained on, the model can classify an arbitrary set of entities and document labels without training on each one specifically. This is possible due to the fact that language models are very advanced, and they are able to capture deep natural language understanding at the training stage.

Regardless of that, Sentra fine tunes these models to further increase the accuracy of the classification, by curating a very large set of tagged data that resembles the type of data that our customers usually run into.

Our feedback loops ensure that model performance only gets better over time - a direct reflection of our customers’ evolving environments.

c. Cost and Sustainability

Because SLMs are compact, they require less compute power, which means lower operational costs and a smaller carbon footprint. This efficiency allows us to deliver powerful AI capabilities to customers without passing on the heavy infrastructure costs of running massive models.

d. Security and Control

Unlike LLMs hosted on external APIs, SLMs can be run within Sentra’s secure environment, preserving data privacy and regulatory compliance. Customers maintain full control over their sensitive information - a critical requirement in enterprise data security.

A Quick Comparison: SLMs vs. LLMs

The difference between SLMs and LLMs becomes clear when you look at their performance across key dimensions:

Factor | SLMs | LLMs
Speed | Fast, optimized for classification throughput | Slower and more compute-intensive for large-scale inference
Cost | Cost-efficient | Expensive to run at scale
Accuracy (for simple tasks) | Optimized for classification | Comparable but unnecessary overhead
Deployment | Lightweight, easy to integrate | Complex and resource-heavy
Adaptability (with feedback) | Continuously fine-tuned, ability to fine tune per customer | Harder to customize, fine-tuning costly
Best Use Case | Classification, tagging, filtering | Reasoning and analysis, generation, synthesis

Continuous Learning: How Sentra’s SLMs Grow

One of the most powerful aspects of our SLM approach is continuous learning. Each Sentra customer project contributes valuable insights, from new data patterns to evolving classification needs. These learnings feed back into our training workflows, helping us refine and expand our models over time.

While not every model retrains automatically, the system is built to support iterative optimization: as our team analyzes feedback and performance, models can be fine-tuned or extended to handle new categories and contexts.

The result is an adaptive ecosystem of SLMs that becomes more effective as our customer base and data diversity grow, ensuring Sentra’s AI remains aligned with real-world use cases.

Sentra’s Multi-SLM Architecture

Sentra’s scanning technology doesn’t rely on a single model. We run many SLMs in parallel, each specializing in a distinct layer of classification:

  1. Embedding models that convert data into meaningful vector representations
  2. Entity Classification models that label sensitive entities
  3. Document Classification models that label documents by type
  4. Image-to-text and speech-to-text models that convert non-textual data into text

This layered approach allows us to operate at scale - quickly, cheaply, and with great results. In practice, that means faster insights, fewer errors, and a more responsive platform for every customer.
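
As a rough illustration of running several specialized models side by side, the sketch below fans one input out to multiple classifiers concurrently and merges their results. It is an assumption-laden toy: the two "models" are simple rules standing in for real SLMs, and every name here (`classify_entities`, `classify_document`, `scan`) is hypothetical rather than part of Sentra's platform.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def classify_entities(text: str) -> list[str]:
    """Hypothetical entity model: a regex that flags e-mail addresses."""
    return ["EMAIL"] if re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", text) else []


def classify_document(text: str) -> str:
    """Hypothetical document model: a keyword rule standing in for a classifier."""
    return "invoice" if "invoice" in text.lower() else "other"


# Each entry is one specialized layer; more layers can be registered here.
MODELS = {"entities": classify_entities, "document_type": classify_document}


def scan(text: str) -> dict:
    """Run every specialized model on the same input concurrently and merge results."""
    with ThreadPoolExecutor(max_workers=len(MODELS)) as pool:
        futures = {name: pool.submit(model, text) for name, model in MODELS.items()}
        return {name: future.result() for name, future in futures.items()}


print(scan("Invoice 1042 sent to billing@example.com"))
```

Keeping each layer behind a plain function makes it possible to add, replace, or fine-tune one model without touching the others.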

The Future of AI Is Specialized

We believe the next frontier of AI isn’t about who can build the biggest model, it’s about who can build the most efficient, adaptive, and secure ones.

By embracing SLMs, Sentra is pioneering a future where AI systems are purpose-built, transparent, and sustainable. Our approach aligns with a broader industry shift toward task-optimized intelligence - models that do one thing extremely well and can learn continuously over time.

Conclusion: The Power of Small

At Sentra, we’ve learned that in AI, bigger isn’t always better. Our commitment to SLMs reflects our belief that efficiency, adaptability, and precision matter most for customers. By running thousands of small, smart models rather than a single massive one, we’re able to classify data faster, cheaper, and with greater accuracy - all while ensuring customer privacy and control.

In short: Sentra’s SLMs represent the power of small, and the future of intelligent classification.

Aarti Gadhia
October 27, 2025
3
Min Read
Data Security

My Journey to Empower Women in Cybersecurity

Finding My Voice: From Kenya to the Global Stage

I was born and raised in Kenya, the youngest of three and the only daughter. My parents, who never had the chance to finish their education, sacrificed everything to give me opportunities they never had. Their courage became my foundation.

At sixteen, my mother signed me up to speak at a community event, without telling me first! I stood before 500 people and spoke about something that had long bothered me: there were no women on our community board. That same year, two women were appointed for the first time in our community’s history. This year, I was recognized as a Community Leader at the Global Gujrati Gaurav Awards in BC for my work educating seniors on cyber safety and helping many immigrants secure jobs.

I didn’t realize it then, but that moment would define my purpose: to speak up for those whose voices aren’t always heard.

From Isolation to Empowerment

When I moved to the UK to study Financial Economics, I faced a different kind of challenge - isolation. My accent made me stand out, and not always in a good way. There were times I felt invisible, even rejected. But I made a promise to myself in those lonely moments that no one else should feel the same way.

Years later, as a founding member of WiCyS Western Affiliate, I helped redesign how networking happens at cybersecurity events. Instead of leaving it to chance, we introduced structured networking that ensured everyone left with at least one new connection. It was a small change, but it made a big difference. Today, that format has been adopted by organizations like ISC2 and ISACA, creating spaces where every person feels they belong. 

Breaking Barriers and Building SHE

When I pivoted into cybersecurity sales after moving to Canada, I encountered another wall. I applied for a senior role and failed a personality test, one that unfairly filtered out many talented women. I refused to accept that. I focused on listening, solving real customer challenges, and eventually became the top seller. That success helped eliminate the test altogether, opening doors for many more women who came after me. That experience planted a seed that would grow into one of my proudest initiatives: SHE (Sharing Her Empowerment).

It started as a simple fireside chat on diversity and inclusion - just 40 seats over lunch. Within minutes of sending the invite, we had 90 people signed up. Executives moved us into a larger room, and that event changed everything. SHE became our first employee resource group focused on empowering women, increasing representation in leadership, and amplifying women’s voices within the organization. Even with just 19% women, we created a ripple effect that reached the boardroom and beyond.

SHE showed me that when women stand together, transformation happens.

Creating Pathways for the Next Generation

Mentorship has always been close to my heart. During the pandemic, I met incredible women who were trying to break into cybersecurity but kept facing barriers. I challenged hiring norms, advocated for fair opportunities, and helped launch internship programs that gave women hands-on experience. Today, many of them are thriving in their cyber careers, a true reflection of what’s possible when we lift as we climb.

Through Standout to Lead, I partnered with Women Get On Board to help women in cybersecurity gain board seats. Watching more women step into decision-making roles reminds me that leadership isn’t about titles, it’s about creating pathways for others.

Women in Cybersecurity: Our Collective Story

This year, I’m deeply honored to be named among the Top 20 Cybersecurity Women of the World by the United Cybersecurity Alliance. Their mission - to empower women, elevate diverse voices, and drive equity in our field - mirrors everything I believe in.

I’m also thrilled to be part of the upcoming documentary premiere, “The WOMEN IN SECURITY Documentary,” proudly sponsored by Sentra, Amazon WWOS, and Pinkerton among others. This film shines a light on the fearless women redefining what leadership looks like in our industry.

As a member of Sentra’s community, I see the same commitment to visibility, inclusion, and impact that has guided my journey. Together, we’re not just securing data, we’re securing the future of those who will lead next.

Asante Sana – Thank You

My story, my safari, is still being written. I’ve learned that impact doesn’t come from perfection, but from purpose. Whether it’s advocating for fairness, mentoring the next generation, or sharing our stories, every step we take matters.

To every woman, every underrepresented voice in STEM, and everyone who’s ever felt unseen - stay authentic, speak up, and don’t be afraid of the outcome. You might just change the world.

Join me and the Sentra team at The WOMEN IN SECURITY Documentary Premiere, a celebration of leadership, resilience, and the voices shaping the future of our industry.

Save your seat at The Women in Security premiere here (spots are limited).

Follow Sentra on LinkedIn and YouTube for more updates on the event and stories that inspire change.
