Why Legacy Data Classification Tools Don't Work Well in the Cloud (But DSPM Does)

Data Security
 Min Read
Last Updated: 
January 9, 2024
Author Image
Yair Cohen
Co-Founder and VP Product
Share the Blog
linkedin logotwitter logogithub logo

Data security teams are always trying to understand where their sensitive data is. Yet this goal has remained out of reach for a number of reasons.

Blog post cover image

The main difficulty is creating a continuously updated data catalog of all production and cloud data. Creating this catalog would involve:

  1.  Identifying everyone in the organization with knowledge of any data stores, with visibility into its contents
  1. Connecting a data classification tool to these data stores
  1. Ensure there’s network connectivity by configuring network and security policies
  1. Confirm that business-critical production systems using each data source won’t be negatively affected, causing damage to performance or availability

Having a process this complex requires a major investment of resources, long workflows, and will still probably not provide the full coverage organizations are looking for. Many so-called successful implementations of such solutions will prove unreliable and too difficult to maintain after a short period of time.

Another pain with a legacy data classification solution is accuracy. Data security professionals are all too aware of the problem of false positives (i.e. wrong classification and data findings) and false negatives (i.e. missing classification of sensitive data that remains unknown). This is mainly due to two reasons. 

  • Legacy classification solutions rely solely on patterns, such as regular expressions, to identify sensitive data, which falls short in both unstructured data and structured data. 
  • These solutions don’t understand the business context around the data, such as how it is being used, by whom, for what purposes and more.

Without the business context, security teams can’t get any actionable items to remove or protect sensitive data against data risks and security breaches.

Lastly, there’s the reason behind high operational costs. Legacy data classification solutions were not built for the cloud, where each data read/write and network operation has a price tag. The cloud also offers a much more cost efficient data storage solution and advanced data services that causes organizations to store much more data than they did before moving to the cloud. On the other hand, the public cloud providers also offer a variety of cloud-native APIs and mechanisms that can extremely benefit a data classification and security solution, such as automated backups, cross account federation, direct access to block storage, storage classes, compute instance types, and much more. However, legacy data classification tools, that were not built for the cloud, will completely ignore those benefits and differences, making them an extremely expensive solution for cloud-native organizations.

DSPM: Built to Solve Data Classification in the Cloud 

These challenges have led to the growth of a new approach to securing cloud data - Data Security Posture Management, or DSPM. Sentra’s DSPM  is able to provide full coverage and an up-to-date data catalog with classification of sensitive data, without any complex deployment or operational work involved. This is achieved thanks to a cloud-native agentless architecture, using cloud-native APIs and mechanisms.

A good example of this approach is how Sentra’s DSPM architecture leverages the public cloud mechanism of automated backups for compute instances, block storage, and more. This allows Sentra to securely run a full discovery and classification technology from within the customer’s premises, in any VPC or subscription/account of the customer’s choice. This offers a number of benefits:

  1. The organization does not need to change any existing infrastructure configuration, network policies, or security groups.
  1. There’s no need to provide individual credentials for each data source in order for Sentra to discover and scan it.
  1. There is never a performance impact on the actual workloads that are compute-based/bounded, such as virtual machines, that run in production environments. In fact, Sentra’s scanning will never connect via network or application layers to those data stores.

Another benefit of a DSPM built for the cloud is classification accuracy.  Sentra’s DSPM provides an unprecedented level of accuracy thanks to more modern and cloud-native capabilities.This starts with advanced statistical relevance for structured data, enabling our classification engine to understand with high confidence that sensitive data is found within a specific column or field, without scanning every row in a large table.

Sentra leverages even more advanced algorithms for key-value stores and document databases. For unstructured data, the use of AI and LLM -based algorithms unlock tremendous accuracy in understanding and detecting sensitive data types by understanding the context within the data itself. Lastly, the combination of data-centric and identity-centric security approaches provides greater context that allows Sentra’s users to know what actions they should take to remediate data risks when it comes to classification.

Here are two examples of how we apply this context:

1. Different Types of Databases

Personal Identifiable Information (PII) that is found in a database in which only users from the Analytics team have access to, is often a privacy violation and a data risk. On the other hand, PII that is found in a database that only three production microservices have access to is expected,  but requires the data to be isolated within a secure VPC. 

2. Different Access Histories

If 100 employees have access to a sensitive shadow data lake, but only 10 people have actually accessed it in the last year. In this case, the solution would be to reduce permissions and implement stricter access controls. We’d also want to ensure that the data has the right retention policy, to reduce both risks and storage costs. Sentra’s risk score prioritization engine takes multiple data layers into account, including data access permissions, activity, sensitivity, movement and misconfigurations, giving enterprises greater visibility and control over their data risk management processes

Finally, with regards to costs, Sentra’s Data Security Posture Management (DSPM) solution utilizes innovative features that make its scanning and classification solution about two or three orders of magnitude more cost efficient than legacy solutions. The first is the use of smart sampling, where Sentra is able to cluster multiple data units that share the same characteristics, and using intelligent sampling with statistical relevance, understand what sensitive data exists within such data assets that are grouped automatically. This is extremely powerful especially when dealing with data lakes that are often the size of dozens of petabytes, without compromising the solution coverage and accuracy.

Second, Sentra’s modern architecture leverages the benefits of cloud ephemeral resources, such as snapshotting and ephemeral compute workloads with a cloud-native orchestration technology that leverages the elasticity and the scale of the cloud. Sentra balances its resource utilization with the needs of the customer's business, providing advanced scan settings that are built and designed for the cloud. This allows teams to optimize cost according to their business needs, such as determining the frequency and sampling of scans, among more advanced features.

To summarize:

  1. Given the current macroeconomic climate, CISOs should find DSPMs like Sentra as an opportunity to increase their security and minimize their costs
  2. DSPM solutions like Sentra bring an important context - awareness to security teams and tools, allowing them to do better risk management and prioritization by focusing on whats important
  3. Data is likely to continue to be the most important asset of every business, as more organizations embrace the power of the cloud. Therefore, a DSPM will be a pivotal tool in realizing the true value of the data while ensuring it is always secured
  4. Accuracy is key and AI is an enabler for a good data classification tool

Author Image
Yair Cohen
Co-Founder and VP Product

Yair brings a wealth of experience in cybersecurity and data product management. In his previous role, Yair led product management at Microsoft and Datadog. With a background as a member of the IDF's Unit 8200 for five years, he possesses over 18 years of expertise in enterprise software, security, data, and cloud computing. Yair has held senior product management positions at Datadog, Digital Asset, and Microsoft Azure Protection.

Decorative Tube
Decorative Tube