Understanding Data Movement to Avert Proliferation Risks
Understanding the perils your cloud data faces as it proliferates throughout your organization and ecosystems is a monumental task in the highly dynamic business climate we operate in. Being able to see data as it is being copied and travels, monitor its activity and access, and assess its posture allows teams to understand and better manage the full effect of data sprawl.
It ‘connects the dots’ for security analysts who must continually evaluate true risks and threats to data so they can prioritize their efforts. Data similarity and movement are important behavioral indicators in assessing and addressing those risks. This blog will explore this topic in depth.
What Is Data Movement
Data movement is the process of transferring data from one location or system to another – from A to B. This transfer can be between storage locations, databases, servers, or network locations. Copying data from one location to another is simple, however, data movement can get complicated when managing volume, velocity, and variety.
- Volume: Handling large amounts of data.
- Velocity: Overseeing the pace of data generation and processing.
- Variety: Managing a variety of data types.
How Data Moves in the Cloud
Data is free and can be shared anywhere. The way organizations leverage data is an integral part of their success. Although there are many business benefits to moving and sharing data (at a rapid pace), there are also many concerns that arise, mainly dealing with privacy, compliance, and security. Data needs to move quickly, securely, and have the proper security posture at all times.
These are the main ways that data moves in the cloud:
1. Data Distribution in Internal Services: Internal services and applications manage data, saving it across various locations and data stores.
2. ETLs: Extract, Transform, Load processes, involve combining data from multiple sources into a central repository known as a data warehouse. This centralized view supports applications in aggregating diverse data points for organizational use.
3. Developer and Data Scientist Data Usage: Developers and data scientists utilize data for testing and development purposes. They require both real and synthetic data to test applications and simulate real-life scenarios to drive business outcomes.
4. AI/ML/LLM and Customer Data Integration: The utilization of customer data in AI/ML learning processes is on the rise. Organizations leverage such data to train models and apply the results across various organizational units, catering to different use-cases.
What Is Misplaced Data
"Misplaced data" refers to data that has been moved from an approved environment to an unapproved environment. For example, a folder that is stored in the wrong location within a computer system or network. This can result from human error, technical glitches, or issues with data management processes.
When unauthorized data is stored in an environment that is not designed for the type of data, it can lead to data leaks, security breaches, compliance violations, and other negative outcomes.
With companies adopting more cloud services, and being challenged with properly managing the subsequent data sprawl, having misplaced data is becoming more common, which can lead to security, privacy, and compliance issues.
The Challenge of Data Movement and Misplaced Data
Organizations strive to secure their sensitive data by keeping it within carefully defined and secure environments. The pervasive data sprawl faced by nearly every organization in the cloud makes it challenging to effectively protect data, given its rapid multiplication and movement.
It is encouraged for business productivity to leverage data and use it for various purposes that can help enhance and grow the business. However, with the advantages, come disadvantages. There are risks to having multiple owners and duplicate data..
To address this challenge, organizations can leverage the analysis of similar data patterns to gain a comprehensive understanding on how data flows within the organization and help security teams first get visibility of those movement patterns, and then identify whether this movement is authorized. Then they can protect it accordingly and understand which unauthorized movement should be blocked.
This proactive approach allows them to position themselves strategically. It can involve ensuring robust security measures for data at each location, re-confining it by relocating, or eliminating unnecessary duplicates. Additionally, this analytical capability proves valuable in scenarios tied to regulatory and compliance requirements, such as ensuring GDPR - compliant data residency.
Identifying Redundant Data and Saving Cloud Storage Costs
The identification of similarities empowers Chief Information Security Officers (CISOs) to implement best practices, steering clear of actions that lead to the creation of redundant data.
Detecting redundant data helps reduce cloud storage costs and drive up operational efficiency from targeted and prioritized remediation efforts that focus on the critical data risks that matter.
This not only enhances data security posture, but also contributes to a more streamlined and efficient data management strategy.
“Sentra has helped us to reduce our risk of data breaches and to save money on cloud storage costs.”
-Benny Bloch, CISO at Global-e
Security Concerns That Arise
- Data Security Posture Variations Across Locations: Addressing instances where similar data, initially secure, experiences a degradation in security posture during the copying process (e.g., transitioning from private to public, or from encrypted to unencrypted).
- Divergent Access Profiles for Similar Data: Exploring scenarios where data, previously accessible by a limited and regulated set of identities, now faces expanded access by a larger number of identities (users), resulting in a loss of control.
- Data Localization and Compliance Violations: Examining situations where data, mandated to be localized in specific regions, is found to be in violation of organizational policies or compliance rules (with GDPR as a prominent example). By identifying similar sensitive data, we can pinpoint these issues and help users mitigate them.
- Anonymization Challenges in ETL Processes: Identifying issues in ETL processes where data is not only moved but also anonymized. Pinpointing similar sensitive data allows users to detect and mitigate anonymization-related problems.
- Customer Data Migration Across Environments: Analyzing the movement of customer data from production to development environments. This can be used by engineers to test real-life use-cases.
- Data Data Democratization and Movement Between Cloud and Personal Stores: Investigating instances where users export data from organizational cloud stores to personal drives (e.g., OneDrive) for purposes of development, testing, or further business analysis. Once this data is moved to personal data stores, it typically is less secure. This is due to the fact that these personal drives are less monitored and protected, and in control of the private entity (the employee), as opposed to the security/dev teams. These personal drives may be susceptible to security issues arising from misconfiguration, user mistakes or insufficient knowledge.
How Sentra’s DSPM Helps Navigate Data Movement Challenges
- Discover and accurately classify the most sensitive data and provide extensive context about it, for example:
- Where it lives
- Where it has been copied or moved to
- Who has access to it
- Highlight misconfigurations by correlating similar data that has different security posture. This helps you pinpoint the issue and adjust it according to the right posture.
- Quickly identify compliance violations, such as GDPR - when European customer data moves outside of the allowed region, or when financial data moves outside a PCI compliant environment.
- Identify access changes, which helps you to understand the correct access profile by correlating similar data pieces that have different access profiles.
For example, the same data is well kept in a specific environment and can be accessed by 2 very specific users. When the same data moves to a developers environment, it can then be accessed by the whole data engineering team, which exposes more risks.
Leveraging Data Security Posture Management (DSPM) and Data Detection and Response (DDR) tools proves instrumental in addressing the complexities of data movement challenges. These tools play a crucial role in monitoring the flow of sensitive data, allowing for the swift remediation of exposure incidents and vulnerabilities in real-time. The intricacies of data movement, especially in hybrid and multi-cloud deployments, can be challenging, as public cloud providers often lack sufficient tooling to comprehend data flows across various services and unmanaged databases.
Our innovative cloud DLP tooling takes the lead in this scenario, offering a unified approach by integrating static and dynamic monitoring through DSPM and DDR. This integration provides a comprehensive view of sensitive data within your cloud account, offering an updated inventory and mapping of data flows. Our agentless solution automatically detects new sensitive records, classifies them, and identifies relevant policies. In case of a policy violation, it promptly alerts your security team in real time, safeguarding your crucial data assets.
In addition to our robust data identification methods, we prioritize the implementation of access control measures. This involves establishing Role-based Access Control (RBAC) and Attribute-based Access Control (ABAC) policies, so that the right users have permissions at the right times.
Identifying Data Movement With Sentra
Sentra has developed different methods to identify data movements and similarities based on the content of two assets. Our advanced capabilities allow us to pinpoint fully duplicated data, identify similar data, and even uncover instances of partially duplicated data that may have been copied or moved across different locations.
Moreover, we recognize that changes in access often accompany the relocation of assets between different locations.
As part of Sentra’s Data Security Posture Management (DSPM) solution, we proactively manage and adapt access controls to accommodate these transitions, maintaining the integrity and security of the data throughout its lifecycle.
These are the 3 methods we are leveraging:
- Hash similarity - Using each asset unique identifier to locate it across the different data stores of the customer environment.
- Schema similarity - Locate the exact or similar schemas that indicated that there might be similar data in them and then leverage other metadata and statistical methods to simplify the data and find necessary correlations.
- Entity Matching similarity - Detects when parts of files or tables are copied to another data asset. For example, an ETL that extracts only some columns from a table into a new table in a data warehouse.
Another example would be if PII is found in a lower environment, Sentra could detect if this is real or mock customer PII, based on whether this PII was also found in the production environment.
Conclusion
Understanding and managing data sprawl are critical tasks in the dynamic business landscape. Monitoring data movement, access, and posture enable teams to comprehend the full impact of data sprawl, connecting the dots for security analysts in assessing true risks and threats.
Sentra addresses the challenge of data movement by utilizing advanced methods like hash, schema, and entity similarity to identify duplicate or similar data across different locations. Sentra's holistic Data Security Posture Management (DSPM) solution not only enhances data security but also contributes to a streamlined data management strategy.
The identified challenges and Sentra's robust methods emphasize the importance of proactive data management and security in the dynamic digital landscape.
To learn more about how you can enhance your data security posture, schedule a demo with one of our experts.