Cloud Data Governance is a Security Enabler
Data governance is how we manage the availability, usability, integrity, privacy and security of the data in our enterprise systems. It’s based both on the internal policies that dictate how data can be used, and on the global and local regulations that so tightly control how we need to handle our data.
Effective data governance ensures that data is trustworthy, consistent and doesn't get misused. As businesses began to increasingly rely on data analytics to optimize operations and drive decision-making, data governance became a central part of enterprise operations. And as protection of data and data assets came under ever-closer regulatory scrutiny, data governance became a key part of policymaking, as well.
But then came the move to the cloud. This represented a tectonic shift in how data is stored, transported and accessed. And data governance – notably the security facet of data governance – has not quite been able to keep up.
Cloud Data Governance: A Different Game
The shift to the cloud radically changed data governance. From many perspectives, it’s a totally different game. The key differentiator? Cloud data governance, unlike on-prem data governance, currently does not actually control all sensitive data.
The origin of this challenge is that, in the cloud, there is simply too much movement of data. This is not a bad thing. The democratization of data has dramatically improved productivity and development speed. It’s facilitated the rise of a whole culture of data-driven decision making.
Yet the goal of cloud data governance is to streamline data collection, storage, and use within the cloud - enabling collaboration while maintaining compliance and security. And the fact is that data in the cloud is used at such scale and with such intensity that it’s become nearly impossible to govern, let alone secure. The cloud has given rise to every data security stakeholder’s nightmare: massive shadow data.
The Rise of Shadow Data
Shadow data is any data that is not subject to your organization’s data governance or security framework. It’s not governed by your data policies. It’s not stored according to your preferred security structure. It’s not subject to your access control limitations. And it’s probably not even visible to the security tools you use to monitor data access.
In most cases, shadow data is not born of malicious roots. It’s just data in the wrong place, at the wrong time. Where does shadow data come from?
- …from prevalent hybrid and multi-cloud environments. Though excellent for productivity, these ecosystems present serious visibility challenges.
- …from cloud-driven CI/CD, which speeds interactions between development pipelines and source code repositories. Yet while making life easier for developers, cloud-driven CI/CD also frequently (and usually inadvertently) sacrifices data security to expediency.
- …from distributed cloud-native apps based on containers, serverless functions and microservices – which leaves data spread across hundreds of databases, data warehouses, data pipelines, and external SaaS warehouses.
Cloud Data Governance Today and Tomorrow
In an attempt to duplicate the success of on-prem data governance paradigms in the cloud, many organizations attempt to create cloud data catalogs.
Data catalog tools and services collect metadata and offer big data management and search capabilities. The goal is to provide analysts and data users with a way to find data they need – while also creating an inventory of available data. Yet while catalogs have become the core component of on-prem big data governance, in the cloud this paradigm falls short.
Data catalogs are labor intensive, mostly manual endeavors. There are data cataloging tools, but most lack automatic discovery and classification. This means that teams have to manually connect to each data source, then manually classify and catalog the data. This is why data cataloging at the enterprise level is a full-time job, and frequently a departmental task. And once the catalog is created, multiple security and governance teams still need to work to enforce access controls on sensitive data.
Yet despite these efforts, shadow cloud data persists – and is growing. What’s more, increasingly popular unstructured data sources like Amazon S3 can’t be partitioned into the business flows they contain, nor effectively classified manually.
Taken together, all this means there’s an urgent emerging need for automatic data discovery, as well as a way to ensure that data discovered is also data governed.
This is where Data Lifecycle Security comes in.
Data Lifecycle Security solutions enable effective cloud data governance by following sensitive data through the cloud - helping organizations identify data movement and ensuring that security posture follows it. They accomplish this by first discovering sensitive data, including shadow or abandoned data. Then they automatically classify data types using AI models, determine whether the data has the proper security posture, and notify the remediation teams if it does not.
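The discover / classify / check-posture / flag loop described above, as an added illustration that is not Sentra's code or part of the original post: the inventory of stores and the governance catalog are constructed sample data, and simple regex classifiers stand in for the AI classification models the post mentions.

```python
import re

# Constructed inventory of discovered cloud data stores (assumption: in a real
# deployment this comes from cloud provider APIs, not a hard-coded list).
DISCOVERED_STORES = [
    {"name": "s3://analytics-prod/customers.csv", "encrypted": True, "public": False,
     "sample": "id,email\n1,alice@example.com"},
    {"name": "s3://tmp-export-2023/dump.csv", "encrypted": False, "public": True,
     "sample": "card\n4111 1111 1111 1111"},  # constructed test card number
    {"name": "s3://logs/app.log", "encrypted": True, "public": False,
     "sample": "GET /health 200"},
]
# Stores the governance team knows about; anything else is shadow data.
GOVERNED_CATALOG = {"s3://analytics-prod/customers.csv", "s3://logs/app.log"}

# Pattern classifiers standing in for the AI models the post describes.
CLASSIFIERS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "payment_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}


def classify(sample: str) -> set[str]:
    """Return the set of sensitive-data labels found in a content sample."""
    return {label for label, rx in CLASSIFIERS.items() if rx.search(sample)}


def posture_issues(store: dict, labels: set[str]) -> list[str]:
    """Compare a store's governance status and posture against its contents."""
    issues = []
    if store["name"] not in GOVERNED_CATALOG:
        issues.append("shadow data: not in governance catalog")
    if labels and not store["encrypted"]:
        issues.append("sensitive data stored unencrypted")
    if labels and store["public"]:
        issues.append("sensitive data publicly readable")
    return issues


def assess(stores=DISCOVERED_STORES) -> list[dict]:
    """Discover -> classify -> check posture; return findings for remediation."""
    findings = []
    for store in stores:
        labels = classify(store["sample"])
        issues = posture_issues(store, labels)
        if issues:
            findings.append({"store": store["name"], "labels": sorted(labels),
                             "issues": issues})
    return findings
```

Running `assess()` on the sample inventory flags only the uncatalogued, unencrypted, public export bucket; the catalogued stores with correct posture produce no finding.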
It’s crucial for cloud-facing organizations to remember that the distributed nature of cloud computing means they may not currently know exactly where all their data is stored. Data governance and security cannot be ‘lifted and shifted’ from on-prem to the cloud. But Data Lifecycle Security solutions can bridge the gap between the need for cloud data governance and security and the sub-optimal performance of existing paradigms.
To learn more about how Sentra and Data Lifecycle Security can help you apply effective cloud data governance, watch a demo here.