Emerging Data Security Challenges in the LLM Era

Data Security
Last Updated: February 20, 2024
Yoav Regev, Co-Founder and CEO

In April 2023, several Samsung employees reportedly leaked sensitive data via OpenAI’s chatbot ChatGPT. The leaked data included source code for software used to measure semiconductor equipment. This incident underscores the importance of taking preventive measures against future breaches associated with Large Language Models (LLMs).


LLMs generate responses from the data they continuously receive, which means user inputs can unintentionally expose confidential information. Even though OpenAI explicitly tells users not to share “any sensitive information in your conversations,” ChatGPT and other LLMs are simply too useful to ban for security reasons. You wouldn’t ban an employee from using Google or an engineer from GitHub. Business productivity (almost) always comes first.

This means that the risks of spilling company secrets and sharing sensitive data with LLMs are not going anywhere. And you can be sure that more generative AI tools will be introduced to the workplace in the near future.

“Banning chatbots one by one will start feeling ‘like playing whack-a-mole’ really soon.”

  • Joe Payne, CEO of insider risk software provider Code42

In many ways, the effect of LLMs on data security is similar to the changes we saw 10-15 years ago when companies started moving their data to the cloud.

Broadly speaking, we can say there have been three ‘eras’ of data and data security.

The Era of On-Prem Data

The first was the era of on-prem data. For most of the history of computing, enterprises stored their data in on-prem data centers and secured access to sensitive data by fortifying the perimeter. The data also wasn’t going anywhere on its own. It lived on company servers, was managed by company IT teams, and those teams controlled who could access anything on those systems.

The Era of the Cloud

Then came the next era: the cloud. Suddenly, corporate data wasn’t static anymore. Data was free to move and could be shared anywhere; engineers, BI tools, and data scientists were accessing and moving this free-flowing data to drive the business forward. How a company leveraged its data became an integral part of its success. While the business benefits were clear, this shift created a number of concerns, particularly around privacy, compliance, and security. Data needed to move quickly and securely, and to maintain the proper security posture at all times.

The challenge was that security teams now struggled to answer basic questions about their data, like:

  • Where is my data? 
  • Who has access to it? 
  • How can I comply with regulations? 

It was during this era that Data Security Posture Management (DSPM) emerged as a solution: by ensuring that data always had proper access controls wherever it traveled, DSPM promised to address security and compliance issues for enterprises with fast-moving cloud data.

And while we were answering these questions, a new era emerged, with a host of new challenges. 

The Era of AI

The recent rise of Large Language Models (LLMs) as indispensable business tools in just the past few years has introduced a new dimension to data security. It significantly amplifies the existing issues of the cloud era: while LLMs have accelerated business operations to new heights, they have also taken cloud data to another level of risk.

While securing data in the cloud was a challenge, at least you had some control over your cloud. You could decide who could access it, and when. You could decide what data to keep and what to remove. That has all changed as LLMs and AI play a larger role in company operations.

Globally, and particularly in the US, organizations face the challenge of managing these new AI initiatives efficiently while maintaining speed and ensuring regulatory compliance. CEOs and boards increasingly urge companies to leverage LLMs and AI, in effect treating them as databases. However, understanding of the associated risks remains limited, and controlling the data fed into these models is difficult. The ultimate goal is to mitigate such risks effectively and prevent incidents before they occur.

LLMs are a black box. You don’t know what data your engineers are feeding into them, and you can’t be sure that users won’t be able to manipulate your LLMs into disclosing sensitive information. For example, an engineer training a model might accidentally use real customer data that then exists somewhere in the LLM and might be inadvertently disclosed. Or an LLM-powered chatbot might have a vulnerability that leads it to respond to an inquiry with sensitive company data. This is the challenge facing data security teams in this new era.
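The training-data risk described above can be reduced by scanning and sanitizing data before it ever reaches a model. As a minimal sketch (not any specific DSPM product’s method, and using only a few hypothetical regex detectors), a pre-training step might redact obvious PII patterns from text destined for fine-tuning:

```python
import re

# Hypothetical detectors for illustration; real classifiers use many
# more (names, API keys, source code fragments, contextual scoring).
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def redact(text: str) -> str:
    """Replace anything matching a PII pattern with a typed placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED-{label.upper()}]", text)
    return text

sample = "Ticket from jane.doe@example.com, SSN 123-45-6789."
print(redact(sample))
# → Ticket from [REDACTED-EMAIL], SSN [REDACTED-SSN].
```

The principle is the same regardless of how sophisticated the detectors get: scan and sanitize before data enters the black box, because once it is inside a model it cannot be selectively removed.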

How can you know what the LLM has access to, how it’s using that data, and who it’s sharing that data with?

Solving The Challenges of the Cloud and AI Eras at the Same Time

Adding to the complexity for security and compliance professionals is that we’re still dealing with the challenges of the cloud era. Fortunately, DSPM has adapted to solve both eras’ primary data security headaches.

For data in the cloud, DSPM can discover your sensitive data anywhere in the cloud environment, understand who can access this data, and assess its vulnerability to security threats and risk of regulatory non-compliance. Organizations can harness advanced technologies while keeping privacy and compliance seamlessly integrated into their processes. Further, DSPM tackles issues such as finding shadow data, identifying sensitive information with inadequate security postures, discovering duplicate data, and ensuring proper access control.
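To make one of these capabilities concrete, duplicate-data discovery can be sketched with content hashing: two objects with identical bytes have the same digest. This is a simplified, hypothetical illustration; real tools stream objects from cloud storage APIs and classify their contents as well:

```python
import hashlib
from collections import defaultdict

def find_duplicates(objects: dict[str, bytes]) -> list[list[str]]:
    """Group object paths whose contents are byte-identical.

    `objects` maps a storage path to its raw contents; in practice a
    scanner would list and fetch objects via cloud storage APIs.
    """
    by_digest = defaultdict(list)
    for path, data in objects.items():
        by_digest[hashlib.sha256(data).hexdigest()].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]

store = {
    "s3://prod/customers.csv": b"id,name\n1,Ada\n",
    "s3://backup/customers-copy.csv": b"id,name\n1,Ada\n",
    "s3://prod/orders.csv": b"id,total\n1,9.99\n",
}
print(find_duplicates(store))
# → [['s3://prod/customers.csv', 's3://backup/customers-copy.csv']]
```

Flagging exact copies like this matters because each duplicate of a sensitive file is another location that needs the same access controls and the same compliance treatment.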

For the LLM data challenges, DSPM can automatically secure LLM training data, facilitating swift AI application development and letting the business run as smoothly as possible.

Any DSPM solution that integrates with platforms like AWS SageMaker and GCP Vertex AI, as well as other AI development environments, can help ensure secure data handling during ML training. Full integration with capabilities like Data Access Governance (DAG) and Data Detection and Response (DDR) provides a robust approach to data security and privacy.

AI has the remarkable capacity to reshape our world, yet this must be balanced with a firm dedication to maintaining data integrity and privacy. Ensuring data integrity and privacy in LLMs is crucial for the creation of ethical and responsible AI applications. By utilizing DSPM, organizations are equipped to apply best practices in data protection, thereby reducing the dangers of data breaches, unauthorized access, and bias. This approach is key to fostering a safe and ethical digital environment as we advance in the LLM era.

To learn more about DSPM, schedule a demo with one of our experts.

Yoav Regev has over two decades of experience in the world of cybersecurity, cloud, big data, and machine learning. He was the Head of Cyber Department (Colonel) in the Israeli Military Intelligence (Unit 8200) for nearly 25 years. Reflecting on this experience, it was clear to him that sensitive data had become the most important asset in the world. In the private sector, enterprises that were leveraging data to generate new insights, develop new products, and provide better experiences, were separating themselves from the competition. As data becomes more valuable, it becomes a bigger target, and as the amount of sensitive data grows, so does the importance of finding the most effective way to secure it. That’s why he co-founded Sentra, together with accomplished co-founders, Asaf Kochan, Ron Reiter, and Yair Cohen.
