Finding Sensitive Cloud Data in all the Wrong Places

Sentra Case Study
4 Min Read
Last Updated: January 2, 2024
Team Sentra

Not all data can be kept under lock and key. Website resources, for example, always need to be public, and S3 buckets are frequently used to host them. On the other hand, some things should never be public: customer information, payroll records, and company IP. But it happens - and it can take months or years to notice, if you ever do.

This is the story of how Sentra identified a large enterprise’s source code in an open S3 bucket. 

As part of our work with this company, Sentra was given 7 petabytes of data across AWS environments to scan for sensitive data. Specifically, we were looking for IP - source code, documentation, and other proprietary data.

As we often do, we discovered many issues, but seven of them needed to be remediated immediately - seven that we classified as 'critical'.

The most severe data vulnerability was source code in an open S3 bucket holding 7.5 TB of data. The code was hiding in a 600 MB .zip file nested inside another .zip file. We also found recordings of client meetings and a tiny 8.9 KB Excel file containing all of their current and potential customer data.
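Nested archives like this are easy for naive scans to miss, because a listing of the outer bucket only shows one opaque .zip. The sketch below, using only Python's standard `zipfile` module, shows the general idea of descending into nested archives to surface suspect files; the extension watch-list and function names are hypothetical, not Sentra's actual implementation.

```python
import io
import zipfile

# Hypothetical watch-list of extensions worth flagging; a real scanner would
# inspect content, not just names.
SUSPECT_EXTENSIONS = {".py", ".java", ".go", ".xlsx", ".sql"}

def scan_zip(data: bytes, path: str = "") -> list[str]:
    """Recursively list suspect files inside a zip, descending into nested zips."""
    hits = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for name in zf.namelist():
            full = f"{path}{name}"
            if name.lower().endswith(".zip"):
                # Recurse into the nested archive, keeping a readable path trail.
                hits.extend(scan_zip(zf.read(name), full + "!/"))
            elif any(name.lower().endswith(ext) for ext in SUSPECT_EXTENSIONS):
                hits.append(full)
    return hits
```

A file reported as `backup.zip!/secret/source.py` would correspond to the kind of zip-within-a-zip finding described above.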


Examples of sensitive data alerts displayed on Sentra's dashboard

So how did such a serious data vulnerability go unnoticed? In this case, one of the principal architects at the company had backed up his primary device to the corporate cloud. This isn't as uncommon as you might think. Particularly in the early days of cloud-based companies, data is frequently 'dumped' into the cloud, as founders and developers are naturally more concerned with speed than security. There's no CISO on board to build policies; everyone is simply trusted with the data they have. The early Facebook motto of 'move fast and break things' is very much alive in early-stage companies. Of course, if they succeed in building a major company, the problem is that all this data is now traveling around their cloud environment - data that no one is tracking, no one is responsible for, and, in the case above, no one even knew existed.

Another explanation for unsecured sensitive data in the public cloud is that some people simply assume the cloud is secure. As we've explained previously, the cloud can be more secure than on-prem architecture - but only if it's configured properly. A major misconception is that everything in the cloud is secured by the cloud provider. Of course, the mere fact that you can host public resources on the cloud demonstrates how incorrect that assumption is - if you've left your S3 buckets open, that data is at risk, regardless of how much security the cloud provider offers. It's important to remember that under the 'shared responsibility model', the cloud provider handles things like networking and physical security. But data security is on you.
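What "left your S3 buckets open" usually means at the policy level is an Allow statement whose principal is everyone (`"*"`). As a minimal sketch of that check - in practice you would rely on AWS features like Block Public Access rather than parsing policies yourself - the function below flags a bucket policy document that grants access to any principal. The function name is illustrative, not a real AWS API.

```python
import json

def policy_allows_public_access(policy_json: str) -> bool:
    """Return True if any Allow statement grants access to everyone ('*')."""
    policy = json.loads(policy_json)
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal")
        # A bare "*" or {"AWS": "*"} principal means anyone on the internet.
        if principal == "*" or (isinstance(principal, dict) and principal.get("AWS") == "*"):
            return True
    return False
```

A policy scoped to a specific IAM principal would return False here; only the wide-open `"Principal": "*"` pattern is flagged.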

This is where accurate data classification needs to play a role. Enterprises need a way of identifying which data is sensitive and critical to keep secure, and what its proper security posture should be. Data classification tools have been around for a long time, but they mainly focus on easily identifiable data - credit card and social security numbers, for example. Identifying company secrets that were never supposed to be publicly accessible wasn't possible.
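To make the limitation concrete, traditional classification is essentially pattern matching: formats like social security and credit card numbers have well-known shapes that a regular expression can find. The sketch below shows that style of classifier; the patterns are deliberately simplified (real tools add checksum and context validation), and the point is that no such pattern exists for proprietary source code.

```python
import re

# Simplified, hypothetical patterns for easily identifiable data. Real
# classifiers add checksums (e.g. Luhn for card numbers) and context checks.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){15}\d\b"),
}

def classify(text: str) -> set[str]:
    """Return the labels of any well-known data patterns found in the text."""
    return {label for label, rx in PATTERNS.items() if rx.search(text)}
```

Running `classify` over a stolen source file returns nothing, because source code has no fixed format to match - which is exactly the gap that posture-aware classification aims to close.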

The rise of Data Security Posture Management (DSPM) platforms is changing that. By understanding what the security posture of data is supposed to be, and by having that posture 'follow' the sensitive data as it travels through the cloud, security teams can ensure their data is always properly secured - no matter where it ends up.

Want to find out what sensitive data is publicly accessible in your cloud?

Get in touch with Sentra here to see our DSPM in action. 
