A new report shows that the amount of sensitive data companies store in non-production environments such as development, testing, analytics, and AI/ML is growing. Executives are also increasingly concerned about protecting sensitive data, and pumping it into new AI products won’t help.
According to the Delphix 2024 State of Data Compliance and Security Report, 74% of organizations that handle sensitive data increased the amount of data they store in non-production (also known as “lower”) environments in the last year. Additionally, 91% are concerned that this will result in an increased exposure footprint, putting them at risk of breaches and non-compliance fines.
The growing number of online consumers and ongoing digital transformation efforts are driving an overall increase in the amount of consumer data held by businesses. IDC predicts that by 2025, global data volume will grow to 163 zettabytes, ten times the 16.1 zettabytes generated in 2016.
As a result, the amount of sensitive data stored, including personally identifiable information, protected health information, and financial details, is also increasing.
Sensitive data is typically created and stored in production (live) systems with tight controls and access restrictions, such as CRM or ERP platforms. Standard IT operations, however, mean that this data is frequently copied to non-production environments, where it is accessible to more personnel and the risk of a breach increases.
The report’s findings are the result of a survey of 250 senior executives from organizations with more than 5,000 employees who handle sensitive consumer data. The survey was conducted by software provider Perforce.
See also: National Public Data Breach: 2.7 Billion Records Exposed on Dark Web
More than half of businesses have already experienced a data breach
More than half of respondents said they have already experienced a breach of sensitive data stored in non-production environments.
There’s other evidence that the problem is getting worse: an Apple-commissioned study found that data breaches increased by 20% between 2022 and 2023. In fact, 61% of Americans have learned at some point that their personal data has been breached or compromised.
A Perforce report found that 42% of responding organizations have fallen victim to ransomware, and this form of malware is a growing global threat: a Malwarebytes study published this month found that ransomware attacks worldwide increased 33% last year.
Part of the problem is that global supply chains are becoming longer and more complex, increasing the number of points attackers can penetrate. According to a report by the Identity Theft Resource Center, the number of organizations affected by supply chain attacks jumped by more than 2,600 percentage points between 2018 and 2023. What’s more, ransomware payouts exceeded $1 billion (£790 million) for the first time in 2023, making attacks increasingly lucrative.
AI is the biggest obstacle to keeping consumer data safe
As companies introduce AI into their business processes, it becomes increasingly difficult to control what data is stored where.
AI systems often require sensitive consumer data for training and operation, and the complexity of their algorithms and potential integrations with external systems can create new attack vectors that are difficult to manage. In fact, the report found that 60% of respondents cited AI and ML as the main driver of the increase in sensitive data in non-production environments.
“AI environments may be less well managed and secured than production environments,” the report authors wrote. “As a result, they may be more susceptible to being compromised.”
Business decision makers are aware of this risk, with 85% reporting they are concerned about regulatory non-compliance in AI environments. While much of the AI-specific regulation is in its infancy, GDPR requires that personal data used in AI systems be processed lawfully and transparently, and the U.S. has a range of laws that apply at the state level.
See also: AI Executive Order: White House Releases 90-Day Progress Report
The EU AI Act, which came into force in August, lays out strict rules on the use of AI in facial recognition and imposes transparency requirements on general-purpose AI systems. Companies that don’t comply face fines ranging from €7.5 million (US$8.1 million) or 1.5% of global turnover to €35 million (US$38 million) or 7% of turnover, depending on the violation and the size of the company. Similar AI-specific regulations can be expected in other jurisdictions in the near future.
Other concerns about sensitive data in AI environments cited by more than 80% of Perforce survey respondents included using low-quality data as input to AI models, re-identification of personal data, and theft of model training data that may contain IP or trade secrets.
Businesses are concerned about the economic cost of insecure data
Another major reason why large companies are so concerned about data security is the potential for heavy fines for non-compliance with regulations. Consumer data is widely subject to a growing number of regulations such as GDPR and HIPAA, which can be confusing and change frequently.
Many regulations, such as GDPR, scale penalties with annual revenue, meaning larger companies face larger fines. Perforce’s report found that 43% of respondents have already had to pay fines or make adjustments for non-compliance, and 52% have experienced audit issues or failures related to non-production data.
But the costs of a data breach can go beyond fines, as some of the revenue lost comes from business interruptions: A recent Splunk report found that the biggest cause of downtime incidents was cybersecurity-related human error, such as clicking on a phishing link.
Unplanned downtime costs the world’s largest businesses $400 billion annually, including direct revenue loss, diminished shareholder value, stalled productivity, and damaged reputations. In fact, ransomware damage costs are predicted to exceed $265 billion by 2031.
According to IBM, the average cost of a data breach in 2024 is $4.88 million, up 10% from 2023. The tech giant’s report adds that 40% of breaches involved data stored across multiple environments, including public cloud and on-premises; these breaches cost more than $5 million on average and took the longest to identify and contain, suggesting that business leaders are right to be concerned about data sprawl.
See also: Nearly 10 billion passwords leaked in biggest breach in history
Taking steps to protect data in non-production environments can be resource-intensive
There are ways to protect data stored in non-production environments, such as masking sensitive data, but the Perforce report identified several reasons companies are reluctant to adopt them: respondents find them difficult and time-consuming, and worry they could slow their organizations down.
- Nearly a third are concerned that safely replicating production databases to non-production environments could take weeks, slowing down software development.
- 36% of respondents said that masked data is unrealistic and may affect software quality.
- 38% believe security protocols could hinder companies’ ability to track and comply with regulations.
The report also found that 86% of organizations allow data compliance exceptions in non-production environments to reduce the effort of storing data securely, including using limited data sets, data minimization, and obtaining consent from data subjects.
Recommendations for protecting sensitive data in non-production environments
The Perforce team outlined four main ways companies can protect sensitive data in non-production environments:
- Static Data Masking: Permanently replace sensitive values with fictitious but realistic values.
- Data Loss Prevention (DLP): A perimeter defense security approach that detects and attempts to prevent potential data breaches and theft.
- Data Encryption: Reversibly converts data into unreadable ciphertext so that only authorized users can access it.
- Strict Access Control: Policies that categorize users by roles or other attributes and configure their access to datasets based on these categories.
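To illustrate the first technique, static data masking permanently replaces sensitive fields with fictitious but realistic values before data is copied out of production. The Python sketch below is a hypothetical, simplified example (the record fields, name pool, and masking scheme are invented for illustration); commercial masking tools work at the database level and across whole schemas:

```python
import hashlib
import random

# Invented pool of replacement values for illustration only.
FIRST_NAMES = ["Alex", "Sam", "Jordan", "Taylor", "Casey"]

def mask_record(record: dict) -> dict:
    """Return a copy of `record` with sensitive fields replaced.

    Masking is deterministic: the random generator is seeded from a hash
    of the real name, so the same input always maps to the same fake
    values, which helps preserve referential integrity across tables.
    """
    seed = int(hashlib.sha256(record["name"].encode()).hexdigest(), 16)
    rng = random.Random(seed)
    masked = dict(record)
    masked["name"] = rng.choice(FIRST_NAMES)
    # Keep the card-number format while destroying the real digits.
    masked["card_number"] = f"XXXX-XXXX-XXXX-{rng.randrange(10000):04d}"
    masked["email"] = f"user{seed % 100000}@example.com"
    return masked

original = {"name": "Jane Doe",
            "card_number": "4111-1111-1111-1111",
            "email": "jane.doe@example.com"}
masked = mask_record(original)
print(masked["name"] in FIRST_NAMES)    # realistic but fictitious
print(masked == mask_record(original))  # deterministic mapping
```

Unlike encryption, this transformation is one-way: the masked copy can be handed to developers, testers, or analysts without any key that would recover the original values.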
“Protecting sensitive data in general is not easy, and AI/ML adds complexity,” the authors write.
“Tools that are specialized in protecting sensitive data in other non-production environments, such as development, testing, and analytics, can help protect AI environments.”