Automatic discovery, classification, and de-identification of sensitive data in the cloud.
What is Sensitive Data Protection?
Sensitive Data Protection, formerly known as the DLP API, is Google’s service for automatic discovery and protection of sensitive data. The service scans structured and unstructured data in Cloud Storage, BigQuery, and other sources to identify personally identifiable information (PII), financial data, and other confidential content. With over 150 predefined detectors (called infoTypes), the service recognizes patterns such as credit card numbers, Social Security numbers, email addresses, and country-specific identifiers.
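To make the idea of a pattern-based detector concrete, here is a minimal local sketch. It is not how the service is implemented: the two regular expressions, the function names `luhn_valid` and `find_sensitive`, and the labels are assumptions chosen for illustration, and real detectors take context and likelihood into account.

```python
import re

# Hypothetical, simplified patterns for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return len(digits) >= 13 and total % 10 == 0


def find_sensitive(text: str) -> list[tuple[str, str]]:
    """Return (label, matched_text) pairs for every finding in `text`."""
    findings = [("EMAIL_ADDRESS", m.group()) for m in EMAIL_RE.finditer(text)]
    for m in CARD_RE.finditer(text):
        # The checksum filters out digit runs that only look like card numbers.
        if luhn_valid(m.group()):
            findings.append(("CREDIT_CARD_NUMBER", m.group()))
    return findings
```

Calling `find_sensitive` on a sentence that contains an email address, the well-known test card number 4111 1111 1111 1111, and a random 16-digit order number reports the first two and drops the order number, because it fails the checksum.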
The de-identification engine offers several techniques for protecting discovered data. Masking replaces sensitive characters with a placeholder character. Tokenization replaces values with surrogate tokens, which are one-way when based on cryptographic hashing and reversible when based on encryption. Format-preserving encryption keeps the length and character set of the original value while encrypting its content. These methods make data usable for analytics and development without exposing the underlying sensitive information.
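The difference between masking and one-way tokenization can be sketched locally in a few lines. This is an illustration of the concepts only, not the service's implementation: the function names `mask` and `tokenize`, their parameters, and the choice of HMAC-SHA256 are assumptions. Format-preserving encryption is not shown, since it needs a dedicated cipher mode.

```python
import hashlib
import hmac


def mask(value: str, keep_last: int = 4, mask_char: str = "#") -> str:
    """Replace all but the last `keep_last` characters with `mask_char`."""
    if keep_last <= 0:
        return mask_char * len(value)
    hidden = max(len(value) - keep_last, 0)
    return mask_char * hidden + value[hidden:]


def tokenize(value: str, key: bytes, length: int = 16) -> str:
    """Derive a one-way token: keyed HMAC-SHA256, truncated to `length` hex chars."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]
```

With the same key, `tokenize` maps equal inputs to equal tokens, so joins and group-by operations still work on the de-identified column, while `mask` simply hides characters and keeps no link between rows.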
Discovery scans automate continuous monitoring of entire projects or organizations. The service creates data profiles showing where sensitive data is stored, what types are present, and how much risk they pose. This transparency is the foundation for compliance audits and data protection strategies.
Core Features
- Data Discovery: Automated scans of Cloud Storage, BigQuery, and Datastore that detect sensitive data across entire organizations
- Classification: Over 150 predefined detectors for PII, financial data, health data, and country-specific identifiers
- De-identification: Masking, tokenization, pseudonymization, and format-preserving encryption for protecting sensitive data
- Risk Assessment: Data profiling with risk scores and recommendations for risk mitigation
Typical Use Cases
GDPR Compliance and Data Protection
Organizations use Sensitive Data Protection to identify and protect personal data across their cloud environments. Automated scans help ensure that PII is classified consistently and handled according to GDPR requirements. De-identification lets teams use data for analytics while reducing the risk of violating privacy regulations.
Secure Data Sharing for Development and Analytics
Development and analytics teams need realistic test data that contains no real personal information. Sensitive Data Protection automatically de-identifies production data so teams can work with structurally correct but anonymized data.
Benefits
- Automatic detection of sensitive data without manual classification
- Over 150 predefined detectors with high detection rates
- Flexible de-identification methods for various use cases
- Seamless integration with BigQuery, Cloud Storage, and Datastore
Integration with innFactory
As a Google Cloud partner, innFactory supports you with Sensitive Data Protection: setting up discovery scans, configuring de-identification pipelines, GDPR compliance strategies, and integration into existing data architectures.
Frequently Asked Questions
What is Sensitive Data Protection?
Sensitive Data Protection (formerly DLP API) is a Google Cloud service that automatically discovers, classifies, and protects sensitive data such as credit card numbers, social security numbers, and personal information. The service supports over 150 predefined detectors.
How does de-identification work?
Sensitive Data Protection offers various de-identification techniques including masking, tokenization, pseudonymization, and format-preserving encryption. These methods protect data while keeping it usable for analytics.
Which data sources are supported?
The service scans Cloud Storage, BigQuery, Datastore, and any text data via the API. Discovery scans can be automated across entire projects or organizations.
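As a rough sketch of what scanning text via the API can look like with the `google-cloud-dlp` Python client: the helper names `build_inspect_request` and `inspect_text` and the project ID used in any call are hypothetical, and running `inspect_text` assumes the library is installed and Google Cloud credentials are configured.

```python
def build_inspect_request(project_id: str, text: str, info_types: list[str]) -> dict:
    """Build the request dictionary for an inspect call on a text snippet."""
    return {
        "parent": f"projects/{project_id}/locations/global",
        "inspect_config": {
            "info_types": [{"name": name} for name in info_types],
            # Ask the service to return the matched text with each finding.
            "include_quote": True,
        },
        "item": {"value": text},
    }


def inspect_text(project_id: str, text: str, info_types: list[str]) -> list[tuple[str, str]]:
    """Send the text to the service and return (info_type, quote) pairs."""
    # Imported lazily so the request builder works without the library installed.
    from google.cloud import dlp_v2

    client = dlp_v2.DlpServiceClient()
    response = client.inspect_content(
        request=build_inspect_request(project_id, text, info_types)
    )
    return [(f.info_type.name, f.quote) for f in response.result.findings]
```

Keeping the request builder separate from the network call makes the configuration easy to review and unit-test before any data leaves the local environment.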
