Microsoft Presidio
Detect and de-identify PII across text, images, and structured data
About
Drop Presidio into your data pipeline to detect and anonymize PII in text, images, and tabular records. Data engineers and compliance teams use it to scrub logs, support tickets, and user uploads before storage or analytics. Its modular Python components (Analyzer, Anonymizer, Image-Redactor, and Structured) combine pattern matching with NLP and accept custom recognizers.
Editor's Take
We recommend Presidio for teams that need PII detection at the repository or pipeline level, with flexible anonymization operators that run locally. It is best suited to engineers who can integrate Python SDKs and tune recognizers.
Key Features
- Send raw text to Presidio Analyzer → identifies entities such as credit cards, SSNs, and names using NLP and pattern matching
- Pipe results into Presidio Anonymizer → masks, hashes, or replaces detected PII per policy
- Upload an image through Presidio Image-Redactor → locates and redacts sensitive text or regions in the image
- Provide CSV/JSON to Presidio Structured → flags and anonymizes PII in columns and nested fields
- Install Python packages (latest 2.2.362) → run the SDK locally in Python with pluggable recognizers and operators
Use Cases
- A data engineer scrubbing application logs for PII before shipping them to a centralized SIEM
- A healthcare analyst de-identifying PHI in clinical notes prior to model training
- A trust-and-safety team redacting sensitive text from user-uploaded screenshots
Try It Like This
- 1 Detect PII in application logs
Install presidio-analyzer and presidio-anonymizer via pip → point the Analyzer at raw log text to extract entities like credit cards, SSNs, and names using built-in NLP and regex recognizers → pipe Analyzer results into the Anonymizer to mask or hash detected PII before sending logs to your SIEM.
- 2 De-identify clinical notes for model training
Add Presidio Python packages to your ETL environment and load the Analyzer and Anonymizer modules → run clinical text through Analyzer to locate PHI (names, dates, identifiers) and apply policy-driven anonymization (replace or redact) → export de-identified notes for downstream model training while preserving a reproducible anonymization policy.
- 3 Redact sensitive regions in user-uploaded screenshots
Install and enable the Image-Redactor component in the service that accepts uploads → pass each screenshot to Image-Redactor, which locates and redacts sensitive text or image regions → store the redacted image and keep audit metadata from the Analyzer/Redactor for compliance.
- 4 Anonymize PII in CSV/JSON records
Install presidio-structured and provide your CSV or nested JSON to the Structured module → configure column/field rules and run detection to flag PII in columns and nested objects → apply anonymization operators (mask, hash, replace) to produce scrubbed data ready for analytics.
- 5 Add a custom recognizer for an internal ID format
Write a pluggable recognizer in Python that defines patterns or ML heuristics for your internal ID → register the recognizer with the Analyzer so it becomes part of detection passes → verify detection on sample data and route hits to the Anonymizer with the policy you choose (e.g., pseudonymize or hash).
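For the custom-recognizer step, it can help to prototype the pattern in plain Python before wrapping it in a Presidio recognizer. The sketch below uses only the standard library and is not Presidio's API; the `EMP-` plus six digits format, the helper names, and the salt value are all hypothetical. Once the regex behaves on sample data, the same expression can be handed to Presidio's pattern-based recognizer (`PatternRecognizer`) and registered with the Analyzer.

```python
import hashlib
import re
from typing import List, Tuple

# Hypothetical internal ID format: "EMP-" followed by exactly six digits.
INTERNAL_ID = re.compile(r"\bEMP-\d{6}\b")


def find_internal_ids(text: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, matched_text) for every internal ID found."""
    return [(m.start(), m.end(), m.group()) for m in INTERNAL_ID.finditer(text)]


def pseudonymize(text: str, salt: str) -> str:
    """Swap each ID for a salted SHA-256 token; equal IDs yield equal tokens."""
    def to_token(match):
        digest = hashlib.sha256((salt + match.group()).encode("utf-8")).hexdigest()
        return "<ID_" + digest[:10] + ">"
    return INTERNAL_ID.sub(to_token, text)


sample = "Ticket opened by EMP-004211, reassigned from EMP-004211 to EMP-118730."
print(find_internal_ids(sample))
print(pseudonymize(sample, salt="demo-salt"))
```

The word boundaries keep near-misses such as `TEMP-123456` or a seven-digit suffix from matching, which is the kind of false positive worth checking before the pattern joins a detection pass.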
Pros & Cons
Pros
- Modular Python SDK with Analyzer, Anonymizer, Image-Redactor, and Structured components that can be dropped into existing pipelines.
- Supports text, images, and structured (CSV/JSON) data, enabling PII detection and redaction across common data types used by engineers and analysts.
- Pluggable recognizers and operators let teams add custom patterns or ML-based recognizers and define per-entity anonymization policies (mask, hash, replace).
Cons
- Requires Python integration and configuration in your pipeline; expect development effort to register recognizers and tune detection rules.
- No evidence of Korean-language support was found in available sources.
Getting Started
- 1 Install the Python packages (e.g., pip install presidio-analyzer presidio-anonymizer) and open the documentation at microsoft.github.io/presidio.
- 2 Run the Analyzer on a sample text and configure recognizers and anonymization operators.
- 3 Execute the anonymization step and confirm that credit cards, SSNs, and names are replaced or masked in the output.
FAQ
What platforms is Microsoft Presidio available on?
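Presidio ships as Python packages that run wherever Python runs, and its services can also be exposed over an HTTP API. It is an SDK rather than a packaged web or desktop application.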
Does Microsoft Presidio support Korean?
No Korean-language support was found in the available sources.