GroupDocs.Redaction for Python via .NET Overview
What is GroupDocs.Redaction?
GroupDocs.Redaction for Python via .NET is a native Python library that permanently removes or obscures sensitive content from documents — across PDF, Microsoft Word, Excel, PowerPoint, and image formats — through a single, format-independent API. It runs entirely on-premise, requires no Microsoft Office or Adobe Acrobat installation, and ships as a pre-built wheel on Windows, Linux, and macOS.
Typical uses include:
PII / PHI removal — strip names, SSNs, emails, and other personal data from a document before it is shared, published, or archived (GDPR, HIPAA, CCPA).
Legal & e-discovery redaction — black out privileged phrases and annotations across every page of a production set.
Metadata sanitization — erase or rewrite author, company, and other hidden metadata that leaks information.
Irreversible redaction — rasterize the cleaned document to a PDF so the removed content can never be recovered.
Policy-driven batch redaction — define a reusable set of redaction rules once and apply it across many documents in a pipeline.
Key Capabilities
| Capability | Description |
| --- | --- |
| Text Redaction | Replace or black out text matched by an exact phrase (case-sensitive or RTL-aware) or a regular expression. See Text Redactions. |
| Metadata Redaction | Erase metadata wholesale or by filter, or rewrite values that match a pattern. See Metadata Redactions. |
| Image Redaction | Black out a rectangular area of an image or scanned page and clean embedded image metadata. See Image Redactions. |
| Annotation Redaction | Rewrite or delete annotations, comments, and notes by pattern. See Annotation Redactions. |
| Saving Documents | Save in the original format, or rasterize to a PDF (optionally PDF/A) so redactions are irreversible. See Saving Documents. |
| Redaction Policies | Bundle several redactions into a reusable policy and apply it across many documents. See Use Redaction Policies. |
| Document Inspection | Read file type, page count, and size without modifying the document. See Get File Info. |
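As a rough illustration of applying one rule set across many documents, the sketch below loops over a folder and reuses only the `Redactor`, `apply`, `save`, and `SaveOptions` calls that appear in the Quick Example on this page. It is not the library's policy feature (see Use Redaction Policies for that); a plain Python list stands in for a policy. The helper names `find_documents`, `build_rules`, and `redact_folder`, the folder layout, and the choice of extensions are all assumptions made for the example.

```python
from pathlib import Path


def find_documents(folder, extensions=(".docx", ".pdf"), suffix="redacted"):
    """Hypothetical helper: list input files, skipping earlier '_redacted' outputs."""
    wanted = {ext.lower() for ext in extensions}
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file()
        and p.suffix.lower() in wanted
        and not p.stem.endswith(f"_{suffix}")
    )


def build_rules():
    """Hypothetical helper: a plain list standing in for a redaction policy."""
    # Imported lazily so find_documents() works without the package installed.
    from groupdocs.redaction.redactions import (
        ExactPhraseRedaction, RegexRedaction, ReplacementOptions)

    return [
        ExactPhraseRedaction("John Doe", ReplacementOptions("[personal]")),
        RegexRedaction(r"\d{2,}", ReplacementOptions("[number]")),
    ]


def redact_folder(folder):
    """Apply the same rules to every matching document in `folder`."""
    from groupdocs.redaction import Redactor
    from groupdocs.redaction.options import SaveOptions

    for path in find_documents(folder):
        with Redactor(str(path)) as redactor:
            # Fresh rule objects per document; whether they can be shared
            # between Redactor instances is not stated on this page.
            for rule in build_rules():
                redactor.apply(rule)

            options = SaveOptions()
            options.add_suffix = True
            options.rasterize_to_pdf = False
            options.redacted_file_suffix = "redacted"
            redactor.save(options)
```

Calling `redact_folder("./incoming")` would process each `.docx` and `.pdf` file in that (hypothetical) folder. Skipping files whose name already ends in `_redacted` keeps a second run from redacting its own output.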
Quick Example
Redact every occurrence of a phrase and save the result with just a few lines of code. The example rasterizes the result to a PDF named sample_redacted.pdf, so the removed content cannot be recovered:
```python
from groupdocs.redaction import Redactor
from groupdocs.redaction.options import SaveOptions
from groupdocs.redaction.redactions import ExactPhraseRedaction, ReplacementOptions


def redact_text():
    # Open the document
    with Redactor("./sample.docx") as redactor:
        # Replace every occurrence of "John Doe" with "[personal]"
        redactor.apply(ExactPhraseRedaction("John Doe", ReplacementOptions("[personal]")))

        # Rasterize the result to a PDF named sample_redacted.pdf
        save_options = SaveOptions()
        save_options.add_suffix = True
        save_options.rasterize_to_pdf = True
        save_options.redacted_file_suffix = "redacted"
        redactor.save(save_options)


if __name__ == "__main__":
    redact_text()
```
sample.docx is the sample file used in this example.
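In a pipeline it can help to know where the saved file should appear, for instance to check that it exists afterwards. The helper below is a small sketch of the naming seen in this example, where `sample.docx` saved with the suffix `redacted` becomes `sample_redacted.pdf`. `expected_output_path` is a hypothetical function, not part of the library, and the non-rasterized case (keeping the source extension) is an assumption rather than documented behavior.

```python
from pathlib import Path


def expected_output_path(source, suffix="redacted", rasterize_to_pdf=True):
    """Guess the saved file's path from the naming shown in the example above."""
    src = Path(source)
    # Rasterized output is a PDF; otherwise assume the source extension is kept.
    extension = ".pdf" if rasterize_to_pdf else src.suffix
    return src.with_name(f"{src.stem}_{suffix}{extension}")


print(expected_output_path("./sample.docx"))  # sample_redacted.pdf
```

If the library's actual naming differs in your version, adjust the helper rather than relying on it.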
For finer control, apply several redactions in sequence and keep the original format by setting rasterize_to_pdf to False on the SaveOptions object:
```python
from groupdocs.redaction import Redactor
from groupdocs.redaction.redactions import ExactPhraseRedaction, RegexRedaction, ReplacementOptions
from groupdocs.redaction.options import SaveOptions


def redact_with_options():
    with Redactor("./sample.docx") as redactor:
        # Redact a name and any 2+ digit number sequences
        redactor.apply(ExactPhraseRedaction("John Doe", ReplacementOptions("[personal]")))
        redactor.apply(RegexRedaction(r"\d{2,}", ReplacementOptions("[number]")))

        # Keep the original DOCX format instead of rasterizing to PDF
        options = SaveOptions()
        options.add_suffix = True
        options.rasterize_to_pdf = False
        options.redacted_file_suffix = "redacted"
        redactor.save(options)


if __name__ == "__main__":
    redact_with_options()
```
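A pattern as broad as `\d{2,}` also catches years, page counts, and phone numbers, so it can be worth previewing what it matches before redacting real documents. The sketch below uses only Python's standard `re` module on a made-up sentence. It assumes the library's regex engine treats this simple pattern the same way as `re`, which may not hold for more advanced constructs. The helper names and the sample text are illustrative.

```python
import re

# The same pattern passed to RegexRedaction in the example above.
PATTERN = re.compile(r"\d{2,}")


def preview_matches(text):
    """Return every substring the pattern matches, in order of appearance."""
    return PATTERN.findall(text)


def preview_replacement(text, replacement="[number]"):
    """Show what the text would look like with each match replaced."""
    return PATTERN.sub(replacement, text)


sample = "Invoice 2024-17 for John Doe, page 3 of 12, phone 555-0100."

print(preview_matches(sample))
# ['2024', '17', '12', '555', '0100']

print(preview_replacement(sample))
# Invoice [number]-[number] for John Doe, page 3 of [number], phone [number]-[number].
```

The single digit in "page 3" is left alone because the pattern requires at least two digits. If a preview shows too many matches, a narrower pattern is usually safer than redacting and reviewing afterwards.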