GroupDocs.Redaction for Python via .NET Overview

What is GroupDocs.Redaction?

GroupDocs.Redaction for Python via .NET is a native Python library that permanently removes or obscures sensitive content from documents — across PDF, Microsoft Word, Excel, PowerPoint, and image formats — through a single, format-independent API. It runs entirely on-premise, requires no Microsoft Office or Adobe Acrobat installation, and ships as a pre-built wheel on Windows, Linux, and macOS.

Typical uses include:

  • PII / PHI removal — strip names, SSNs, emails, and other personal data from a document before it is shared, published, or archived (GDPR, HIPAA, CCPA).
  • Legal & e-discovery redaction — black out privileged phrases and annotations across every page of a production set.
  • Metadata sanitization — erase or rewrite author, company, and other hidden metadata that leaks information.
  • Irreversible redaction — rasterize the cleaned document to a PDF so the removed content can never be recovered.
  • Policy-driven batch redaction — define a reusable set of redaction rules once and apply it across many documents in a pipeline.
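Regex-based PII removal is only as good as its patterns, so it helps to check them on sample text before passing them to a RegexRedaction. The sketch below runs two illustrative patterns (a US SSN in the 123-45-6789 layout and a deliberately loose email pattern) through Python's re module. The pattern set and the find_pii / preview_redaction helpers are hypothetical and not part of GroupDocs.Redaction, and the library may use a different regex engine than Python's re. Treat this as a sanity check for simple patterns, not a guarantee of identical behavior.

```python
import re

# Illustrative patterns only (assumptions, not shipped with GroupDocs.Redaction).
PII_PATTERNS = {
    # US Social Security number in the common 123-45-6789 layout
    "ssn": r"\b\d{3}-\d{2}-\d{4}\b",
    # Deliberately loose email pattern: local part, "@", dotted domain
    "email": r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b",
}


def find_pii(text: str) -> dict[str, list[str]]:
    """Return every match of each pattern, keyed by pattern name."""
    return {name: re.findall(pattern, text) for name, pattern in PII_PATTERNS.items()}


def preview_redaction(text: str, replacement: str = "[personal]") -> str:
    """Show how the text would read if every pattern match were replaced."""
    for pattern in PII_PATTERNS.values():
        text = re.sub(pattern, replacement, text)
    return text


if __name__ == "__main__":
    sample = "Contact John Doe at john.doe@example.com, SSN 123-45-6789."
    print(find_pii(sample))
    print(preview_redaction(sample))
```

Once a pattern behaves as expected, the same string can be handed to the library, for example RegexRedaction(PII_PATTERNS["ssn"], ReplacementOptions("[personal]")).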

Key Capabilities

Capability | Description
--- | ---
Text Redaction | Replace or black out text matched by an exact phrase (case-sensitive or RTL-aware) or a regular expression. See Text Redactions.
Metadata Redaction | Erase metadata wholesale or by filter, or rewrite values that match a pattern. See Metadata Redactions.
Image Redaction | Black out a rectangular area of an image or scanned page and clean embedded image metadata. See Image Redactions.
Annotation Redaction | Rewrite or delete annotations, comments, and notes by pattern. See Annotation Redactions.
Page Removal | Remove whole pages, slides, or worksheets from a document. See Remove Page Redactions.
Rasterization & Saving | Save in the original format, or rasterize to a PDF (optionally PDF/A) so redactions are irreversible. See Saving Documents.
Redaction Policies | Bundle several redactions into a reusable policy and apply it across many documents. See Use Redaction Policies.
Document Inspection | Read file type, page count, and size without modifying the document. See Get File Info.
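Conceptually, a redaction policy is an ordered list of rules defined once and reused for every document in a batch. The plain-Python model below illustrates only that idea, on strings; TextRule and TextPolicy are hypothetical names and are not GroupDocs.Redaction classes. The library's own policy support is described in Use Redaction Policies.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TextRule:
    """Hypothetical rule: replace every match of `pattern` with `replacement`."""
    pattern: str
    replacement: str


class TextPolicy:
    """Hypothetical reusable policy: an ordered, immutable tuple of rules."""

    def __init__(self, rules):
        self.rules = tuple(rules)

    def apply(self, text: str) -> str:
        # Rules run in order, so later rules see the output of earlier ones.
        for rule in self.rules:
            text = re.sub(rule.pattern, rule.replacement, text)
        return text

    def apply_batch(self, documents: dict[str, str]) -> dict[str, str]:
        # The same rule set is applied to every document, keyed by name.
        return {name: self.apply(body) for name, body in documents.items()}


# Defined once, reused for every document in the batch.
policy = TextPolicy([
    TextRule(re.escape("John Doe"), "[personal]"),
    TextRule(re.escape("Jane Roe"), "[personal]"),
    TextRule(r"\b\d{3}-\d{2}-\d{4}\b", "[ssn]"),
])

if __name__ == "__main__":
    docs = {"memo.txt": "John Doe reports to Jane Roe.", "hr.txt": "SSN on file: 123-45-6789"}
    print(policy.apply_batch(docs))
```

Keeping the rules in one immutable object is the point of the design: every document in the pipeline is guaranteed to get the same redactions in the same order.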

Quick Example

Redact every occurrence of a phrase and save the result with just a few lines of code. The example rasterizes the result to a PDF named sample_redacted.pdf, so the removed content cannot be recovered:

from groupdocs.redaction import Redactor
from groupdocs.redaction.options import SaveOptions
from groupdocs.redaction.redactions import ExactPhraseRedaction, ReplacementOptions

def redact_text():
    # Open the document
    with Redactor("./sample.docx") as redactor:
        # Replace every occurrence of "John Doe" with "[personal]"
        redactor.apply(ExactPhraseRedaction("John Doe", ReplacementOptions("[personal]")))
        # Rasterize the result to a PDF named sample_redacted.pdf
        save_options = SaveOptions()
        save_options.add_suffix = True
        save_options.rasterize_to_pdf = True
        save_options.redacted_file_suffix = "redacted"
        redactor.save(save_options)

if __name__ == "__main__":
    redact_text()

sample.docx is the sample file used in this example.
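The output name in the first example comes from three SaveOptions fields: add_suffix, redacted_file_suffix, and rasterize_to_pdf. A pipeline that needs to locate the redacted copy can predict that name. The helper below is a hypothetical sketch: the `<stem>_<suffix>` layout is inferred from the sample_redacted.pdf name given above, while keeping the original extension when rasterize_to_pdf is False and writing next to the source file are assumptions.

```python
from pathlib import Path


def expected_output_path(source: str, suffix: str = "redacted", rasterize_to_pdf: bool = True) -> Path:
    """Predict the saved file name (assumed convention: <stem>_<suffix><ext>)."""
    src = Path(source)
    # Rasterizing produces a PDF; otherwise the original extension is kept (assumption).
    ext = ".pdf" if rasterize_to_pdf else src.suffix
    # Assumption: the redacted copy is written next to the source file.
    return src.with_name(f"{src.stem}_{suffix}{ext}")


if __name__ == "__main__":
    print(expected_output_path("./sample.docx"))
    print(expected_output_path("./sample.docx", rasterize_to_pdf=False))
```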


For finer control, apply several redactions in sequence and keep the original DOCX format by setting rasterize_to_pdf = False on SaveOptions:

from groupdocs.redaction import Redactor
from groupdocs.redaction.redactions import ExactPhraseRedaction, RegexRedaction, ReplacementOptions
from groupdocs.redaction.options import SaveOptions

def redact_with_options():
    with Redactor("./sample.docx") as redactor:
        # Replace the name, then every run of two or more consecutive digits
        redactor.apply(ExactPhraseRedaction("John Doe", ReplacementOptions("[personal]")))
        redactor.apply(RegexRedaction(r"\d{2,}", ReplacementOptions("[number]")))

        # Keep the original DOCX format instead of rasterizing to PDF
        options = SaveOptions()
        options.add_suffix = True
        options.rasterize_to_pdf = False
        options.redacted_file_suffix = "redacted"
        redactor.save(options)

if __name__ == "__main__":
    redact_with_options()

sample.docx is the sample file used in this example.
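Before running RegexRedaction over a real document, it is worth checking what a pattern actually matches. Python's re module accepts the \d{2,} pattern from the example unchanged, so the quick preview below shows that it leaves single digits alone but splits phone numbers and dates into several [number] tokens. The preview helper and the wider HYPHENATED_NUMBER_PATTERN alternative are hypothetical, and the sketch only approximates the library's behavior on plain strings.

```python
import re

# The pattern used by the RegexRedaction call in the example above.
NUMBER_PATTERN = r"\d{2,}"
# Hypothetical wider alternative: digit runs joined by hyphens become one token.
HYPHENATED_NUMBER_PATTERN = r"\d[\d-]*\d"


def preview(text: str, pattern: str = NUMBER_PATTERN, replacement: str = "[number]") -> str:
    """Approximate the redaction on a plain string with Python's re module."""
    return re.sub(pattern, replacement, text)


if __name__ == "__main__":
    sample = "Room 7, badge 4821, call 555-0142 on 2024-05-01"
    print(preview(sample))
    print(preview(sample, HYPHENATED_NUMBER_PATTERN))
```

With the original pattern the sample becomes "Room 7, badge [number], call [number]-[number] on [number]-[number]-[number]", which leaves the hyphens in place and hints at the shape of the hidden values. The wider pattern collapses each phone number and date into a single token.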


Where to next

  1. Install the package: the Installation guide walks through installing from PyPI or from an offline wheel on Windows, Linux, and macOS.
  2. Run your first redaction: the Hello, World! guide takes you through redacting a document in under five minutes.
  3. Explore runnable examples: How to Run Examples shows how to clone the GitHub repository and run every documented scenario locally or in Docker.
  4. Use it in depth: the Developer Guide covers every API surface with runnable, copy-paste code examples.
  5. Plug it into AI pipelines: AI Agents & LLM Integration explains the bundled AGENTS.md, the MCP server, and the machine-readable docs.