With the GroupDocs.Redaction API you can apply metadata redactions to documents in many formats, such as PDF, DOC, DOCX, PPT, PPTX, XLS and XLSX. See the full list in the supported document formats article.
GroupDocs.Redaction provides a flexible API that lets you replace or remove metadata using filters or a regular expression search.
Filter metadata
Metadata filtering is the base functionality of every redaction derived from the MetadataRedaction base class, and it is mandatory for metadata redactions. It uses the flagged enumeration MetadataFilters, which contains items for the most frequent metadata entries. You can set the filter to All or to any combination of metadata items. For instance, the example below sets the filter to Author, Manager and NameOfApplication, so that only those items are textually redacted or cleaned out:
# redaction derived from MetadataRedaction
redaction.filter = MetadataFilters.AUTHOR | MetadataFilters.MANAGER | MetadataFilters.NAME_OF_APPLICATION
Name                Value        Description
NameOfApplication   …            Name of application where the document was created
Manager             4096         Author’s manager name
RevisionNumber      8192         Revision number
Subject             16384        Subject of the document
Template            32768        Document template name
Title               65536        Document title
TotalEditingTime    131072       Total editing time
Version             262144       Document’s version
Description         524288       Document’s description
Keywords            1048576      Document’s keywords
ContentType         2097152      Content type
All                 2147483647   All types of the metadata items
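To make the flag arithmetic concrete, here is a minimal sketch in plain Python that does not use GroupDocs.Redaction at all. The FLAGS dictionary and the describe function are hypothetical names introduced for illustration; the numeric values are copied from the list above, and the sketch assumes, as is usual for a flagged enumeration, that items are combined with bitwise OR.

```python
# Hypothetical stand-in for a few MetadataFilters items.
# The numeric values are copied from the list above.
FLAGS = {
    "Manager": 4096,
    "RevisionNumber": 8192,
    "Subject": 16384,
    "Template": 32768,
    "Title": 65536,
}
ALL = 2147483647  # value listed for "All"


def describe(mask):
    """Return the names of the listed items whose bit is set in mask."""
    return [name for name, bit in FLAGS.items() if mask & bit]


# Combine two items with bitwise OR, as the filter example does
combined = FLAGS["Manager"] | FLAGS["Subject"]
print(combined)            # 20480
print(describe(combined))  # ['Manager', 'Subject']

# "All" has every listed bit set
print(describe(ALL) == list(FLAGS))  # True
```

Because every item occupies its own bit, a single integer can carry any combination of items, and "All" is simply the value with every bit set.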
Clean metadata
You can replace all or specific metadata items in the document with empty (blank or minimal) values using the EraseMetadataRedaction class. The example below blanks out all properties of the document:
from groupdocs.redaction import Redactor
from groupdocs.redaction.options import SaveOptions
from groupdocs.redaction.redactions import EraseMetadataRedaction, MetadataFilters

def clean_all_metadata():
    # Specify the redaction options to erase all metadata
    met_red = EraseMetadataRedaction(MetadataFilters.ALL)

    # Load the document to be redacted
    with Redactor("./sample.docx") as redactor:
        # Apply the redaction
        result = redactor.apply(met_red)

        # Save the redacted document next to the source file
        so = SaveOptions()
        so.add_suffix = True
        so.rasterize_to_pdf = False
        so.redacted_file_suffix = "redacted"
        redactor.save(so)

if __name__ == "__main__":
    clean_all_metadata()
sample.docx is the sample file used in this example.
You can specify MetadataFilters.All (MetadataFilters.ALL in the Python code above), or use the default constructor, to blank out all metadata in the document. Specify Custom to clear all custom metadata entries.
Redact metadata
You can use MetadataSearchRedaction to remove sensitive data from a document’s metadata using regular expressions. For instance, the example below replaces every mention of “Company Ltd.”:
from groupdocs.redaction import Redactor
from groupdocs.redaction.options import SaveOptions
from groupdocs.redaction.redactions import MetadataSearchRedaction

def redact_metadata():
    # Specify the redaction options: search pattern and replacement string
    met_red = MetadataSearchRedaction("Company Ltd.", "--company--")

    # Load the document to be redacted
    with Redactor("./sample.docx") as redactor:
        # Apply the redaction
        result = redactor.apply(met_red)

        # Save the redacted document next to the source file
        so = SaveOptions()
        so.add_suffix = True
        so.rasterize_to_pdf = False
        so.redacted_file_suffix = "redacted"
        redactor.save(so)

if __name__ == "__main__":
    redact_metadata()
The first argument is a regular expression and the second is the replacement string. You can also limit the scope of the redaction by setting a filter, e.g. to MetadataFilters.COMPANY. In that case matches of the regular expression are left untouched in all metadata items except the “Company” property:
from groupdocs.redaction import Redactor
from groupdocs.redaction.options import SaveOptions
from groupdocs.redaction.redactions import MetadataSearchRedaction, MetadataFilters

def redact_metadata_with_filter():
    # Specify the redaction options: search pattern and replacement string
    met_red = MetadataSearchRedaction("Company Ltd.", "--company--")

    # Limit the redaction scope to the Company metadata item only
    met_red.filter = MetadataFilters.COMPANY

    # Load the document to be redacted
    with Redactor("./sample.docx") as redactor:
        # Apply the redaction
        result = redactor.apply(met_red)

        # Save the redacted document next to the source file
        so = SaveOptions()
        so.add_suffix = True
        so.rasterize_to_pdf = False
        so.redacted_file_suffix = "redacted"
        redactor.save(so)

if __name__ == "__main__":
    redact_metadata_with_filter()
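One detail worth checking in the search pattern: in most regular expression dialects an unescaped dot matches any character, so the pattern "Company Ltd." would also match text such as "Company Ltdx". The sketch below uses Python's standard re module purely as an illustration. It assumes that the pattern passed to MetadataSearchRedaction treats the dot the same way, which this page does not confirm.

```python
import re

text = "Company Ltd. and Company Ltdx"

# Unescaped dot: matches "Company Ltd." and also "Company Ltdx"
loose = re.sub("Company Ltd.", "--company--", text)

# Escaped pattern: matches only the literal text "Company Ltd."
strict_pattern = re.escape("Company Ltd.")
strict = re.sub(strict_pattern, "--company--", text)

print(loose)   # --company-- and --company--
print(strict)  # --company-- and Company Ltdx
```

If only the literal company name should be replaced, escaping the pattern first is the safer choice.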
All metadata redactions are applied to each metadata item separately, so even if the redaction of one item fails, the remaining items are still updated. The list of failed or skipped (rejected) metadata items, along with the reasons, is available in the ErrorMessage property of RedactorLogEntry.Result.
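The per-item behaviour can be pictured with a small stand-alone sketch. It does not call GroupDocs.Redaction; the redact_items function, the sample dictionary and the failure list are hypothetical and only mimic the idea that each item is processed on its own, that a scope can restrict which items are touched, and that a failure on one item does not stop the rest.

```python
import re


def redact_items(metadata, pattern, replacement, scope=None):
    """Apply a regex replacement to each metadata item independently.

    scope is an optional set of item names; items outside it are left as-is.
    Returns the updated dictionary and a list of (name, reason) failures.
    """
    updated, errors = {}, []
    for name, value in metadata.items():
        if scope is not None and name not in scope:
            updated[name] = value  # out of scope: keep unchanged
            continue
        try:
            updated[name] = re.sub(pattern, replacement, value)
        except TypeError as exc:  # raised here for a non-text value
            updated[name] = value
            errors.append((name, str(exc)))
    return updated, errors


sample = {
    "Company": "Company Ltd.",
    "Title": "Report for Company Ltd.",
    "RevisionNumber": 7,  # not text, so the replacement fails for this item
}

result, failures = redact_items(sample, r"Company Ltd\.", "--company--")
print(result["Title"])                 # Report for --company--
print([name for name, _ in failures])  # ['RevisionNumber']

# Restrict the scope to the "Company" item only
scoped, _ = redact_items(sample, r"Company Ltd\.", "--company--", scope={"Company"})
print(scoped["Title"])                 # Report for Company Ltd.
```

In this sketch the failing item is recorded with its reason and left unchanged while the other items are still rewritten, which mirrors the behaviour described above.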