GroupDocs.Parser allows you to extract data from password-protected documents, including encrypted PDFs and password-protected Office files.
To load a password-protected document, follow these steps:
1. Create a LoadOptions object and specify the document password.
2. Instantiate the Parser object with the file path and the LoadOptions object.
3. Use the extraction methods as usual.
Load password-protected document
The following code snippet shows how to load a password-protected document:
    from groupdocs.parser import Parser
    from groupdocs.parser.options import LoadOptions

    # Document password
    password = "your-password"

    # Create LoadOptions with the password
    load_options = LoadOptions(password)

    # Create an instance of Parser class with the file path and load options
    with Parser("protected_document.pdf", load_options) as parser:
        # Extract text from the document
        text_reader = parser.get_text()
        if text_reader is not None:
            # Print the extracted text
            print(text_reader)
        else:
            print("Text extraction isn't supported for this format")
You should handle cases when an incorrect password is provided:
    from groupdocs.parser import Parser
    from groupdocs.parser.options import LoadOptions


    def load_protected_document(file_path, password):
        """Load a password-protected document with error handling"""
        try:
            # Create LoadOptions with password
            load_options = LoadOptions(password)

            # Create Parser instance
            with Parser(file_path, load_options) as parser:
                # Get document info to verify successful loading
                doc_info = parser.get_document_info()
                print("Document loaded successfully!")
                print(f"Type: {doc_info.file_type.file_format}")
                print(f"Pages: {doc_info.page_count}")

                # Extract text
                text_reader = parser.get_text()
                if text_reader:
                    return text_reader
        except Exception as e:
            print(f"Error loading document: {e}")

        return None


    # Try loading with password
    text = load_protected_document("protected.docx", "correct-password")
The following sample file is used in this example: protected.docx
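When the correct password is one of a few known candidates, the error handling above can be extended to try each candidate in turn. The following is a minimal sketch and not part of the GroupDocs.Parser API: `try_passwords`, its `open_document` parameter, and the stand-in `fake_open` are hypothetical names. It assumes that opening a document with a wrong password raises an exception.

```python
from typing import Callable, Iterable, Optional, Tuple


def try_passwords(
    open_document: Callable[[str], str],
    candidates: Iterable[str],
) -> Optional[Tuple[str, str]]:
    """Return (password, text) for the first candidate that works, else None.

    open_document is a hypothetical callable supplied by the caller. It takes
    a password, returns the extracted text, and raises on failure.
    """
    for password in candidates:
        try:
            text = open_document(password)
        except Exception:
            # Assumption: a wrong password surfaces as an exception.
            # Catching Exception is deliberately broad for this sketch.
            continue
        return password, text
    return None


# Stand-in opener so the sketch runs without the library installed.
def fake_open(password: str) -> str:
    if password != "correct-password":
        raise ValueError("Invalid password")
    return "document text"


result = try_passwords(fake_open, ["guess-1", "correct-password"])
print(result)
```

With GroupDocs.Parser, `open_document` could be a small function that builds `LoadOptions(password)`, opens `Parser(file_path, load_options)`, and returns the result of `parser.get_text()`, letting any exception propagate to `try_passwords`.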
Check if document is password-protected
Before attempting to open a document, you can check if it requires a password:
    from groupdocs.parser import Parser
    from groupdocs.parser.options import LoadOptions


    def check_if_protected(file_path):
        """Check if a document is password-protected"""
        try:
            # Try to get file info without password
            file_info = Parser.get_file_info(file_path)

            # Check if the document is encrypted
            if hasattr(file_info, 'is_encrypted') and file_info.is_encrypted:
                print("Document is password-protected")
                return True
            else:
                print("Document is not password-protected")
                return False
        except Exception as e:
            print(f"Error checking document: {e}")
            return None


    # Check if document is protected
    is_protected = check_if_protected("sample.pdf")

    if is_protected:
        password = input("Enter password: ")
        load_options = LoadOptions(password)

        with Parser("./sample.pdf", load_options) as parser:
            text_reader = parser.get_text()
            if text_reader:
                print(text_reader)
    else:
        with Parser("./sample.pdf") as parser:
            text_reader = parser.get_text()
            if text_reader:
                print(text_reader)
The following sample file is used in this example: sample.pdf
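The example above reads the password with `input`, which echoes it to the terminal. As a hedged alternative, the sketch below looks for the password in an environment variable first and otherwise prompts with `getpass.getpass` from the Python standard library. The function name `get_document_password` and the variable name `DOC_PASSWORD` are assumptions made for illustration. Neither comes from GroupDocs.Parser.

```python
import getpass
import os
from typing import Callable, Mapping, Optional


def get_document_password(
    env_var: str = "DOC_PASSWORD",
    environ: Optional[Mapping[str, str]] = None,
    prompt: Callable[[str], str] = getpass.getpass,
) -> str:
    """Return the password from the environment, or prompt without echo."""
    # Use the real process environment unless a mapping is passed explicitly.
    env = os.environ if environ is None else environ
    value = env.get(env_var)
    if value:
        return value
    # Fall back to an interactive prompt that does not echo the input.
    return prompt("Enter password: ")


# Non-interactive demonstration: the environment mapping is passed explicitly.
print(get_document_password(environ={"DOC_PASSWORD": "from-env"}))
```

The returned string can be passed to `LoadOptions` in place of the value read with `input`.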
Load different protected formats
    from groupdocs.parser import Parser
    from groupdocs.parser.options import LoadOptions


    def extract_from_protected(file_path, password, extract_type="text"):
        """Extract data from various password-protected formats"""
        try:
            # Create LoadOptions with password
            load_options = LoadOptions(password)

            # Create Parser instance
            with Parser(file_path, load_options) as parser:
                if extract_type == "text":
                    # Extract text
                    text_reader = parser.get_text()
                    if text_reader:
                        return text_reader
                elif extract_type == "metadata":
                    # Extract metadata
                    metadata = parser.get_metadata()
                    if metadata:
                        result = {}
                        for item in metadata:
                            result[item.name] = item.value
                        return result
                elif extract_type == "images":
                    # Extract images
                    images = parser.get_images()
                    if images:
                        return list(images)
        except Exception as e:
            print(f"Error: {e}")

        return None


    # Extract text from protected PDF
    text = extract_from_protected("protected.pdf", "password123", "text")
The following sample file is used in this example: protected.pdf
Batch process protected documents
    from groupdocs.parser import Parser
    from groupdocs.parser.options import LoadOptions
    import os


    def process_protected_documents(directory, password):
        """Process all password-protected documents in a directory"""
        results = []

        for filename in os.listdir(directory):
            file_path = os.path.join(directory, filename)
            if not os.path.isfile(file_path):
                continue

            try:
                print(f"Processing: {filename}")

                # Create LoadOptions with password
                load_options = LoadOptions(password)

                # Try to parse the document
                with Parser(file_path, load_options) as parser:
                    # Get document info
                    doc_info = parser.get_document_info()

                    # Extract text
                    text_reader = parser.get_text()
                    text = text_reader if text_reader else ""

                    results.append({
                        "filename": filename,
                        "type": doc_info.file_type.file_format,
                        "pages": doc_info.page_count,
                        "text_length": len(text),
                        "success": True
                    })
                    print(f"  ✓ Success - {len(text)} characters extracted")
            except Exception as e:
                print(f"  ✗ Failed - {e}")
                results.append({
                    "filename": filename,
                    "success": False,
                    "error": str(e)
                })

        return results


    # Process all protected documents
    results = process_protected_documents("./protected_docs", "common-password")

    # Print summary
    successful = [r for r in results if r.get("success")]
    print(f"{len(successful)}/{len(results)} documents processed successfully")
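The list returned by `process_protected_documents` can be summarized further, for example to see which files failed and why. The sketch below assumes result dictionaries of the shape built above (`filename`, `success`, and either `text_length` or `error`). `summarize_results` is a hypothetical helper, and the sample data is made up purely so that the sketch runs on its own.

```python
from typing import Any, Dict, List


def summarize_results(results: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Summarize batch results shaped like those in the example above."""
    succeeded = [r for r in results if r.get("success")]
    failed = [r for r in results if not r.get("success")]
    return {
        "total": len(results),
        "succeeded": len(succeeded),
        "total_characters": sum(r.get("text_length", 0) for r in succeeded),
        # Map each failed file to its recorded error message.
        "failures": {r["filename"]: r.get("error", "") for r in failed},
    }


# Illustrative data only; real entries come from process_protected_documents.
sample_results = [
    {"filename": "a.pdf", "type": "PDF", "pages": 2,
     "text_length": 120, "success": True},
    {"filename": "b.docx", "success": False, "error": "Invalid password"},
]

summary = summarize_results(sample_results)
print(f"{summary['succeeded']}/{summary['total']} documents processed successfully")
for name, error in summary["failures"].items():
    print(f"  {name}: {error}")
```

Keeping the failure messages keyed by file name makes it easier to retry only the documents that need a different password.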
Supported encryption types
GroupDocs.Parser supports the following encryption types:
Note: Documents protected only by an owner password (a restrictions password that limits printing or editing) are not supported. Only documents protected by a user password (an opening password) can be processed.
More resources
GitHub examples
You may find more code examples in our GitHub repository:
Along with the full-featured library, we provide a free online document parser app. You are welcome to extract data from PDF, DOCX, XLSX, and more with our Free Online Document Parser App.