When the document file is located on the local disk, GroupDocs.Parser allows you to load it using the Parser class constructor by specifying an absolute or relative path.
The following code snippet shows how to load a document from a local disk:
Load document by file path
from groupdocs.parser import Parser

# Specify the file path (absolute or relative)
file_path = "sample.docx"

# Create an instance of Parser class with the file path
with Parser(file_path) as parser:
    # Extract text from the document
    text_reader = parser.get_text()
    if text_reader is not None:
        # Print the extracted text
        print(text_reader)
    else:
        print("Text extraction isn't supported for this format")
The following sample file is used in this example: sample.docx
Load document with absolute path
import os

from groupdocs.parser import Parser

# Specify an absolute file path (built here from the sample file name)
file_path = os.path.abspath("sample.pdf")

# Create an instance of Parser class
with Parser(file_path) as parser:
    # Get document info
    doc_info = parser.get_document_info()
    print(f"File type: {doc_info.file_type.file_format}")
    print(f"Page count: {doc_info.page_count}")
    print(f"File size: {doc_info.size} bytes")
The following sample file is used in this example: sample.pdf
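A relative path is resolved against the current working directory, which may differ from the directory that holds your script. The sketch below is one way to build an absolute path from a chosen base directory before passing it to the Parser constructor. The helper name `to_absolute` and its `base_dir` parameter are illustrative assumptions, not part of GroupDocs.Parser; only the Python standard library is used.

```python
from pathlib import Path
from typing import Optional, Union


def to_absolute(path: Union[str, Path],
                base_dir: Optional[Union[str, Path]] = None) -> str:
    """Return an absolute version of `path` (hypothetical helper).

    Relative paths are resolved against `base_dir` when it is given,
    otherwise against the current working directory.
    """
    candidate = Path(path)
    if not candidate.is_absolute():
        base = Path(base_dir) if base_dir is not None else Path.cwd()
        candidate = base / candidate
    # resolve() normalizes the path; the file does not have to exist
    return str(candidate.resolve())


# Possible usage (assumes sample.pdf sits next to the current script):
# file_path = to_absolute("sample.pdf", base_dir=Path(__file__).parent)
# with Parser(file_path) as parser:
#     ...
```

Resolving against the script directory keeps the example working regardless of where the script is launched from.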
Batch processing multiple files
import os

from groupdocs.parser import Parser


def process_files_from_directory(directory_path):
    """Process all supported documents in a directory"""
    # Get all files in the directory
    for filename in os.listdir(directory_path):
        file_path = os.path.join(directory_path, filename)

        # Skip if not a file
        if not os.path.isfile(file_path):
            continue

        try:
            print(f"Processing: {filename}")

            # Create parser instance
            with Parser(file_path) as parser:
                # Get document info
                doc_info = parser.get_document_info()
                print(f" Type: {doc_info.file_type.file_format}")
                print(f" Pages: {doc_info.page_count}")

                # Extract text
                text_reader = parser.get_text()
                if text_reader:
                    text = text_reader
                    print(f" Characters: {len(text)}")
        except Exception as e:
            # Report the failure and continue with the next file
            print(f" Error: {e}")
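The batch example above attempts every file in the directory and relies on the exception handler for anything that cannot be parsed. If you would rather narrow the input first, a filter along these lines may help. This is a minimal sketch: the function name `iter_document_paths` and the extension list are assumptions chosen for illustration and do not reflect the full set of formats the library supports.

```python
import os
from typing import Iterable, Iterator

# Assumed extension list, for illustration only
DOCUMENT_EXTENSIONS = (".pdf", ".docx", ".xlsx")


def iter_document_paths(directory_path: str,
                        extensions: Iterable[str] = DOCUMENT_EXTENSIONS) -> Iterator[str]:
    """Yield paths of regular files whose extension is in `extensions`."""
    allowed = {ext.lower() for ext in extensions}
    # sorted() gives a stable processing order across runs
    for filename in sorted(os.listdir(directory_path)):
        file_path = os.path.join(directory_path, filename)
        # Skip subdirectories and other non-file entries
        if not os.path.isfile(file_path):
            continue
        _, ext = os.path.splitext(filename)
        # Compare case-insensitively so "REPORT.PDF" also matches
        if ext.lower() in allowed:
            yield file_path
```

Each yielded path can then be passed to the Parser constructor in the same way as in the batch example.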
Error handling
It’s good practice to handle potential errors when loading files:
import os

from groupdocs.parser import Parser


def safe_load_document(file_path):
    """Safely load a document with error handling"""
    # Check if file exists
    if not os.path.exists(file_path):
        print(f"Error: File not found - {file_path}")
        return None

    try:
        # Create parser instance
        parser = Parser(file_path)
        print(f"Document loaded successfully: {file_path}")
        return parser
    except Exception as e:
        print(f"Error loading document: {e}")
        return None


# Try to load a document
parser = safe_load_document("sample.pdf")
if parser:
    try:
        # Extract data
        text_reader = parser.get_text()
        if text_reader:
            print(text_reader)
    except Exception as e:
        # Report extraction failures instead of letting them propagate
        print(f"Error extracting text: {e}")
The following sample file is used in this example: sample.pdf
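An existence check catches the most common problem, but a path can also point to a directory, an empty file, or a file the process cannot read. The following sketch shows one way to report these cases separately before creating a Parser instance. The function `describe_load_problem` is a hypothetical helper built only on the standard library; it says nothing about whether the library supports the file's format.

```python
import os
from typing import Optional


def describe_load_problem(file_path: str) -> Optional[str]:
    """Return a message describing why `file_path` cannot be loaded,
    or None when no problem was detected by these basic checks."""
    if not os.path.exists(file_path):
        return f"File not found: {file_path}"
    if os.path.isdir(file_path):
        return f"Path is a directory, not a file: {file_path}"
    if os.path.getsize(file_path) == 0:
        return f"File is empty: {file_path}"
    if not os.access(file_path, os.R_OK):
        return f"File is not readable: {file_path}"
    return None


# Possible usage before creating the parser:
# problem = describe_load_problem("sample.pdf")
# if problem is None:
#     with Parser("sample.pdf") as parser:
#         ...
# else:
#     print(f"Error: {problem}")
```

These checks only reduce the number of avoidable failures; wrapping the Parser call in a try block is still advisable.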
More resources
GitHub examples
You may find more code examples in our GitHub repository:
Along with the full-featured library, we provide a free online document parser app. You are welcome to extract data from PDF, DOCX, XLSX, and more with our Free Online Document Parser App.