Get document info

Get document info from a local file

Python

from groupdocs.parser import Parser

# Create an instance of Parser class
with Parser("./sample.docx") as parser:
    # Get the document info
    doc_info = parser.get_document_info()
    
    # Print document information
    print(f"File type: {doc_info.file_type.file_format}")
    print(f"Page count: {doc_info.page_count}")
    print(f"File size: {doc_info.size} bytes")
    
    # Print file extension
    print(f"File extension: {doc_info.file_type.extension}")

The following sample file is used in this example: sample.docx

Get document info from a stream

Python

from groupdocs.parser import Parser

# Open the file stream
with open("sample.pdf", "rb") as stream:
    # Create an instance of Parser class with the stream
    with Parser(stream) as parser:
        # Get the document info
        doc_info = parser.get_document_info()
        
        # Print document information
        print(f"File type: {doc_info.file_type.file_format}")
        print(f"Page count: {doc_info.page_count}")
        print(f"File size: {doc_info.size} bytes")

The following sample file is used in this example: sample.pdf

Check document properties before extraction

It’s useful to check document properties before performing extraction operations:

Python

from groupdocs.parser import Parser

def process_document(file_path):
    with Parser(file_path) as parser:
        # Get document information
        doc_info = parser.get_document_info()
        
        print(f"Processing: {file_path}")
        print(f"Type: {doc_info.file_type.file_format}")
        print(f"Pages: {doc_info.page_count}")
        print(f"Size: {doc_info.size / 1024:.2f} KB")
        
        # Process based on page count
        if doc_info.page_count > 0:
            print("Document has pages, proceeding with text extraction...")
            text_reader = parser.get_text()
            if text_reader:
                print(text_reader)
        else:
            print("Document has no pages or page count is not available")

# Process different document types
process_document("sample.pdf")

The following sample file is used in this example: sample.pdf
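
The example above converts the size to kilobytes by hand. For files of very different sizes, a small formatter may read better. The following is a minimal sketch, assuming `doc_info.size` is a byte count as in the example; `format_size` is a hypothetical helper and not part of GroupDocs.Parser.

```python
def format_size(size_bytes):
    """Format a byte count using binary units (1 KB = 1024 bytes)."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    units = ["bytes", "KB", "MB", "GB", "TB"]
    value = float(size_bytes)
    unit_index = 0
    # Divide by 1024 until the value fits the current unit
    while value >= 1024 and unit_index < len(units) - 1:
        value /= 1024
        unit_index += 1
    if unit_index == 0:
        return f"{int(value)} bytes"
    return f"{value:.2f} {units[unit_index]}"


# Hypothetical usage: format_size(doc_info.size)
print(format_size(512))              # 512 bytes
print(format_size(1536))             # 1.50 KB
print(format_size(5 * 1024 * 1024))  # 5.00 MB
```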

Get page-specific information

For multi-page documents, you can also get information about individual pages:

Python

from groupdocs.parser import Parser

with Parser("./sample.docx") as parser:
    # Get document info
    doc_info = parser.get_document_info()
    
    print(f"Total pages: {doc_info.page_count}")
    
    # Iterate through pages
    for page_index in range(doc_info.page_count):
        # Extract text from each page
        print(f"\n--- Page {page_index + 1} ---")
        text_reader = parser.get_text(page_index)
        if text_reader:
            page_text = text_reader
            print(f"Characters: {len(page_text)}")

The following sample file is used in this example: sample.docx
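
Once the per-page text has been collected, the counts can be gathered into one structure instead of being printed inside the loop. The following is a sketch under the assumption that each page's text is already available as a string (or is empty or `None` when a page has no text); `summarize_pages` and the sample list are hypothetical and do not call GroupDocs.Parser.

```python
def summarize_pages(page_texts):
    """Return (page_number, character_count) pairs, counting empty or None pages as 0."""
    summary = []
    for page_number, text in enumerate(page_texts, start=1):
        summary.append((page_number, len(text) if text else 0))
    return summary


# Made-up page texts for illustration only
pages = ["First page text", "", None, "Last page"]
for page_number, char_count in summarize_pages(pages):
    print(f"Page {page_number}: {char_count} characters")
```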

Working with unsupported formats

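If a document format doesn’t support a feature, the API returns a fallback value instead of the real one; in the example below, a page count of 0 is treated as "not available". The example also wraps the call in try/except in case the file cannot be opened at all: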

Python

from groupdocs.parser import Parser

try:
    with Parser("./unknown.format") as parser:
        doc_info = parser.get_document_info()
        
        if doc_info:
            print(f"File type: {doc_info.file_type.file_format}")
            
            # Some formats may not have page count
            if doc_info.page_count == 0:
                print("Page count is not available for this format")
        else:
            print("Could not retrieve document information")
            
except Exception as e:
    print(f"Error: {e}")
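
The checks above can also be folded into one function that returns a plain dictionary, which keeps the fallback handling in a single place. This is a sketch only: `describe_document` is a hypothetical helper, and the `SimpleNamespace` object is a stand-in that merely mimics the attributes used in the examples on this page (`file_type.file_format`, `page_count`). Real code would pass the result of `parser.get_document_info()` instead.

```python
from types import SimpleNamespace


def describe_document(doc_info):
    """Build a summary dict from an object shaped like the doc_info used above."""
    if not doc_info:
        return {"available": False}
    page_count = doc_info.page_count
    return {
        "available": True,
        "file_format": doc_info.file_type.file_format,
        # Treat 0 as "not available", mirroring the check in the example above
        "page_count": page_count if page_count > 0 else None,
    }


# Stand-in object for illustration only (assumed attribute names)
fake_info = SimpleNamespace(
    file_type=SimpleNamespace(file_format="Unknown"),
    page_count=0,
)
print(describe_document(fake_info))
print(describe_document(None))
```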

More resources

Advanced usage topics

To learn more about document data extraction features and how to extract text, images, metadata, and more, please refer to the advanced usage section.

GitHub examples

You may find more code examples in our GitHub repository:

GroupDocs.Parser for Python via .NET examples

Free online document parser

Along with the full-featured library, we provide a free online document parser app. You are welcome to extract data from PDF, DOCX, XLSX, and more with our Free Online Document Parser App.
