Get Document Information

GroupDocs.Comparison for Python via .NET can return the following information about a document without performing a comparison:

  • file_type — the document file type (PDF, Word, Excel, PowerPoint, image, etc.).
  • page_count — number of pages.
  • size — file size in bytes.
  • pages_info — per-page information.

Example 1: Get document info for a file on local disk

from groupdocs.comparison import Comparer

def get_document_info():
    with Comparer("./source.docx") as comparer:
        info = comparer.source.get_document_info()
        print(f"File type: {info.file_type.file_format}")
        print(f"Number of pages: {info.page_count}")
        print(f"Document size: {info.size} bytes")
        print("\nDocument info extracted successfully.")

if __name__ == "__main__":
    get_document_info()

source.docx is the document used in this example. Click here to download it. The script prints:

File type: Microsoft Word Document
Number of pages: 1
Document size: 26611 bytes

Document info extracted successfully.


Example 2: Get document info for a file from a stream

from groupdocs.comparison import Comparer

def get_document_info_from_stream():
    with open("./source.docx", "rb") as source_stream:
        with Comparer(source_stream) as comparer:
            info = comparer.source.get_document_info()
            print(f"File type: {info.file_type.file_format}")
            print(f"Number of pages: {info.page_count}")
            print(f"Document size: {info.size} bytes")

if __name__ == "__main__":
    get_document_info_from_stream()

source.docx is the document used in this example. Click here to download it. The script prints:

File type: Microsoft Word Document
Number of pages: 1
Document size: 26611 bytes

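Both examples read the same three fields, so it may be convenient to collect them into a plain dictionary for logging or serialization. The following is a minimal sketch, not part of the library: summarize_document_info is a hypothetical helper, and the SimpleNamespace object is only a stand-in that mimics the attribute names used above (file_type.file_format, page_count, size) so the snippet runs without GroupDocs installed. In real code you would pass the object returned by comparer.source.get_document_info().

```python
from types import SimpleNamespace


def summarize_document_info(info):
    """Collect the fields printed in the examples into a plain dict.

    Hypothetical helper: it relies only on the attribute names shown in
    the examples above (file_type.file_format, page_count, size).
    """
    return {
        "file_type": str(info.file_type.file_format),
        "page_count": int(info.page_count),
        "size_bytes": int(info.size),
    }


# Stand-in for the object returned by comparer.source.get_document_info();
# the values mirror the sample output shown for source.docx.
sample_info = SimpleNamespace(
    file_type=SimpleNamespace(file_format="Microsoft Word Document"),
    page_count=1,
    size=26611,
)

summary = summarize_document_info(sample_info)
print(summary)
```

Because the helper depends only on attribute names, it should work the same way whether the Comparer was opened from a file path or from a stream.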

Inspect without comparing — the trio

Comparer.source.get_document_info() is one of three ways to read information about a document without running a full comparison:

What you need                                        | Use
File type, size, page count of a specific document   | This page
The list of every supported format at runtime        | Get supported file formats
Visual thumbnails of selected pages                  | Generate document pages preview

Pair get_document_info() with the supported-formats list to validate inputs early in a pipeline (e.g., reject anything outside an allowlist), and use page previews to give end users a quick visual check before committing to a full diff.
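The early-validation idea can be sketched as a small check that runs on the document info before any comparison starts. Everything here is an assumption made for illustration: validate_document_info, ALLOWED_FORMATS, MAX_PAGES and MAX_SIZE_BYTES are hypothetical names and limits, the allowlist is hard-coded rather than built from the library's supported-formats list, and the SimpleNamespace objects stand in for the result of get_document_info() so the snippet runs on its own.

```python
from types import SimpleNamespace

# Hypothetical allowlist and limits; adjust to your pipeline. The format
# string matches the sample output shown for source.docx.
ALLOWED_FORMATS = {"Microsoft Word Document"}
MAX_PAGES = 500
MAX_SIZE_BYTES = 50 * 1024 * 1024  # 50 MiB


def validate_document_info(info,
                           allowed_formats=ALLOWED_FORMATS,
                           max_pages=MAX_PAGES,
                           max_size_bytes=MAX_SIZE_BYTES):
    """Return a list of rejection reasons; an empty list means accepted."""
    problems = []
    file_format = str(info.file_type.file_format)
    if file_format not in allowed_formats:
        problems.append(f"format not in allowlist: {file_format}")
    if info.page_count > max_pages:
        problems.append(f"too many pages: {info.page_count} > {max_pages}")
    if info.size > max_size_bytes:
        problems.append(f"file too large: {info.size} > {max_size_bytes} bytes")
    return problems


# Stand-ins for objects returned by comparer.source.get_document_info().
accepted = SimpleNamespace(
    file_type=SimpleNamespace(file_format="Microsoft Word Document"),
    page_count=1,
    size=26611,
)
rejected = SimpleNamespace(
    file_type=SimpleNamespace(file_format="Unknown"),
    page_count=1000,
    size=10,
)

accepted_problems = validate_document_info(accepted)
rejected_problems = validate_document_info(rejected)
print(accepted_problems)
print(rejected_problems)
```

Returning a list of reasons instead of raising on the first failure lets a pipeline report every problem with an input at once; a caller that prefers exceptions can raise when the list is non-empty.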
