Quick Start Guide

This guide demonstrates the essential steps to get started with GroupDocs.Parser for Python via .NET and perform basic document parsing operations.

Prerequisites

Before you begin, ensure you have:

  • Python 3.5 or higher installed (the examples in this guide use f-strings, which require Python 3.6 or higher)
  • GroupDocs.Parser for Python via .NET installed (see Installation)
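
Both prerequisites can be checked from Python before running the examples. The sketch below is a minimal check, assuming the package installs under the top-level name `groupdocs` (as the imports in this guide suggest). `check_prerequisites` is a hypothetical helper, not part of the library.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 5)  # minimum version stated in the prerequisites


def check_prerequisites():
    """Report whether the interpreter and the package look usable."""
    return {
        "python_ok": sys.version_info[:2] >= MIN_PYTHON,
        # Assumption: the package installs under the top-level name "groupdocs".
        "package_found": importlib.util.find_spec("groupdocs") is not None,
    }


if __name__ == "__main__":
    print(check_prerequisites())
```

If `package_found` is false, revisit the installation step before continuing.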

Extract Text from a Document

The most common task is extracting text from documents. Here’s how to do it:

from groupdocs.parser import Parser

def extract_text_from_document():
    # Create an instance of Parser class
    with Parser("./sample.pdf") as parser:
        # Extract text from the document
        text_reader = parser.get_text()
        
        if text_reader is not None:
            # Print the extracted text
            print(text_reader)
        else:
            print("Text extraction isn't supported for this format")

if __name__ == "__main__":
    extract_text_from_document()

The following sample file is used in this example: sample.pdf
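
Depending on how the reader object is exposed, printing it directly may show an object representation rather than the text itself. The helper below is a defensive sketch: it assumes the reader might offer a `read_to_end()` method (mirroring the .NET `TextReader` API, which this guide does not confirm) and falls back to `str()` otherwise. `reader_to_text` is a hypothetical name.

```python
def reader_to_text(text_reader):
    """Return extracted text as a str, or None if extraction is unsupported."""
    if text_reader is None:
        return None
    # Assumption: the reader may expose read_to_end(), like a .NET TextReader.
    read_to_end = getattr(text_reader, "read_to_end", None)
    if callable(read_to_end):
        return read_to_end()
    # Fallback: rely on the object's string conversion, as the example does.
    return str(text_reader)
```

Inside the `with Parser(...)` block, this could be used as `print(reader_to_text(parser.get_text()))`.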

Get Document Information

You can retrieve basic information about a document, such as its file type, page count, and size:

from groupdocs.parser import Parser

def get_document_information():
    # Create an instance of Parser class
    with Parser("./sample.pdf") as parser:
        # Get document info
        info = parser.get_document_info()
        
        print(f"File type: {info.file_type.file_format}")
        print(f"Page count: {info.page_count}")
        print(f"Size: {info.size} bytes")

if __name__ == "__main__":
    get_document_information()

The following sample file is used in this example: sample.pdf
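
The size is reported in bytes, which is hard to read for large files. As an illustration, the hypothetical helper below converts a byte count into a shorter string using 1024-based units. It is independent of the library and works on any integer.

```python
def format_size(num_bytes):
    """Convert a byte count into a short human-readable string."""
    units = ["bytes", "KB", "MB", "GB", "TB"]
    size = float(num_bytes)
    for unit in units:
        # Stop once the value fits the current unit or no larger unit remains.
        if size < 1024 or unit == units[-1]:
            if unit == "bytes":
                return f"{int(size)} bytes"
            return f"{size:.1f} {unit}"
        size /= 1024
```

For example, `print(f"Size: {format_size(info.size)}")` could replace the last `print` call in the example.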

Extract Metadata

The following example extracts the metadata properties of a document and prints each name and value:

from groupdocs.parser import Parser

def extract_metadata():
    # Create an instance of Parser class
    with Parser("./sample.pdf") as parser:
        # Extract metadata
        metadata = parser.get_metadata()
        
        if metadata is not None:
            for item in metadata:
                print(f"{item.name}: {item.value}")

if __name__ == "__main__":
    extract_metadata()

The following sample file is used in this example: sample.pdf
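
When the metadata is needed for lookups rather than printing, it can be gathered into a dictionary. This is a minimal sketch that assumes each item exposes `name` and `value`, as in the example. `metadata_to_dict` is a hypothetical helper, and the stand-in items exist only so the snippet runs without a document.

```python
from types import SimpleNamespace


def metadata_to_dict(metadata):
    """Collect metadata items into a dict keyed by property name."""
    if metadata is None:
        return {}
    # Each item is expected to expose .name and .value, as in the example.
    return {item.name: item.value for item in metadata}


# Stand-in items for demonstration; real items come from parser.get_metadata().
sample = [
    SimpleNamespace(name="author", value="Jane"),
    SimpleNamespace(name="title", value="Report"),
]
print(metadata_to_dict(sample))
```

If two items share a name, the later one overwrites the earlier one.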

Extract Images

The following example extracts the images from a document and saves each one to a file:

from groupdocs.parser import Parser

def extract_images():
    # Create an instance of Parser class
    with Parser("./sample.pdf") as parser:
        # Extract images
        images = parser.get_images()
        
        if images is not None:
            for i, image in enumerate(images):
                # Save image to file
                with open(f"image_{i}.{image.file_type.extension}", "wb") as file:
                    file.write(image.get_image_stream().read())

if __name__ == "__main__":
    extract_images()

The following sample file is used in this example: sample.pdf
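
If `image.file_type.extension` already includes a leading dot (an assumption worth checking against your version), the f-string in the example would produce names such as `image_0..png`. The hypothetical helper below builds the file name in a way that tolerates both forms and zero-pads the index so the files sort in order.

```python
def image_filename(index, extension, prefix="image"):
    """Build a file name such as 'image_003.png' from an index and extension."""
    # Drop any leading dot so ".png" and "png" produce the same result.
    ext = str(extension).lstrip(".")
    name = f"{prefix}_{index:03d}"
    return f"{name}.{ext}" if ext else name
```

In the loop above, `open(image_filename(i, image.file_type.extension), "wb")` could replace the inline f-string.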

Extract Text from Specific Page

The following example extracts text from a single page, after checking that the document has at least one page:

from groupdocs.parser import Parser

def extract_text_from_specific_page():
    # Create an instance of Parser class
    with Parser("./sample.pdf") as parser:
        # Get document info to check page count
        info = parser.get_document_info()
        
        if info.page_count > 0:
            # Extract text from the first page (page index is 0-based)
            text_reader = parser.get_text(0)
            
            if text_reader is not None:
                print(text_reader)

if __name__ == "__main__":
    extract_text_from_specific_page()

The following sample file is used in this example: sample.pdf
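
The same call can be repeated for every page. The sketch below takes the page-text lookup as a function argument so it can run without a document. `collect_page_texts` is a hypothetical helper, and it relies on string conversion of whatever the lookup returns.

```python
def collect_page_texts(get_page_text, page_count):
    """Return a list with the text of each page, in page order."""
    pages = []
    for page_index in range(page_count):  # page indexes are 0-based
        text = get_page_text(page_index)
        # Keep an empty placeholder for pages where no text is returned.
        pages.append("" if text is None else str(text))
    return pages
```

Inside the `with Parser(...)` block, it could be called as `collect_page_texts(parser.get_text, info.page_count)`.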

Check Format Support

Before processing a document, you can check whether its format is supported:

from groupdocs.parser import Parser

def check_format_support():
    # Check if file format is supported
    if Parser.get_file_info("./sample.pdf").file_type.file_format != "Unknown":
        print("Format is supported")
        
        # Process the document
        with Parser("./sample.pdf") as parser:
            text_reader = parser.get_text()
            if text_reader is not None:
                print(text_reader)
    else:
        print("Format is not supported")

if __name__ == "__main__":
    check_format_support()

The following sample file is used in this example: sample.pdf
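
For a batch of files, the same check can split the paths into those worth processing and those to skip. This is a sketch under the assumption that the format lookup behaves as in the example, with an unrecognized format rendering as `"Unknown"`. `partition_by_support` is a hypothetical helper that receives the lookup as a function.

```python
def partition_by_support(paths, get_format):
    """Split paths into (supported, unsupported) using a format lookup."""
    supported, unsupported = [], []
    for path in paths:
        # Compare the string form, mirroring the "Unknown" check in the example.
        if str(get_format(path)) != "Unknown":
            supported.append(path)
        else:
            unsupported.append(path)
    return supported, unsupported
```

With the library, the lookup could be `lambda p: Parser.get_file_info(p).file_type.file_format`, taken from the example.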

Next Steps

Now that you’ve learned the basics, explore the more advanced features described in the rest of the documentation.