How to Run Examples

The GroupDocs.Parser Examples project, which contains the code examples and sample files, is hosted on GitHub.

Run examples using PyPI

To get started, make sure that Python is installed (version 3.5 or higher).
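If you are unsure which interpreter is on your path, a short check against `sys.version_info` can confirm the minimum stated above. This is only a sketch: the helper name `meets_minimum` is hypothetical and is not part of the examples repository.

```python
import sys

# Minimum version stated in this guide.
MIN_VERSION = (3, 5)


def meets_minimum(version_info=None, minimum=MIN_VERSION):
    """Return True if the given (or the running) interpreter is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); tuples compare element by element.
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if meets_minimum():
        print("Python version OK:", sys.version.split()[0])
    else:
        print("Python {}.{} or higher is required".format(*MIN_VERSION))
```

The check uses `str.format` rather than an f-string so that the script itself still runs on the oldest supported interpreter.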

  1. Clone repository with examples:

    git clone https://github.com/groupdocs-parser/GroupDocs.Parser-for-Python-via-.NET.git
    
  2. Navigate to the project folder:

    cd ./GroupDocs.Parser-for-Python-via-.NET
    
  3. Install the necessary packages:

    pip install groupdocs-parser-net
    
  4. Run the examples:

    python run_examples.py
    

To see which examples are available, open run_examples.py in a text editor. Uncomment the examples you want to run, then run python run_examples.py again.

Build project from scratch

If you prefer to create a project from scratch, follow these steps:

Step 1: Install GroupDocs.Parser

Install GroupDocs.Parser for Python via .NET using pip:

pip install groupdocs-parser-net
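To confirm that the package landed in the active environment, you can query the installed distribution by name. The sketch below relies only on the standard library `importlib.metadata` module, which requires Python 3.8 or later; the helper name `installed_version` is an illustrative assumption.

```python
from importlib import metadata  # standard library, Python 3.8+


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("groupdocs-parser-net")
    if version is None:
        print("groupdocs-parser-net is not installed in this environment")
    else:
        print("groupdocs-parser-net", version)
```

If the script reports that the package is missing, check that pip and python refer to the same environment.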

Step 2: Create a Python Script

Create a new Python file (e.g., example.py) and add the following code:

from groupdocs.parser import Parser

def extract_text_from_document():
    # Create an instance of Parser class
    with Parser("./sample.docx") as parser:
        # Extract text from the document
        text_reader = parser.get_text()
        
        if text_reader is not None:
            # Print the extracted text
            print(text_reader)
        else:
            print("Text extraction isn't supported for this format")

if __name__ == "__main__":
    extract_text_from_document()

The following sample file is used in this example: sample.docx

Step 3: Run the Script

Execute your Python script:

python example.py

The extracted text will appear in the console.
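If you would rather keep the output than read it in the console, the same calls can write the text to a file. The following is a minimal sketch, not the library's prescribed approach: the function name `save_extracted_text` and the output path are hypothetical, and it assumes that `str()` of the value returned by `get_text()` is the text that `print()` shows in the script above.

```python
from pathlib import Path


def save_extracted_text(source, target):
    """Extract text from `source` and write it to `target` as UTF-8.

    Returns True if text was written, False if extraction is not supported.
    """
    source = Path(source)
    if not source.is_file():
        raise FileNotFoundError("Input document not found: {}".format(source))

    # Imported lazily so a missing input file is reported before the library loads.
    from groupdocs.parser import Parser

    with Parser(str(source)) as parser:
        text_reader = parser.get_text()
        if text_reader is None:
            return False
        # Assumption: str() yields the same text that print() displays.
        Path(target).write_text(str(text_reader), encoding="utf-8")
        return True


if __name__ == "__main__":
    try:
        written = save_extracted_text("./sample.docx", "./sample.txt")
        print("Saved" if written else "Text extraction isn't supported for this format")
    except (FileNotFoundError, ImportError) as error:
        print(error)
```

Checking for the input file first gives a clearer message than whatever error the parser would raise for a missing path.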

Common Examples

Extract Text from PDF

from groupdocs.parser import Parser

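def extract_text_from_pdf():
    # Open the PDF; the with block releases the parser when done
    with Parser("./sample.pdf") as parser:
        text_reader = parser.get_text()
        # Print only when extraction produced a non-empty result
        if text_reader:
            print(text_reader)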

if __name__ == "__main__":
    extract_text_from_pdf()

The following sample file is used in this example: sample.pdf

Extract Metadata

from groupdocs.parser import Parser

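def extract_metadata_example():
    with Parser("./sample.docx") as parser:
        metadata = parser.get_metadata()
        # Skip documents for which no metadata is returned
        if metadata:
            for item in metadata:
                # Each item exposes a name and a value (f-strings need Python 3.6+)
                print(f"{item.name}: {item.value}")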

if __name__ == "__main__":
    extract_metadata_example()

The following sample file is used in this example: sample.docx
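When the metadata is needed later in a program rather than printed, it can be collected into a dictionary. The helper below is a sketch that depends only on the `name` and `value` attributes used in the example above; the function name `metadata_to_dict` is hypothetical, and the demo uses stand-in items so that it runs without a document.

```python
from collections import namedtuple


def metadata_to_dict(items):
    """Collect objects exposing `name` and `value` attributes into a dict.

    If a name occurs more than once, the last value wins.
    """
    result = {}
    for item in items or []:
        result[item.name] = item.value
    return result


if __name__ == "__main__":
    # Stand-in items for illustration; with the library, pass the result of
    # parser.get_metadata() instead.
    Item = namedtuple("Item", ["name", "value"])
    sample = [Item("author", "Jane Doe"), Item("title", "Report")]
    print(metadata_to_dict(sample))
```

Passing `None` returns an empty dictionary, which mirrors the `if metadata:` guard in the example.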

Extract Images

from groupdocs.parser import Parser

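def extract_images_example():
    with Parser("./sample.pdf") as parser:
        images = parser.get_images()
        if images:
            for i, image in enumerate(images):
                # Read the image data first so a failed read leaves no empty file
                image_stream = image.get_image_stream()
                data = image_stream.read()
                # Note: the .png extension is assumed; embedded images may use other formats
                with open(f"image_{i}.png", "wb") as file:
                    file.write(data)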

if __name__ == "__main__":
    extract_images_example()

The following sample file is used in this example: sample.pdf
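The example above writes the images into the current directory under a fixed .png extension. A small helper can direct them to a dedicated folder and make the extension a parameter instead. This is a sketch under stated assumptions: `save_streams` is a hypothetical name, it expects objects with a `read()` method that returns bytes (as the stream in the example does), and the demo uses in-memory streams so that it runs without the library.

```python
import io
import tempfile
from pathlib import Path


def save_streams(streams, out_dir, prefix="image", extension=".png"):
    """Write each binary stream to out_dir as <prefix>_<index><extension>."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, stream in enumerate(streams):
        # Read before creating the file, so a failed read leaves no empty file.
        data = stream.read()
        path = out_dir / "{}_{}{}".format(prefix, index, extension)
        path.write_bytes(data)
        saved.append(path)
    return saved


if __name__ == "__main__":
    # In-memory stand-ins; with the library, pass
    # (image.get_image_stream() for image in images) instead.
    demo = [io.BytesIO(b"first"), io.BytesIO(b"second")]
    with tempfile.TemporaryDirectory() as folder:
        for path in save_streams(demo, folder):
            print(path.name, path.stat().st_size, "bytes")
```

The extension is left to the caller because the helper has no way to know the format of the bytes it receives.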

Contribute

If you would like to add or improve an example, we encourage you to contribute to the project. All examples in this repository are open source and can be freely used in your own applications.

To contribute, fork the repository, edit the source code, and create a pull request. We will review the changes and include them in the repository if we find them helpful.