Working with Text

Search for text

Python

from groupdocs.parser import Parser

with Parser("sample.pdf") as parser:
    results = parser.search("GroupDocs")
    if results is None:
        print("Search isn't supported for this format.")
    else:
        for result in results:
            print(f"Found on page {result.page.index + 1}: {result.text}")

The following sample file is used in this example: sample.pdf

Notes

parser.search() returns each occurrence with its page index and the matched text, or None when the format doesn't support search.
For regular-expression or case-sensitive searches, pass search options from groupdocs.parser.options (see the API reference for the available parameters).
Combine search with get_text(), called for the whole document or per page, to extract the surrounding content.
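
The last note can be sketched in plain Python. The example below assumes the document text is already available as a string (for instance, read from the result of get_text()) and makes no GroupDocs.Parser calls itself. The helper name find_with_context and its width and match_case parameters are hypothetical and are not part of the library.

```python
import re


def find_with_context(text, keyword, width=30, match_case=False):
    """Return (offset, snippet) pairs for each occurrence of keyword in text.

    Hypothetical helper, not part of GroupDocs.Parser: it works on any
    string, such as text already extracted from a document.
    """
    flags = 0 if match_case else re.IGNORECASE
    hits = []
    # re.escape treats the keyword as literal text, not as a pattern
    for match in re.finditer(re.escape(keyword), text, flags):
        start = max(match.start() - width, 0)
        end = min(match.end() + width, len(text))
        hits.append((match.start(), text[start:end]))
    return hits


# Stand-in for text extracted from a document
sample_text = "GroupDocs.Parser extracts text. Use groupdocs tools to search documents."

for offset, snippet in find_with_context(sample_text, "GroupDocs", width=10):
    print(f"offset {offset}: ...{snippet}...")
```

The search is case-insensitive by default, and passing match_case=True restricts it to exact-case matches. For searches that run inside the library, the search options mentioned above remain the supported route.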

Extract formatted or structured text

GroupDocs.Parser can return formatted text (HTML/Markdown/plain) and structured text (headings, lists, tables) when supported by the format.

Python

from groupdocs.parser import Parser

with Parser("sample.pptx") as parser:
    # get_text() is shown here as a plain-text baseline; for formatted or structured
    # output, use the format-specific methods listed in the API reference
    reader = parser.get_text()
    print(reader if reader else "Formatted text isn't available.")

The following sample file is used in this example: sample.pptx

See the API reference for the formatted and structured text methods available in GroupDocs.Parser for Python via .NET.
