Working with Text

Beyond basic text extraction, you can search inside documents and control how text is returned. This page shows common tasks; consult the API reference for the full list of text-related methods and options.

Search for text

from groupdocs.parser import Parser

with Parser("sample.pdf") as parser:
    results = parser.search("GroupDocs")
    if results is None:
        print("Search isn't supported for this format.")
    else:
        for result in results:
            print(f"Found on page {result.page.index + 1}: {result.text}")

The following sample file is used in this example: sample.pdf

Notes

  • parser.search() returns the matched occurrences with page indexes and matched text, or None if the format doesn't support search.
  • For regex or case-sensitive searches, use search options from groupdocs.parser.options (see the API reference for available parameters).
  • Combine search with get_text() for the whole document, or get_text(page_index) for a single page, to extract the content around a match.
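Once you have a page's text as an ordinary Python string, pulling the content around a match is plain slicing. The sketch below assumes `position` is the match's character offset within that same page text; check the API reference for exactly what a search result exposes. `context_around` and the sample string are hypothetical and not part of GroupDocs.Parser.

```python
def context_around(text: str, position: int, match_length: int, radius: int = 20) -> str:
    """Return the match plus up to `radius` characters on each side.

    Assumes `text` is page text you already extracted (for example with
    get_text(page_index)) and `position` is the match offset in that text.
    """
    # Reject offsets that don't describe a match fully inside the text
    # (str.find returns -1 when nothing matched).
    if not 0 <= position <= len(text) - match_length:
        raise ValueError("match lies outside the text")
    # Clamp the window to the start and end of the text.
    start = max(0, position - radius)
    end = min(len(text), position + match_length + radius)
    return text[start:end]


# Hypothetical page text standing in for get_text(page_index) output.
page_text = "Documents are parsed with GroupDocs.Parser for Python via .NET."
needle = "GroupDocs"
pos = page_text.find(needle)
print(context_around(page_text, pos, len(needle), radius=10))
```

A fixed character window keeps the helper simple; if you need whole words, snap `start` and `end` to the nearest whitespace.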

Extract formatted or structured text

GroupDocs.Parser can return formatted text (HTML/Markdown/plain) and structured text (headings, lists, tables) when supported by the format.

from groupdocs.parser import Parser

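with Parser("sample.pptx") as parser:
    # get_text() returns a reader over plain text; for HTML or Markdown output,
    # use the formatted-text methods listed in the API reference.
    reader = parser.get_text()
    if reader is None:
        # None means text extraction isn't supported for this format
        print("Text extraction isn't supported for this format.")
    else:
        print(reader.read_to_end())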

The following sample file is used in this example: sample.pptx

See the API reference for the formatted and structured text methods supported in Python via .NET.
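When you request Markdown output, the result is an ordinary string, so standard Python can recover structure such as the heading outline. This is a minimal sketch that assumes you already hold a Markdown string from a formatted-text call. The sample text is made up, and `markdown_outline` is a hypothetical helper that only recognizes ATX-style headings (`#` through `######`).

```python
import re

# Matches ATX-style Markdown headings such as "# Title" or "## Section ##".
HEADING = re.compile(r"^(#{1,6})[ \t]+(.+?)[ \t#]*$", re.MULTILINE)


def markdown_outline(markdown: str) -> list[tuple[int, str]]:
    """Return (level, heading text) pairs in document order."""
    return [(len(hashes), title) for hashes, title in HEADING.findall(markdown)]


# Hypothetical Markdown standing in for formatted-text output.
sample = "# Quarterly report\nIntro text.\n## Revenue\n- item\n## Costs ##\n"
for level, title in markdown_outline(sample):
    # Indent each heading by its nesting level.
    print("  " * (level - 1) + title)
```

Setext headings (underlined with `===` or `---`) aren't matched; use a full Markdown parser if the output may contain them.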