Beyond basic text extraction, you can search inside documents and control how text is returned. This page shows common tasks; consult the API reference for the full list of text-related methods and options.
Search for text
from groupdocs.parser import Parser

with Parser("sample.pdf") as parser:
    results = parser.search("GroupDocs")
    if results is None:
        print("Search isn't supported for this format.")
    else:
        for result in results:
            print(f"Found on page {result.page.index + 1}: {result.text}")
The following sample file is used in this example: sample.pdf
Notes
parser.search() returns occurrences with page indexes and matched text, or None when the format doesn't support search.
For regex or case-sensitive searches, use search options from groupdocs.parser.options (see the API reference for available parameters).
Combine search with get_text(), for the whole document or per page, to extract surrounding content.
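As a rough illustration of the last two notes, the sketch below works on text that has already been extracted into a plain Python string. It uses only the standard `re` module and is not the GroupDocs.Parser search options API. The helper name `find_with_context` and its parameters (`width`, `case_sensitive`, `use_regex`) are hypothetical, and `sample_text` is a stand-in for text read from a document.

```python
import re


def find_with_context(text, pattern, width=30, case_sensitive=False, use_regex=False):
    """Return (start_offset, snippet) pairs for each match of pattern in text.

    Hypothetical helper: operates on an already-extracted string, not on a Parser.
    """
    flags = 0 if case_sensitive else re.IGNORECASE
    # Treat the pattern as literal text unless regex matching is requested.
    source = pattern if use_regex else re.escape(pattern)
    snippets = []
    for match in re.finditer(source, text, flags):
        # Clamp the context window to the bounds of the text.
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        snippets.append((match.start(), text[start:end]))
    return snippets


# Stand-in for text already extracted from a document.
sample_text = "GroupDocs.Parser extracts text. Use groupdocs tools to search documents."

for offset, snippet in find_with_context(sample_text, "groupdocs", width=10):
    print(f"offset {offset}: ...{snippet}...")
```

Passing `case_sensitive=True` or `use_regex=True` changes how the pattern is matched. For searching inside the document itself, prefer the library's own search options described in the API reference.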
Extract formatted or structured text
GroupDocs.Parser can return formatted text (HTML/Markdown/plain) and structured text (headings, lists, tables) when supported by the format.
from groupdocs.parser import Parser

with Parser("sample.pptx") as parser:
    # Use format-specific text extraction methods exposed in the API reference
    reader = parser.get_text()
    print(reader if reader else "Formatted text isn't available.")
The following sample file is used in this example: sample.pptx
See the API reference for the formatted and structured text methods supported in Python via .NET.
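Once formatted text is available as a string, ordinary Python can pull structure out of it. The sketch below assumes the text was returned as Markdown and is already held in a string; how to request Markdown output is covered in the API reference and is not shown here. The function name `markdown_outline` and the sample text are hypothetical, and only `#`-style headings are recognized.

```python
import re

# A heading line: one to six '#' characters, whitespace, then the title.
HEADING = re.compile(r"^(#{1,6})\s+(.+?)\s*$")


def markdown_outline(markdown_text):
    """Return (level, title) pairs for '#'-style headings in Markdown text."""
    outline = []
    for line in markdown_text.splitlines():
        match = HEADING.match(line)
        if match:
            outline.append((len(match.group(1)), match.group(2)))
    return outline


# Stand-in for Markdown-formatted text extracted from a presentation.
sample_markdown = "# Deck\n\nIntro text\n\n## Slide 1\n- bullet\n### Notes\n"

for level, title in markdown_outline(sample_markdown):
    # Indent each title according to its heading level.
    print("  " * (level - 1) + title)
```

This simple line-based approach does not skip headings that appear inside code blocks, so a full Markdown parser may be a better fit for complex documents.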