Search text

Prerequisites

GroupDocs.Parser for Python via .NET installed
Sample documents for testing
Basic understanding of regular expressions (for advanced searches)

Search text by keyword

To search for a specific keyword in a document:

Python

from groupdocs.parser import Parser

# Create an instance of Parser class
with Parser("./sample.pdf") as parser:
    # Search for a keyword
    search_results = parser.search("invoice")
    
    # Check if search is supported
    if search_results is None:
        print("Search isn't supported")
    else:
        # Iterate over search results
        for result in search_results:
            # Print position and found text
            print(f"At {result.position}: {result.text}")

The following sample file is used in this example: sample.pdf

Expected behavior: The method returns a collection of SearchResult objects, one per occurrence of the keyword, each containing the position of the match and the matched text.
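
The result handling above can be wrapped in a small helper so callers always get a plain list. This is a minimal sketch: `collect_matches` is a hypothetical name, it relies only on the `position` and `text` attributes used in the example, and the stand-in objects below replace a real document so the snippet runs without GroupDocs.Parser installed.

```python
from types import SimpleNamespace


def collect_matches(search_results):
    """Return (position, text) pairs, or an empty list when search is unsupported."""
    if search_results is None:
        # search() returned None: the document format does not support search
        return []
    return [(result.position, result.text) for result in search_results]


# Stand-in objects mimicking the two attributes used in the example above
fake_results = [
    SimpleNamespace(position=12, text="invoice"),
    SimpleNamespace(position=87, text="invoice"),
]

matches = collect_matches(fake_results)
for position, text in matches:
    print(f"At {position}: {text}")
```

With a real document, the value returned by `parser.search("invoice")` would be passed in place of `fake_results`.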

Search with regular expressions

To search using regular expressions:

Python

from groupdocs.parser import Parser
from groupdocs.parser.options import SearchOptions

# Create an instance of Parser class
with Parser("./sample.pdf") as parser:
    # Create search options for regex search
    # Parameters: match_case, match_whole_word, use_regular_expression
    options = SearchOptions(True, False, True)
    
    # Search with a regular expression (case-sensitive)
    search_results = parser.search(r"page number: \d+", options)
    
    # Check if search is supported
    if search_results is None:
        print("Search isn't supported")
    else:
        # Iterate over search results
        for result in search_results:
            print(f"At {result.position}: {result.text}")

The following sample file is used in this example: sample.pdf

Expected behavior: Finds all text matching the regular expression pattern with the specified options.
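
Before running a pattern against a document, it can help to sanity-check it on a plain string. The sketch below uses Python's standard `re` module for that purpose only. It is an assumption that the library's matching agrees with `re` for simple patterns like this one, and the sample text is invented for illustration.

```python
import re

pattern = r"page number: \d+"
# Invented sample text standing in for the extracted document text
sample_text = "Summary. page number: 1 ... Details. Page Number: 2 ... page number: 30"

# Case-sensitive matching, comparable to match_case=True
case_sensitive = re.findall(pattern, sample_text)

# Case-insensitive matching, comparable to match_case=False
case_insensitive = re.findall(pattern, sample_text, flags=re.IGNORECASE)

print(case_sensitive)
print(case_insensitive)
```

The case-sensitive run skips "Page Number: 2", which is the kind of difference worth checking before choosing the `match_case` value.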

Search with case sensitivity and whole word matching

Python

from groupdocs.parser import Parser
from groupdocs.parser.options import SearchOptions

# Create an instance of Parser class
with Parser("./sample.docx") as parser:
    # Search for exact word match (case-insensitive, whole word)
    # Parameters: match_case=False, match_whole_word=True, use_regular_expression=False
    options = SearchOptions(False, True, False)
    
    search_results = parser.search("invoice", options)
    
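    # Check if search is supported
    if search_results is None:
        print("Search isn't supported")
    else:
        # Materialize the results once so they can be counted
        occurrences = list(search_results)
        print(f"Found {len(occurrences)} occurrences of 'invoice' as whole word")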

The following sample file is used in this example: sample.docx
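
To see what the two flags change, the following sketch approximates case-insensitive and whole-word matching on a plain string with the standard `re` module. It is an illustration of the concept, not the library's implementation: `find_keyword` is a hypothetical helper, and word boundaries in GroupDocs.Parser may be determined differently from `\b`.

```python
import re


def find_keyword(text, keyword, match_case=False, match_whole_word=False):
    """Approximate keyword search with optional case and whole-word matching."""
    pattern = re.escape(keyword)  # treat the keyword literally, not as a regex
    if match_whole_word:
        pattern = rf"\b{pattern}\b"
    flags = 0 if match_case else re.IGNORECASE
    return [m.group(0) for m in re.finditer(pattern, text, flags)]


text = "Invoice 42 replaces invoices 40 and 41; see the invoice log."

substring_hits = find_keyword(text, "invoice")
whole_word_hits = find_keyword(text, "invoice", match_whole_word=True)
exact_hits = find_keyword(text, "invoice", match_case=True, match_whole_word=True)

print(substring_hits)
print(whole_word_hits)
print(exact_hits)
```

Whole-word matching drops the hit inside "invoices", and adding case sensitivity also drops the capitalized "Invoice".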

Search text with highlights

To search and extract surrounding text (highlights):

Python

from groupdocs.parser import Parser
from groupdocs.parser.options import SearchOptions, HighlightOptions

# Create an instance of Parser class
with Parser("./sample.pdf") as parser:
    # Create highlight options (extract 15 characters around the match)
    highlight_options = HighlightOptions(15)

    # Create search options with highlights
    search_options = SearchOptions(
        match_case=False,
        match_whole_word=False,
        use_regular_expression=False,
        left_highlight_options=highlight_options,
        right_highlight_options=highlight_options
    )

    # Search with highlights
    search_results = parser.search("lorem", search_options)

    if search_results is None:
        print("Search isn't supported")
    else:
        # Iterate over search results and print with highlights
        for result in search_results:
            left_text = result.left_highlight_item.text if result.left_highlight_item else ""
            right_text = result.right_highlight_item.text if result.right_highlight_item else ""
            print(f"{left_text}[{result.text}]{right_text}")

The following sample file is used in this example: sample.pdf

Expected behavior: Returns search results with context from surrounding text on both sides of the match.
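
The idea of a highlight can be illustrated on a plain string: take up to a fixed number of characters on each side of the match. The sketch below is an approximation for intuition only. `with_context` is a hypothetical helper, and the library may trim highlights differently (for example at word or line boundaries).

```python
def with_context(text, keyword, width=15):
    """Return (left, match, right) tuples with up to `width` characters of context."""
    # Case-insensitive search; assumes lowercasing does not change the text length
    lowered, needle = text.lower(), keyword.lower()
    results = []
    start = lowered.find(needle)
    while start != -1:
        end = start + len(needle)
        left = text[max(0, start - width):start]   # context before the match
        right = text[end:end + width]              # context after the match
        results.append((left, text[start:end], right))
        start = lowered.find(needle, end)
    return results


# Invented sample text standing in for the document content
sample = "Lorem ipsum dolor sit amet, consectetur lorem adipiscing elit."

contexts = with_context(sample, "lorem", width=15)
for left, match, right in contexts:
    print(f"{left}[{match}]{right}")
```

The first match has no left context because it starts the text, which mirrors why the example above guards against a missing `left_highlight_item`.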

Search text with page numbers

To search and get page numbers where text appears:


Python

from groupdocs.parser import Parser
from groupdocs.parser.options import SearchOptions

# Create an instance of Parser class
with Parser("./sample.pdf") as parser:
    # Create search options with page search enabled
    # Parameters: match_case, match_whole_word, use_regular_expression, search_by_pages
    options = SearchOptions(False, False, False, True)
    
    # Search with page numbers
    search_results = parser.search("lorem", options)
    
    if search_results is None:
        print("Search isn't supported")
    else:
        # Iterate over search results
        for result in search_results:
            # Print position, page number, and found text
            print(f"At {result.position} (page {result.page_index + 1}): {result.text}")

The following sample file is used in this example: sample.pdf

Expected behavior: Each search result includes the page index where the text was found.
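
Once results carry a page index, they can be grouped per page. The sketch below assumes only the `page_index` and `position` attributes used in the example and converts the zero-based index to a one-based page number, as the example does. `group_by_page` is a hypothetical helper, and the stand-in objects replace real search results.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_page(search_results):
    """Map one-based page numbers to the match positions found on each page."""
    pages = defaultdict(list)
    if search_results is None:
        # Search is not supported for this document format
        return {}
    for result in search_results:
        pages[result.page_index + 1].append(result.position)
    return dict(pages)


# Stand-in objects mimicking results returned with search_by_pages enabled
fake_results = [
    SimpleNamespace(page_index=0, position=5, text="lorem"),
    SimpleNamespace(page_index=0, position=140, text="lorem"),
    SimpleNamespace(page_index=2, position=33, text="lorem"),
]

by_page = group_by_page(fake_results)
for page, positions in sorted(by_page.items()):
    print(f"Page {page}: {len(positions)} match(es) at {positions}")
```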

Advanced search example

Combine multiple search techniques:


Python

from groupdocs.parser import Parser
from groupdocs.parser.options import SearchOptions, HighlightOptions

def advanced_search(file_path, pattern, case_sensitive=False, use_regex=False):
    """
    Perform advanced text search with highlights and page numbers.
    """
    try:
        with Parser(file_path) as parser:
            # Configure highlight options
            highlight_opts = HighlightOptions(20)
            
            # Configure search options
            search_opts = SearchOptions(
                match_case=case_sensitive,
                match_whole_word=False,
                use_regular_expression=use_regex,
                search_by_pages=True,
                left_highlight_options=highlight_opts,
                right_highlight_options=highlight_opts
            )
            
            # Perform search
            results = parser.search(pattern, search_opts)
            
            if results is None:
                print("Search not supported for this document")
                return []
            
            # Process results
            found_items = []
            for result in results:
                found_items.append({
                    'text': result.text,
                    'position': result.position,
                    'page': result.page_index + 1,
                    'left_context': result.left_highlight_item.text if result.left_highlight_item else "",
                    'right_context': result.right_highlight_item.text if result.right_highlight_item else ""
                })
            
            return found_items
            
    except Exception as e:
        print(f"Error during search: {e}")
        return []

# Usage
results = advanced_search("sample.pdf", r"\d{4}-\d{2}-\d{2}", use_regex=True)
for item in results:
    print(f"Page {item['page']}: {item['left_context']}[{item['text']}]{item['right_context']}")
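
The pattern in the usage example matches any digits in a date-like shape, including impossible dates such as 2024-13-45. A possible follow-up step, sketched below, validates each match with the standard library. `parse_iso_dates` is a hypothetical helper, and the input list is invented data in the same shape as the dictionaries returned by `advanced_search`.

```python
from datetime import date


def parse_iso_dates(found_items):
    """Keep only items whose text is a real calendar date in YYYY-MM-DD form."""
    parsed = []
    for item in found_items:
        try:
            parsed.append((item["page"], date.fromisoformat(item["text"])))
        except ValueError:
            # Matched the pattern but is not a valid date, so skip it
            continue
    return parsed


# Invented items in the shape produced by advanced_search()
found_items = [
    {"text": "2024-03-15", "position": 10, "page": 1},
    {"text": "2024-13-45", "position": 220, "page": 2},
    {"text": "2023-12-31", "position": 48, "page": 3},
]

valid_dates = parse_iso_dates(found_items)
for page, value in valid_dates:
    print(f"Page {page}: {value.isoformat()}")
```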

Notes

The search() method returns None if search is not supported for the document format
Use parser.features.search to check if search is available before calling search()
Regular expressions follow Python’s regex syntax
Highlight extraction adds minimal performance overhead
Page-based search (search_by_pages=True) is useful for large documents
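
The first two notes can be combined into one guard. The sketch below assumes `parser.features.search` is a boolean attribute, as the note implies, and uses stand-in parser objects so it runs without a document. `search_if_supported` is a hypothetical helper name.

```python
from types import SimpleNamespace


def search_if_supported(parser, keyword):
    """Return search results as a list, or an empty list when search is unavailable."""
    if not parser.features.search:
        # The feature flag says search is unavailable, so skip the call
        return []
    results = parser.search(keyword)
    # search() may still return None for unsupported formats
    return [] if results is None else list(results)


def _must_not_be_called(keyword):
    raise AssertionError("search() should not be called when unsupported")


# Stand-in parsers: one without search support, one with it
unsupported = SimpleNamespace(
    features=SimpleNamespace(search=False),
    search=_must_not_be_called,
)
supported = SimpleNamespace(
    features=SimpleNamespace(search=True),
    search=lambda keyword: [SimpleNamespace(position=3, text=keyword)],
)

print(search_if_supported(unsupported, "invoice"))
print([r.text for r in search_if_supported(supported, "invoice")])
```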
