Search text in HTML documents

To search for a keyword in HTML documents, use the Search(String) method. This method returns a collection of SearchResult objects. For details, see Search Text.

Here are the steps to search for a keyword in an HTML document:

  • Instantiate a Parser object for the initial document;
  • Call the Search(String) method and obtain the collection of SearchResult objects;
  • Iterate through the collection and get the position and text of each result.
Warning
The Search(String) method returns null if search isn't supported for the document. For example, text extraction isn't supported for Zip archives, so Search(String) returns null for a Zip archive. For an empty HTML document, Search(String) returns an empty collection.

The following example shows how to find a keyword in an HTML document:

// Create an instance of Parser class
using (Parser parser = new Parser(filePath))
{
    // Search for a keyword:
    IEnumerable<SearchResult> sr = parser.Search("page number");
    // Search returns null if search isn't supported for the document
    if (sr == null)
    {
        Console.WriteLine("Search isn't supported");
    }
    else
    {
        // Iterate over search results
        foreach (SearchResult s in sr)
        {
            // Print the position and the found text:
            Console.WriteLine("At {0}: {1}", s.Position, s.Text);
        }
    }
}

Search(String, SearchOptions) is used for advanced search in HTML documents, such as search with regular expressions. The SearchOptions parameter customizes the search.

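Here are the steps to search with a regular expression in an HTML document:

  • Instantiate a Parser object for the initial document;
  • Instantiate a SearchOptions object with the search parameters;
  • Call the Search(String, SearchOptions) method and obtain the collection of SearchResult objects;
  • Iterate through the collection and get the position and text of each result.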

The following example shows how to search with a regular expression in an HTML document:

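// Create an instance of Parser class
using (Parser parser = new Parser(filePath))
{
    // Search with a case-sensitive regular expression
    // (SearchOptions arguments: match case, match whole word, use regular expression)
    IEnumerable<SearchResult> sr = parser.Search("page number: [0-9]+", new SearchOptions(true, false, true));
    // Search returns null if search isn't supported for the document
    if (sr == null)
    {
        Console.WriteLine("Search isn't supported");
    }
    else
    {
        // Iterate over search results
        foreach (SearchResult s in sr)
        {
            // Print the position and the found text:
            Console.WriteLine("At {0}: {1}", s.Position, s.Text);
        }
    }
}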
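
To get a feel for what the pattern "page number: [0-9]+" matches when case matching is on, the following minimal sketch applies the same pattern with .NET's own System.Text.RegularExpressions classes. It does not use the Parser API at all. The sample text and the PatternDemo class name are made up for illustration, and it is only an assumption that the library evaluates this simple pattern the same way the .NET Regex class does.

```csharp
using System;
using System.Text.RegularExpressions;

class PatternDemo
{
    static void Main()
    {
        // Made-up sample text (hypothetical, not taken from a real document)
        string text = "Intro. page number: 12. Later, Page Number: 7 and page number: x.";

        // Same pattern as in the example above. RegexOptions.IgnoreCase is not set,
        // so the match is case-sensitive.
        foreach (Match m in Regex.Matches(text, "page number: [0-9]+"))
        {
            // m.Index is the zero-based offset of the match within the string
            Console.WriteLine("At {0}: {1}", m.Index, m.Value);
        }
    }
}
```

With this sample text the sketch prints a single line, "At 7: page number: 12". "Page Number: 7" is skipped because of the case difference, and "page number: x" is skipped because no digit follows the colon. The Position values reported by SearchResult refer to the document's text, so they need not coincide with offsets in a plain string like this one.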

More resources

GitHub examples

You can run the code above and see the feature in action in our GitHub examples.

Free online document parser App

Along with the full-featured .NET library, we provide simple but powerful free apps.

You are welcome to parse documents and extract data from PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX, emails, and more with our Free Online Document Parser App.