Extract text from HTML documents

To extract text from HTML documents, use the GetText method. This method extracts text from the entire document. Pagination and raw mode are not supported for HTML documents.

Here are the steps to extract text from an HTML document:

  • Instantiate a Parser object for the initial document;
  • Call the GetText method and obtain a TextReader object;
  • Read the text from the reader.

The following example demonstrates how to extract text from an HTML document:

// Create an instance of the Parser class for the HTML file
using (Parser parser = new Parser(filePath))
{
    // Extract the document text into a reader
    using (TextReader reader = parser.GetText())
    {
        // Print the text extracted from the HTML document
        Console.WriteLine(reader.ReadToEnd());
    }
}
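
Text extracted from HTML often contains blank lines and stray indentation left over from the markup. The following is a minimal sketch of how the reader could be consumed line by line instead of with ReadToEnd. The ExtractedTextHelper class and its ReadNonEmptyLines method are hypothetical names introduced for illustration and are not part of the library. A StringReader stands in for the reader returned by GetText so that the sketch runs on its own, and the null check is a defensive assumption rather than documented behavior.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ExtractedTextHelper
{
    // Hypothetical helper: returns the trimmed, non-blank lines read from a TextReader.
    public static List<string> ReadNonEmptyLines(TextReader reader)
    {
        var lines = new List<string>();

        // Defensive assumption: treat a missing reader as "no text available".
        if (reader == null)
        {
            return lines;
        }

        string line;
        // ReadLine returns null once the end of the text is reached.
        while ((line = reader.ReadLine()) != null)
        {
            string trimmed = line.Trim();
            if (trimmed.Length > 0)
            {
                lines.Add(trimmed);
            }
        }

        return lines;
    }

    public static void Main()
    {
        // StringReader stands in for the reader obtained from the parser.
        using (TextReader reader = new StringReader("Title\n\n  First paragraph  \n"))
        {
            foreach (string line in ReadNonEmptyLines(reader))
            {
                Console.WriteLine(line);
            }
        }
    }
}
```

In real use, the reader obtained from the parser in the example above would be passed to the helper in place of the StringReader.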

More resources

GitHub examples

You can run the code above and see the feature in action in our GitHub examples.

Free online document parser App

Along with the full-featured .NET library, we provide simple but powerful free apps.

You are welcome to parse documents and extract data from PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX, emails and more with our Free Online Document Parser App.