Extract text by table of contents item

GroupDocs.Parser can extract the text of a chapter referenced by a table of contents item. This feature is supported for Word Processing, PDF, ePUB, and CHM documents (for more details, see Supported Document Formats).

The text is extracted with the TocItem.GetText method:

// Get the first item of table of contents
TocItem tocItem = parser.GetToc().First();

// Print the text of the chapter
using (TextReader reader = tocItem.GetText())
{
    Console.WriteLine("----");
    Console.WriteLine(reader.ReadToEnd());
}

This method returns the text of the chapter that the table of contents item refers to, without its sub-chapters. For example, for the item “Heading 1.2” it returns only the text of that chapter itself and none of the text of its nested headings; the same applies to “Heading 2”.
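
To make the "without sub-chapters" rule concrete, here is a minimal, self-contained sketch that does not use GroupDocs.Parser at all. It models a document as a flat list of headings, each with the text that directly follows it. The Section record, the TocModel class, its methods, and the sample contents are all hypothetical; they only illustrate the behavior described above and say nothing about how the library is implemented.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model (not a GroupDocs.Parser type): a heading with its depth,
// its title, and only the text that sits directly under it.
public sealed record Section(int Depth, string Title, string Body);

public static class TocModel
{
    // Sample outline invented for illustration.
    public static readonly IReadOnlyList<Section> Sample = new[]
    {
        new Section(1, "Heading 1", "Intro of chapter 1."),
        new Section(2, "Heading 1.1", "Text of 1.1."),
        new Section(2, "Heading 1.2", "Text of 1.2."),
        new Section(3, "Heading 1.2.1", "Text of 1.2.1."),
        new Section(1, "Heading 2", "Intro of chapter 2."),
        new Section(2, "Heading 2.1", "Text of 2.1."),
    };

    // Text of the chapter itself, excluding sub-chapters.
    // This mirrors the behavior the article describes for GetText.
    public static string OwnText(IEnumerable<Section> sections, string title) =>
        sections.First(s => s.Title == title).Body;

    // For contrast: the chapter text together with all of its sub-chapters.
    public static string TextWithSubChapters(IReadOnlyList<Section> sections, string title)
    {
        var parts = new List<string>();
        int depth = -1; // depth of the requested heading, once it is found

        foreach (Section s in sections)
        {
            if (depth < 0)
            {
                if (s.Title != title) continue; // still searching for the heading
                depth = s.Depth;
                parts.Add(s.Body);
            }
            else if (s.Depth > depth)
            {
                parts.Add(s.Body); // nested heading: part of the sub-chapters
            }
            else
            {
                break; // a heading at the same or a higher level ends the chapter
            }
        }

        return string.Join(" ", parts);
    }
}
```

In this model, TocModel.OwnText(TocModel.Sample, "Heading 1.2") yields only "Text of 1.2.", while TocModel.TextWithSubChapters(TocModel.Sample, "Heading 1.2") also includes "Text of 1.2.1.". The first result corresponds to what the article describes for GetText.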

Here are the steps to extract text by a table of contents item:

  • Instantiate a Parser object for the initial document;
  • Call the GetToc method to obtain the collection of TocItem objects;
  • Check that the collection isn’t null (if it is null, table of contents extraction isn’t supported for the document);
  • Iterate through the collection and extract the text of each item with the GetText method.

The following example shows how to extract text by a table of contents item:

// Create an instance of Parser class
using (Parser parser = new Parser(Constants.SampleDocxWithToc))
{
    // Get table of contents
    IEnumerable<TocItem> tocItems = parser.GetToc();
    // Check if toc extraction is supported
    if (tocItems == null)
    {
        Console.WriteLine("Table of contents extraction isn't supported");
        // Stop here: iterating over a null collection would throw
        return;
    }
    // Iterate over items
    foreach (TocItem tocItem in tocItems)
    {
        // Print the text of the chapter
        using (TextReader reader = tocItem.GetText())
        {
            Console.WriteLine("----");
            Console.WriteLine(reader.ReadToEnd());
        }
    }
}

More resources

GitHub examples

You can run the code above and see the feature in action in our GitHub examples.

Free online document parser App

Along with the full-featured .NET library, we provide simple but powerful free apps.

You are welcome to parse documents and extract data from PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX, emails, and more with our Free Online Document Parser App.