GroupDocs.Parser can extract the text of the document section that a table of contents item points to. This feature is supported for Word Processing, PDF, ePUB and CHM documents (for more details, see Supported Document Formats).
To extract text by a table of contents item, follow these steps:
Instantiate a Parser object for the initial document;
Call the getToc method to obtain a collection of TocItem objects;
Check that the collection isn't null (a null value means that table of contents extraction isn't supported for the document);
Iterate through the collection and call extractText on each item to read its text.
The following example shows how to extract text by a table of contents item:
// Create an instance of Parser class
try (Parser parser = new Parser(Constants.SampleDocxWithToc)) {
    // Get table of contents
    Iterable<TocItem> tocItems = parser.getToc();
    // Check if toc extraction is supported
    if (tocItems == null) {
        System.out.println("Table of contents extraction isn't supported");
        // Stop here: iterating over a null collection would throw a NullPointerException
        return;
    }
    // Iterate over items
    for (TocItem tocItem : tocItems) {
        // Print the text of the chapter
        try (TextReader reader = tocItem.extractText()) {
            System.out.println("----");
            System.out.println(reader.readToEnd());
        }
    }
}
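The example above prints the text of every item. A common variation is to pick a single item by its title and extract only that chapter. The following is a minimal sketch of that selection logic, not part of the GroupDocs.Parser API. To stay runnable without the library, it uses a hypothetical TocEntry stand-in class and a hypothetical textOf helper. With GroupDocs.Parser, the entries would be the TocItem objects returned by getToc, and the text would come from extractText and readToEnd as shown above. The accessor that returns an item's title is not shown in this section, so check the API reference for it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

public class TocSelectionSketch {

    /** Hypothetical stand-in for a table of contents item: a title plus a lazy text source. */
    public static final class TocEntry {
        final String title;
        final Supplier<String> text;

        public TocEntry(String title, Supplier<String> text) {
            this.title = title;
            this.text = text;
        }
    }

    /**
     * Returns the text of the first entry whose title matches (ignoring case).
     * Returns an empty Optional when the collection is null (extraction not
     * supported) or when no entry matches. The wanted title must not be null.
     */
    public static Optional<String> textOf(Iterable<TocEntry> toc, String wanted) {
        if (toc == null) {
            return Optional.empty();
        }
        for (TocEntry entry : toc) {
            // equalsIgnoreCase returns false for a null entry title
            if (wanted.equalsIgnoreCase(entry.title)) {
                // The text is read only for the selected entry
                return Optional.of(entry.text.get());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<TocEntry> toc = Arrays.asList(
                new TocEntry("Introduction", () -> "Intro text"),
                new TocEntry("Usage", () -> "Usage text"));

        System.out.println(textOf(toc, "usage").orElse("(not found)"));
        System.out.println(textOf(toc, "Appendix").orElse("(not found)"));
        System.out.println(textOf(null, "Usage").orElse("(not supported)"));
    }
}
```

Because the text is supplied lazily, only the matching chapter is read. This mirrors calling extractText on a single TocItem instead of on every item in the collection.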
More resources
GitHub examples
You may easily run the code above and see the feature in action in our GitHub examples: