Here are the steps to extract the table of contents from a document:
Instantiate a Parser object for the initial document;
Call the getToc method to obtain a collection of TocItem objects;
Check that the collection isn’t null (a null value means that table of contents extraction isn’t supported for the document);
Iterate through the collection and use each item's page index to extract the text of that page from the document.
The following example shows how to extract the table of contents from a CHM file:
// Create an instance of Parser class
try (Parser parser = new Parser(Constants.SampleChm)) {
    // Check if text extraction is supported
    if (!parser.getFeatures().isText()) {
        System.out.println("Text extraction isn't supported.");
        return;
    }
    // Check if toc extraction is supported
    if (!parser.getFeatures().isToc()) {
        System.out.println("Toc extraction isn't supported.");
        return;
    }
    // Get table of contents
    Iterable<TocItem> toc = parser.getToc();
    // Iterate over items
    for (TocItem i : toc) {
        // Print the Toc text
        System.out.println(i.getText());
        // Check if page index has a value
        if (i.getPageIndex() == null) {
            continue;
        }
        // Extract a page text
        try (TextReader reader = parser.getText(i.getPageIndex())) {
            System.out.println(reader.readToEnd());
        }
    }
}
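The steps above mention a null check on the returned collection, while the example relies on the feature flags instead. The null-check variant can be sketched without the library as follows. This is only an illustration: the Entry class and the pagesToExtract helper are hypothetical stand-ins (they are not part of the Parser API), used to show the control flow of guarding against a null collection and skipping items that have no page index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TocNullCheckSketch {
    // Hypothetical stand-in for a table of contents item:
    // a title plus a page index that may be absent (null).
    static final class Entry {
        final String text;
        final Integer pageIndex;

        Entry(String text, Integer pageIndex) {
            this.text = text;
            this.pageIndex = pageIndex;
        }
    }

    // Hypothetical helper: collects the page indexes whose text would be extracted.
    // A null collection is treated as "table of contents extraction isn't supported".
    static List<Integer> pagesToExtract(Iterable<Entry> toc) {
        List<Integer> pages = new ArrayList<>();
        if (toc == null) {
            return pages; // nothing to iterate over
        }
        for (Entry entry : toc) {
            if (entry.pageIndex == null) {
                continue; // the item doesn't point to a page
            }
            pages.add(entry.pageIndex);
        }
        return pages;
    }

    public static void main(String[] args) {
        List<Entry> toc = Arrays.asList(
                new Entry("Introduction", 0),
                new Entry("Part I", null),
                new Entry("Chapter 1", 3));
        System.out.println(pagesToExtract(toc));  // prints [0, 3]
        System.out.println(pagesToExtract(null)); // prints []
    }
}
```

With the real library, the same guard would presumably sit right after the getToc call, before the loop over the items.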
More resources
Advanced usage topics
To learn more about document data extraction features and how to extract text, images, forms, and more, please refer to the advanced usage section.
GitHub examples
You can run the code above and see the feature in action in our GitHub examples: