Here are the steps to extract a table of contents from a document:
Instantiate a Parser object for the initial document;
Call the GetToc method to obtain a collection of TocItem objects;
Check that the collection isn't null (a null value means that table of contents extraction isn't supported for the document);
Iterate through the collection and use each item's page index to extract the text of that page from the document.
The following example shows how to extract the table of contents from a CHM file:
using System;
using System.Collections.Generic;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;

// Create an instance of the Parser class (filePath is the path to the CHM file)
using (Parser parser = new Parser(filePath))
{
    // Check if text extraction is supported
    if (!parser.Features.Text)
    {
        Console.WriteLine("Text extraction isn't supported.");
        return;
    }
    // Check if table of contents extraction is supported
    if (!parser.Features.Toc)
    {
        Console.WriteLine("Toc extraction isn't supported.");
        return;
    }
    // Get the table of contents
    IEnumerable<TocItem> toc = parser.GetToc();
    // Iterate over the items
    foreach (TocItem item in toc)
    {
        // Print the text of the TOC item
        Console.WriteLine(item.Text);
        // Skip items that don't point to a page
        if (item.PageIndex == null)
        {
            continue;
        }
        // Extract the text of the page the item points to
        using (TextReader reader = parser.GetText(item.PageIndex.Value))
        {
            Console.WriteLine(reader.ReadToEnd());
        }
    }
}
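Several TOC entries may point to the same page, in which case the loop above extracts that page more than once. The following self-contained sketch shows one way to extract each page only once by caching the page text per index. It deliberately does not reference GroupDocs.Parser: `TocEntry` is a hypothetical stand-in for `TocItem` that models only `Text` and the nullable `PageIndex`, and the `extractPage` delegate is an assumed stand-in for reading `parser.GetText(index)`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for TocItem: only the two members used in the
// example above (Text and a nullable PageIndex) are modeled here.
public record TocEntry(string Text, int? PageIndex);

public static class TocPages
{
    // Walks the TOC entries in order and returns (entry text, page text) pairs.
    // Entries without a page index are skipped, and each distinct page is
    // extracted only once, even if several entries point to it.
    public static List<(string Entry, string Page)> Collect(
        IEnumerable<TocEntry> toc,
        Func<int, string> extractPage) // assumed stand-in for reading parser.GetText(index)
    {
        var cache = new Dictionary<int, string>();
        var result = new List<(string Entry, string Page)>();
        foreach (TocEntry entry in toc)
        {
            // Skip entries that don't point to a page
            if (entry.PageIndex is not int index)
            {
                continue;
            }
            // Extract the page only the first time its index is seen
            if (!cache.TryGetValue(index, out string? page))
            {
                page = extractPage(index);
                cache[index] = page;
            }
            result.Add((entry.Text, page));
        }
        return result;
    }
}
```

To use this with a real `Parser`, you could map each `TocItem` to `new TocEntry(item.Text, item.PageIndex)` and pass `index => { using TextReader reader = parser.GetText(index); return reader.ReadToEnd(); }` as `extractPage`, after the same feature checks as in the example above. The cache trades memory for fewer extraction calls, which may matter for large documents with many anchors on the same page.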
More resources
Advanced usage topics
To learn more about document data extraction features and how to extract text, images, forms and more, please refer to the advanced usage section.
GitHub examples
You may easily run the code above and see the feature in action in our GitHub examples:
Along with the full-featured .NET library, we provide simple but powerful free apps.
You are welcome to extract text, metadata and images from PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX, emails and more with our free online Document Parser App.