Extract table of contents

GroupDocs.Parser allows you to extract a table of contents from Microsoft Word documents (DOC, DOCX, etc.), PDF documents, and e-books.

Extract table of contents

To extract a table of contents from a document, use the getToc method:

Iterable<TocItem> getToc();

The TocItem class has the following members:

Member	Description
getDepth	The depth level.
getPageIndex	The page index.
getText	The text.
extractText	Extracts text from the part of the document that the TocItem object refers to. For details, see Extract text by table of contents item.
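
As an illustration of how the depth level might be used, the following sketch prints items as an indented outline. It does not call GroupDocs.Parser: the Entry class is a hypothetical stand-in that mirrors only the getDepth, getPageIndex and getText members listed above, and the sketch assumes that top-level items have a depth of 1. Adjust the indentation if the depth starts at 0 in your documents.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TocOutlineSketch {

    // Hypothetical stand-in for TocItem; it is NOT the library class.
    // It exposes only the three getters described in the members table.
    static final class Entry {
        private final int depth;
        private final Integer pageIndex; // may be null when no page is known
        private final String text;

        Entry(int depth, Integer pageIndex, String text) {
            this.depth = depth;
            this.pageIndex = pageIndex;
            this.text = text;
        }

        int getDepth() { return depth; }
        Integer getPageIndex() { return pageIndex; }
        String getText() { return text; }
    }

    // Builds one line per entry, indented two spaces per level below the top.
    // Assumption: top-level entries have depth 1.
    static List<String> render(Iterable<Entry> toc) {
        List<String> lines = new ArrayList<>();
        for (Entry e : toc) {
            StringBuilder sb = new StringBuilder();
            int indent = Math.max(0, e.getDepth() - 1);
            for (int i = 0; i < indent; i++) {
                sb.append("  ");
            }
            sb.append(e.getText());
            // Append the page index only when it has a value
            if (e.getPageIndex() != null) {
                sb.append(" (page index ").append(e.getPageIndex()).append(")");
            }
            lines.add(sb.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        List<Entry> toc = Arrays.asList(
                new Entry(1, 0, "Introduction"),
                new Entry(2, 1, "Overview"),
                new Entry(1, null, "Appendix"));
        for (String line : render(toc)) {
            System.out.println(line);
        }
    }
}
```

With real TocItem objects, the same loop could read the values from the getters of each item returned by getToc.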

Here are the steps to extract a table of contents from a document:

Instantiate a Parser object for the document;
Call the getToc method to obtain a collection of TocItem objects;
Check that the collection isn't null (null means that table of contents extraction isn't supported for the document);
Iterate through the collection and use each item's page index to extract the page text from the document.

The following example shows how to extract a table of contents from a CHM file:

// Create an instance of Parser class
try (Parser parser = new Parser(Constants.SampleChm)) {
    // Check if text extraction is supported
    if (!parser.getFeatures().isText()) {
        System.out.println("Text extraction isn't supported.");
        return;
    }
    // Check if toc extraction is supported
    if (!parser.getFeatures().isToc()) {
        System.out.println("Toc extraction isn't supported.");
        return;
    }
    // Get table of contents
    Iterable<TocItem> toc = parser.getToc();
    // Iterate over items
    for (TocItem i : toc) {
        // Print the Toc text
        System.out.println(i.getText());
        // Check if page index has a value
        if (i.getPageIndex() == null) {
            continue;
        }
        // Extract a page text
        try (TextReader reader = parser.getText(i.getPageIndex())) {
            System.out.println(reader.readToEnd());
        }
    }
}
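
The example above relies on the isToc feature check, while the steps also mention checking the returned collection for null. One possible way to make the iteration tolerate a null collection is a small helper like the one sketched below. The orEmpty and collect names are hypothetical and are not part of GroupDocs.Parser; the sketch uses plain strings in place of TocItem objects so that it runs on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class NullSafeTocSketch {

    // Hypothetical helper: returns the given iterable,
    // or an empty one when the argument is null.
    static <T> Iterable<T> orEmpty(Iterable<T> items) {
        return items != null ? items : Collections.<T>emptyList();
    }

    // Copies the items into a list; a null input yields an empty list.
    static <T> List<T> collect(Iterable<T> items) {
        List<T> out = new ArrayList<>();
        for (T item : orEmpty(items)) {
            out.add(item);
        }
        return out;
    }

    public static void main(String[] args) {
        // Simulates a document for which no table of contents is returned
        Iterable<String> unsupported = null;
        System.out.println(collect(unsupported).size());

        // Simulates a document with two table of contents items
        System.out.println(collect(Arrays.asList("Chapter 1", "Chapter 2")));
    }
}
```

In the example above, the same idea would amount to wrapping the result of parser.getToc() before the for loop, so that an unsupported document produces no iterations instead of a NullPointerException.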

More resources

Advanced usage topics

To learn more about document data extraction features and how to extract text, images, forms, and more, refer to the advanced usage section.

GitHub examples

You can run the code above and see the feature in action in our GitHub examples.

Free online document parser App

Along with the full-featured Java library, we provide simple but powerful free apps.

You are welcome to extract data from PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX, emails, and more with our Free Online Document Parser App.
