GroupDocs.Parser for .NET 20.1 Release Notes

Major Features

This release includes the following features:

  • Removed the legacy API (GroupDocs.Parser.Legacy namespace)
  • Added the ability to extract text by table of contents item
  • Added the ability to extract the table of contents from PDF and Word Processing documents

Full List of Issues Covering all Changes in this Release

Key | Summary | Issue Type
PARSERNET-1099 | Remove obsolete members (Legacy namespace) | Improvement
PARSERNET-1363 | Implement the ability to extract a text by TOC item | New feature
PARSERNET-1361 | Implement the ability to extract TOC from Word Processing documents | New feature
PARSERNET-1362 | Implement the ability to extract TOC from PDF documents | New feature

Public API and Backward Incompatible Changes

  1. Implement the ability to extract a text by TOC item

    Description

    This feature makes it possible to extract text by a table of contents item.

    Public API changes

    • Added the GetText method to the GroupDocs.Parser.Data.TocItem class

    Usage

    The following example shows how to extract text by a table of contents item:

    // Create an instance of Parser class
    using (Parser parser = new Parser(Constants.SampleDocxWithToc))
    {
        // Get table of contents
        IEnumerable<TocItem> tocItems = parser.GetToc();
        // Check if toc extraction is supported
        if (tocItems == null)
        {
            Console.WriteLine("Table of contents extraction isn't supported");
            // Stop here: iterating over a null collection would throw
            return;
        }
        // Iterate over items
        foreach (TocItem tocItem in tocItems)
        {
            // Print the text of the chapter
            using (TextReader reader = tocItem.GetText())
            {
                Console.WriteLine("----");
                Console.WriteLine(reader.ReadToEnd());
            }
        }
    }
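
    As a minimal sketch, the same calls can be wrapped in a helper that collects each item's title and text instead of printing them. The ReadChapters helper and the TocTextSketch class are hypothetical names, not part of GroupDocs.Parser. The sketch uses only members that appear in these notes (Features.Toc, GetToc, TocItem.Text, TocItem.GetText). The null check on the reader is a defensive assumption, not documented behavior.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.IO;
    using GroupDocs.Parser;
    using GroupDocs.Parser.Data;

    static class TocTextSketch
    {
        // Hypothetical helper (not part of GroupDocs.Parser): returns (title, text) pairs,
        // or an empty list when TOC extraction isn't supported for the document.
        public static List<KeyValuePair<string, string>> ReadChapters(string filePath)
        {
            var chapters = new List<KeyValuePair<string, string>>();
            using (Parser parser = new Parser(filePath))
            {
                // Feature flag check, as in the PDF and Word Processing examples
                if (!parser.Features.Toc)
                {
                    return chapters;
                }
                IEnumerable<TocItem> tocItems = parser.GetToc();
                // Null check, as in the example above
                if (tocItems == null)
                {
                    return chapters;
                }
                foreach (TocItem tocItem in tocItems)
                {
                    using (TextReader reader = tocItem.GetText())
                    {
                        // Assumption: treat a null reader as "no text available" for this item
                        string text = reader == null ? string.Empty : reader.ReadToEnd();
                        chapters.Add(new KeyValuePair<string, string>(tocItem.Text, text));
                    }
                }
            }
            return chapters;
        }
    }
    ```

    A list of pairs is used rather than a dictionary because table of contents titles are not guaranteed to be unique.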
    
  2. Implement the ability to extract TOC from PDF documents

    Description

    This feature allows extracting the table of contents (TOC) from PDF documents.

    Public API changes

    No API changes.

    Usage

    The following example shows how to extract the table of contents from a PDF document:

    // Create an instance of Parser class
    using (Parser parser = new Parser(filePath))
    {
        // Check if text extraction is supported
        if (!parser.Features.Text)
        {
            Console.WriteLine("Text extraction isn't supported.");
            return;
        }
        // Check if toc extraction is supported
        if (!parser.Features.Toc)
        {
            Console.WriteLine("Toc extraction isn't supported.");
            return;
        }
        // Get table of contents
        IEnumerable<TocItem> toc = parser.GetToc();
        // Iterate over items
        foreach (TocItem i in toc)
        {
            // Print the Toc text
            Console.WriteLine(i.Text);
            // Check if page index has a value
            if (i.PageIndex == null)
            {
                continue;
            }
            // Extract a page text
            using (TextReader reader = parser.GetText(i.PageIndex.Value))
            {
                Console.WriteLine(reader.ReadToEnd());
            }
        }
    }
    
  3. Implement the ability to extract TOC from Word Processing documents

    Description

    This feature allows extracting the table of contents (TOC) from Word Processing documents.

    Public API changes

    No API changes.

    Usage

    The following example shows how to extract the table of contents from a Word Processing document:

    // Create an instance of Parser class
    using (Parser parser = new Parser(filePath))
    {
        // Check if text extraction is supported
        if (!parser.Features.Text)
        {
            Console.WriteLine("Text extraction isn't supported.");
            return;
        }
        // Check if toc extraction is supported
        if (!parser.Features.Toc)
        {
            Console.WriteLine("Toc extraction isn't supported.");
            return;
        }
        // Get table of contents
        IEnumerable<TocItem> toc = parser.GetToc();
        // Iterate over items
        foreach (TocItem i in toc)
        {
            // Print the Toc text
            Console.WriteLine(i.Text);
            // Check if page index has a value
            if (i.PageIndex == null)
            {
                continue;
            }
            // Extract a page text
            using (TextReader reader = parser.GetText(i.PageIndex.Value))
            {
                Console.WriteLine(reader.ReadToEnd());
            }
        }
    }
    
  4. Remove obsolete members (Legacy namespace)

    Description

    All types from the GroupDocs.Parser.Legacy namespace were removed.

    Public API changes

    • All types from the GroupDocs.Parser.Legacy namespace were removed.

    Usage

    See the migration notes for a brief comparison of how to extract data using the old and new APIs.