GroupDocs.Parser for .NET 18.4 Release Notes

Major Features

There are the following features and enhancements in this release:

  • Removed obsolete members
  • Improved the public API
  • Ability to extract topics from EPUB documents
  • Ability to detect media type for .one files

Full List of Issues Covering all Changes in this Release

KeySummaryIssue Type
TEXTNET-911Remove obsolete members (version 18.4)Enhancement
TEXTNET-915Improve the public APIEnhancement
TEXTNET-910Implement the ability to extract topics from EPUB documentsNew feature
TEXTNET-929Implement the media type detector for .one filesNew feature

Public API and Backward Incompatible Changes

Ability to extract topics from EPUB documents

Description

This feature allows extracting the table of contents (TOC) from EPUB documents.

Public API changes

Added TableOfContents property to EpubPackage class.
Added the implementation of IPageTextExtractor interface to EpubPackage class.

Usage

TableOfContents property is used to retrieve a list of TOC items:

IList<TableOfContentsItem> TableOfContents { get; }

TableOfContentsItem has the following members:

NameDescription
TextThe text of the item. Usually, it is a chapter’s title
PageIndexThe page index of the text. Null if it is just a node without content
CountThe number of sub-items. Zero if the item hasn’t sub-items
this[int index]Gets a sub-item
ExtractPageExtracts a text of the item

This example shows how to print TOC:

C#

// Create a text extractor
using (EpubTextExtractor extractor = new EpubTextExtractor(@"document.epub"))
{
    // Print TOC on the screen
    PrintToc(extractor[0].TableOfContents, 0);
}

private static void PrintToc(IEnumerable<TableOfContentsItem> tableOfContents, int depth)
{
    // Use spaces to indicate the depth of the TOC item
    string spaces = new string(' ', depth);

    // Iterate over items
    foreach (TableOfContentsItem item in tableOfContents)
    {
        System.Console.Write(spaces);
        // Print the item's text
        System.Console.Write(item.Text);

        // If item has a text (it's not just a node)
        if (item.PageIndex.HasValue)
        {
            // Print the text length
            System.Console.Write(string.Format(" ({0})", item.ExtractPage().Length));
        }

        System.Console.WriteLine();

        // If the item has children
        if (item.Count > 0)
        {
            // Print them
            PrintToc(item, depth + 1);
        }
    }
}

This example shows how to extract a text of the item:

C#

// Create a text extractor
using (EpubTextExtractor = new EpubTextExtractor(@"document.epub"))
{
    // Print a content of the third sub-item of the second item
    Console.WriteLine(extractor[0].TableOfContents[1][2].ExtractPage());
}

Implementation of the media type detector for .one files

Description

This feature allows detecting the media type of OneNote sections.

Public API changes

Added NoteMediaTypeDetector class.

Usage

C#

// Create a media type detector
var detector = new NoteMediaTypeDetector();
// Detect a media type by the file name
Console.WriteLine(detector.Detect("section.one");
// Detect a media type by the content
Console.WriteLine(detector.Detect(stream));

Improved public API

Description

This enhancement allows working with functionality via generic interfaces.

Public API changes

Added implementation of ITextExtractorWithFormatter interface to ChmFormattedTextExtractor class.
Added implementation of ITextExtractorWithFormatter interface to MarkdownFormattedTextExtractor class.
Added implementation of ISearchableIRegexSearchable and IHighlightExtractor interfaces to MarkdownTextExtractor class.

Usage

With ITextExtractorWithFormatter interface a user can set a document formatter:

C#

// If the extractor supports ITextExtractorWithFormatter interface
if (extractor is ITextExtractorWithFormatter) {
  // Set MarkdownDocumentFormatter formatter
  (extractor as ITextExtractorWithFormatter).DocumentFormatter = new MarkdownDocumentFormatter();
}

Removal of obsolete members

Description

EmailSubjectEmailSender and EmailReceiver constants from PersonalStorageContainer class are removed.

Public API changes

EmailSubjectEmailSender and EmailReceiver constants from PersonalStorageContainer class are removed.

Usage

New constantRemoved constant
MetadataNames.SubjectPersonalStorageContainer.EmailSubject
MetadataNames.EmailFromPersonalStorageContainer.EmailSender
MetadataNames.EmailToPersonalStorageContainer.EmailReceiver