GroupDocs.Parser for .NET 17.06 Release Notes

Major Features

This release includes the following features:

  • Implemented the ability to extract formatted highlights.
  • Implemented the ability to extract formatted text from FictionBook (fb2) documents.
  • Removed the obsolete IsRawMode property from the PdfTextExtractor, CellsTextExtractor and SlidesTextExtractor classes.

All Changes

Key | Summary | Issue Type
TEXTNET-524 | Remove IsRawMode obsolete property from PdfTextExtractor, CellsTextExtractor and SlidesTextExtractor classes | Enhancement
TEXTNET-541 | Implement the ability to extract a formatted text from FictionBook (fb2) documents | New feature
TEXTNET-547 | Implement the ability to extract formatted highlights | New feature

Public API and Backward Incompatible Changes

Remove IsRawMode obsolete property from PdfTextExtractor, CellsTextExtractor and SlidesTextExtractor classes

The obsolete IsRawMode property was removed in this enhancement.

Public API changes
The obsolete IsRawMode property was removed from the PdfTextExtractor, CellsTextExtractor and SlidesTextExtractor classes.

Use ExtractMode instead:

C#

using (var extractor = new SlidesTextExtractor(stream)) {
  // ExtractMode replaces the removed IsRawMode property
  extractor.ExtractMode = ExtractMode.Standard;
  extractor.ExtractAll();
}

Implement the ability to extract a formatted text from FictionBook (fb2) documents

This feature makes it possible to extract formatted text from FictionBook (fb2) documents.

Public API changes
Added FictionBookFormattedTextExtractor class.

Extracting formatted text:

C#

// Create a formatted text extractor for FictionBook (fb2) documents
using (var extractor = new FictionBookFormattedTextExtractor(stream)) {
  // Set the document formatter to Markdown
  extractor.DocumentFormatter = new MarkdownDocumentFormatter();
  // Extract the text and print it to the console
  Console.Write(extractor.ExtractAll());
}

Implement the ability to extract formatted highlights

This feature makes it possible to extract formatted highlights from documents.

Public API changes
Added ExtractHighlights method to WordsFormattedTextExtractor class.
Added ExtractHighlights method to SlidesFormattedTextExtractor class.
Added ExtractHighlights method to CellsFormattedTextExtractor class.
Added ExtractHighlights method to FictionBookFormattedTextExtractor class.
Added ExtractHighlights method to EpubFormattedTextExtractor class.
Added ExtractHighlights method to EmailFormattedTextExtractor class.

C#

using (WordsFormattedTextExtractor extractor = new WordsFormattedTextExtractor(@"document.docx"))
{
  IList<string> highlights = extractor.ExtractHighlights(
    HighlightOptions.CreateFixedLengthOptions(HighlightDirection.Left, 15, 10),
    HighlightOptions.CreateFixedLengthOptions(HighlightDirection.Right, 20, 10));

  for (int i = 0; i < highlights.Count; i++)
  {
    Console.WriteLine(highlights[i]);
  }
}
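
To make the option arguments in the example above easier to read, here is a minimal sketch of what a fixed-length highlight could mean on a plain string. It assumes the three arguments are the direction, the start position and the maximum length in characters. The HighlightSketch class, its Direction enum and the Take method are hypothetical helpers written for illustration only. They are not part of GroupDocs.Parser, and the library's actual behavior (for example, how it treats formatting or document boundaries) may differ.

```csharp
using System;

// Hypothetical helper, NOT part of GroupDocs.Parser. It only illustrates the
// assumed meaning of the "direction, position, length" arguments.
static class HighlightSketch
{
    public enum Direction { Left, Right }

    public static string Take(string text, Direction direction, int position, int length)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (length < 0) throw new ArgumentOutOfRangeException(nameof(length));

        // Clamp the position into the valid range of the text.
        position = Math.Max(0, Math.Min(position, text.Length));

        if (direction == Direction.Left)
        {
            // Up to `length` characters that end at `position`.
            int start = Math.Max(0, position - length);
            return text.Substring(start, position - start);
        }

        // Up to `length` characters that start at `position`.
        int count = Math.Min(length, text.Length - position);
        return text.Substring(position, count);
    }

    static void Main()
    {
        string text = "The quick brown fox jumps over the lazy dog";
        Console.WriteLine(Take(text, Direction.Left, 15, 10));  // prints "uick brown"
        Console.WriteLine(Take(text, Direction.Right, 20, 10)); // prints "jumps over"
    }
}
```

Under this assumption, a Left highlight returns the text immediately before the position and a Right highlight returns the text immediately after it, so the two calls in the release example would produce one highlight on each side of a region of interest.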