GroupDocs.Parser for .NET 18.5 Release Notes

Major Features

This release includes the following enhancements:

  • Standard extract mode is now the default behavior
  • Added support for GitHub Markdown syntax

Full List of Issues Covering all Changes in this Release

Key | Summary | Issue Type
PARSERNET-948 | Implement Standard extract mode as default behavior | Enhancement
PARSERNET-877 | Implement the support for GitHub Markdown syntax | Enhancement

Public API and Backward Incompatible Changes

Standard Extract Mode as the Default Behavior

Description

This enhancement changes the default behavior of text extraction. Text is now extracted with better quality, but extraction takes more time. Use the ExtractMode property to change this behavior.

Public API changes

No public API changes.

Usage

The ExtractMode enumeration has the following members:

Value | Description
Simple | Fast text extraction. This mode is faster than Standard mode but less accurate. If fast text extraction doesn't support the document format, this setting is ignored and standard text extraction is used.
Standard | Standard text extraction.

C#

// Create a text extractor
CellsTextExtractor extractor = new CellsTextExtractor("document.xls");
// Switch to Simple mode for faster text extraction
extractor.ExtractMode = ExtractMode.Simple;
// Extract all text and print it to the console
Console.WriteLine(extractor.ExtractAll());
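
Because Standard mode trades speed for quality, it can help to measure both modes on a representative document before choosing one. The following is a minimal sketch, not an official benchmark. It uses only the CellsTextExtractor, ExtractMode, and ExtractAll members shown above. The GroupDocs.Parser.Extractors.Text namespace, the file name "document.xls", and the helper TimeExtraction are assumptions made for illustration.

```csharp
using System;
using System.Diagnostics;
// Assumption: namespace of the text extractors; adjust to your installed version.
using GroupDocs.Parser.Extractors.Text;

class ExtractModeComparison
{
    // Hypothetical helper: extracts all text in the given mode,
    // reports the text length, and returns the elapsed time.
    static TimeSpan TimeExtraction(string path, ExtractMode mode, out int length)
    {
        Stopwatch watch = Stopwatch.StartNew();
        CellsTextExtractor extractor = new CellsTextExtractor(path);
        extractor.ExtractMode = mode;
        string text = extractor.ExtractAll();
        watch.Stop();
        length = text == null ? 0 : text.Length;
        return watch.Elapsed;
    }

    static void Main()
    {
        // Hypothetical input file; replace with your own document.
        const string path = "document.xls";

        foreach (ExtractMode mode in new[] { ExtractMode.Simple, ExtractMode.Standard })
        {
            int length;
            TimeSpan elapsed = TimeExtraction(path, mode, out length);
            Console.WriteLine("{0}: {1} characters in {2} ms",
                mode, length, elapsed.TotalMilliseconds);
        }
    }
}
```

Treat the timings as rough: the first run includes one-time start-up costs, so repeat the measurement several times before drawing conclusions.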

Support for GitHub Markdown Syntax

Description

This enhancement allows extracting GitHub-specific objects from Markdown (md) documents.

Public API changes

Added a read-only indexer to the StructuredElementProperties class.
Added the TaskState constant to the ListItemProperties class.
Added a TextProperties constructor with three parameters: isBold, isItalic, and style.

Usage

C#

// Create a text extractor for Markdown documents
// ("stream" is assumed to be an open, readable Stream over the .md document)
using (var extractor = new MarkdownTextExtractor(stream)) {
  // Extract the first line of text
  string line = extractor.ExtractLine();
  // A null line means the end of the document has been reached
  while (line != null) {
    // Print the line to the console
    Console.WriteLine(line);
    // Extract the next line
    line = extractor.ExtractLine();
  }
}

To extract all text from a document at once:

C#

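// Create a text extractor for Markdown documents
// ("stream" is assumed to be an open, readable Stream over the .md document)
using (var extractor = new MarkdownTextExtractor(stream)) {
  // Extract all text and print it to the console
  Console.WriteLine(extractor.ExtractAll());
}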