GroupDocs.Parser for .NET 17.01 Release Notes

Major Features

There are following features in this release:

  • Ability to search a text
  • Ability to extract highlight
  • Support for extracting documents from ZIP containers

All Changes

KeySummaryIssue Type
TEXTNET-338Implement the ability to search a textNew Feature
TEXTNET-339Implement the ability to extract highlightNew Feature
TEXTNET-252Implement the support for ZIP containersNew Feature
TEXTNET-496Implement the ability to open password-protected OneNote sectionsNew Feature
TEXTNET-504Add enum property to set extraction modeEnhancement

Public API and Backward Incompatible Changes

Add enum property to set extraction mode

This enhancement allows to set extraction mode via enumeration (ExtractionMode) instead of boolean property (IsRawMode).

Public API Changes
Added ExtractionMode enumeration.
Added ExtractionMode property to PdfTextExtractor, CellsTextExtractor and SlidesTextExtractor classes.
IsRawMode property is marked as Obsolete in PdfTextExtractor, CellsTextExtractor and SlidesTextExtractor classes.

C#

using (var extractor = new SlidesTextExtractor(stream)) {
  extractor.ExtractMode = ExtractMode.Standard;
  extractor.ExtractAll();
}

Implement the support for ZIP containers

This feature allows to open documents from ZIP containers.

Public API changes
Added ZipContainer class.

Enumerate all files in the archive:

C#

using (var container = new ZipContainer(stream))
{
  for(int i = 0; i < container.Entities.Count; i++)
  {
    Console.WriteLine("Name: " + container.Entities[i].Name);
    Console.WriteLine("Path: " + container.Entities[i].Path.ToString());
    Console.WriteLine("Media type: " + container.Entities[i].MediaType);
  }
}

Read the concrete file:

C#

using (var container = new ZipContainer(stream))
{
  using (TextExtractor extractor = extractorFactory.CreateTextExtractor(container.Entities[index].OpenStream())
  {
    Console.WriteLine(extractor.ExtractAll());
  }
}

Implement the ability to search a text

This feature allows to search text in documents.

Public API changes
Added ISearchable interface.
Added ISearchHandler interface.
Added ISearchEngine interface.
Added SearchResult class
Added SearchOptions class.
Added SearchHighlightOptions class.
Added ContainsSearchHandler and ListSearchHandler classes.

C#

using (WordsTextExtractor extractor = new WordsTextExtractor(@"document.docx"))
{
  ListSearchHandler handler = new ListSearchHandler();
  extractor.Search(new SearchOptions(new SearchHighlightOptions(10)), handler, null, new string[] { "test text", "keyword" });

  if (handler.List.Count == 0)
  {
    Console.WriteLine("Not found");
  }
  else
  {
    for (int i = 0; i < handler.List.Count; i++)
    {
      Console.Write(handler.List[i].LeftText);
      Console.Write("_");
      Console.Write(handler.List[i].FoundText);
      Console.Write("_");
      Console.Write(handler.List[i].RightText);
      Console.WriteLine("---");
    }
  }
}

Implement the ability to extract highlight

This feature allows to extract highlights from documents.

Public API changes

Added IHighlightExtractor interface.
Added HighlightOptions class.
Added HighlightDirection enum.

C#

using (WordsTextExtractor extractor = new WordsTextExtractor(@"document.docx"))
{
  IList<string> highlights = extractor.ExtractHighlights(
  HighlightOptions.CreateFixedLength(HighlightDirection.Left, 15, 10),
  HighlightOptions.CreateFixedLength(HighlightDirection.Right, 20, 10));

  for (int i = 0; i < highlights.Count; i++)
  {
    Console.WriteLine(highlights[i]);
  }
}

Implement the ability to open password-protected OneNote sections

This feature allows to open password-protected OneNote sections.

C#

var loadOptions = new LoadOptions();
loadOptions.Password = "123456";

using (var extractor = new NoteTextExtractor("password-protected.one", loadOptions))
{
  Console.WriteLine(extractor.ExtractAll());
}