
This page contains release notes for GroupDocs.Parser for Java 18.11.

Major Features

This release includes the following features:

  • Implemented the ability to retrieve information about the extractors supported for a document
  • Implemented IFastTextExtractor interface
  • Implemented IDocumentContentExtractor interface
  • Improved text area extraction for PDF documents

Full List of Issues Covering all Changes in this Release

Key | Summary | Category
PARSERNET-1077 | Implement the ability to retrieve the information of supported extractors for a document | New feature
PARSERNET-1075 | Implement IFastTextExtractor interface | Enhancement
PARSERNET-1076 | Implement IDocumentContentExtractor interface | Enhancement
PARSERNET-1069 | Improve text area extraction for PDF documents | Enhancement

Public API and Backward Incompatible Changes

This section lists public API changes that were introduced in GroupDocs.Parser for Java 18.11. It includes not only new and obsoleted public methods, but also a description of any changes in the behavior behind the scenes in GroupDocs.Parser which may affect existing code. Any behavior introduced that could be seen as a regression and modifies existing behavior is especially important and is documented here.

Ability to retrieve the information of supported extractors for a document

Description

This feature reports which kinds of extraction are supported for a given document.

Public API changes

  • Added DocumentInfo class

  • Added getDocumentInfo methods to ExtractorFactory class

Usage

The DocumentInfo class has the following properties:

Property | Description
hasText | Boolean value indicating whether plain text can be extracted from the document
hasFormattedText | Boolean value indicating whether formatted text can be extracted from the document
hasMetadata | Boolean value indicating whether metadata can be extracted from the document
isContainer | Boolean value indicating whether the document contains other documents (such as email attachments or the entries of a ZIP archive)

Usage:

Java
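As a minimal sketch of how the four flags listed above might drive a caller's decisions, the following uses a hypothetical stand-in class. `DocumentInfoSketch` and `supportedOperations` are illustrative names, not part of GroupDocs.Parser, and the flag values in `main` are made up. In real code the flags would come from the DocumentInfo object returned by the getDocumentInfo methods of ExtractorFactory.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in that mirrors the four flags described in the table.
// It is NOT the library's DocumentInfo class.
final class DocumentInfoSketch {
    final boolean hasText;
    final boolean hasFormattedText;
    final boolean hasMetadata;
    final boolean isContainer;

    DocumentInfoSketch(boolean hasText, boolean hasFormattedText,
                       boolean hasMetadata, boolean isContainer) {
        this.hasText = hasText;
        this.hasFormattedText = hasFormattedText;
        this.hasMetadata = hasMetadata;
        this.isContainer = isContainer;
    }

    // Lists the operations a caller could attempt, in a fixed order.
    List<String> supportedOperations() {
        List<String> ops = new ArrayList<>();
        if (hasText) ops.add("text");
        if (hasFormattedText) ops.add("formatted text");
        if (hasMetadata) ops.add("metadata");
        if (isContainer) ops.add("container entries");
        return ops;
    }

    public static void main(String[] args) {
        // Made-up flag values for illustration only.
        DocumentInfoSketch sample = new DocumentInfoSketch(true, false, true, false);
        System.out.println(sample.supportedOperations()); // [text, metadata]
    }
}
```

Checking the flags before creating an extractor avoids requesting an operation that the document does not support.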

Improved text area extraction for PDF documents

Description

This enhancement improves text area extraction for PDF documents. Y-coordinates of text areas are now measured from the top of the page. For some kinds of documents, text areas now contain more items.

Public API changes

No API changes.

Usage

Java
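PDF pages natively place the origin of their coordinate system at the bottom-left corner, so a Y value measured from the top of the page relates to a bottom-based one through the page height. The sketch below shows that arithmetic only. `TextAreaCoordinates` and `toTopOriginY` are hypothetical helper names, and both the use of points as the unit and the idea that calling code still holds bottom-based values are assumptions not stated in these notes.

```java
// Hypothetical helper; not part of GroupDocs.Parser.
final class TextAreaCoordinates {

    // Converts the Y of a rectangle's lower edge, measured upward from the
    // page bottom, to the Y of its upper edge, measured downward from the
    // page top.
    static double toTopOriginY(double pageHeight, double bottomOriginY, double areaHeight) {
        return pageHeight - (bottomOriginY + areaHeight);
    }

    public static void main(String[] args) {
        // A 12 pt high area whose lower edge sits 780 pt above the bottom
        // of a 792 pt high page touches the top edge of the page.
        System.out.println(toTopOriginY(792.0, 780.0, 12.0)); // 0.0
    }
}
```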

IFastTextExtractor interface

Description

This enhancement lets you enable fast text extraction through the IFastTextExtractor interface.

Public API changes

Added IFastTextExtractor interface

Added support for IFastTextExtractor interface to the following classes:

  • PdfTextExtractor class
  • CellsTextExtractor class
  • SlidesTextExtractor class

Usage

The IFastTextExtractor interface has a single property, which gets or sets the text extraction mode. The ExtractMode enumeration has the following members:

 

Value | Description
Simple | Fast text extraction. Text is extracted faster but less accurately than in standard mode. If fast text extraction does not support the document format, this setting is ignored and standard text extraction is used.
Standard | Standard text extraction.

Usage:

Java
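The fallback rule described for the Simple mode can be sketched as a small decision function. This is an illustration under stated assumptions, not the library's implementation: `ExtractModeSketch`, its local `Mode` enum and the `fastPathAvailable` flag are hypothetical names, and how the library decides whether a format supports fast extraction is not covered in these notes.

```java
// Hypothetical model of the documented fallback; not part of GroupDocs.Parser.
final class ExtractModeSketch {

    // Local stand-in that mirrors the two members described above.
    enum Mode { SIMPLE, STANDARD }

    // Returns the mode that would actually be applied: a request for fast
    // extraction is ignored when the format has no fast path.
    static Mode effectiveMode(Mode requested, boolean fastPathAvailable) {
        if (requested == Mode.SIMPLE && !fastPathAvailable) {
            return Mode.STANDARD;
        }
        return requested;
    }

    public static void main(String[] args) {
        System.out.println(effectiveMode(Mode.SIMPLE, true));   // SIMPLE
        System.out.println(effectiveMode(Mode.SIMPLE, false));  // STANDARD
    }
}
```

Because the fallback is silent, a caller that requests the fast mode still receives text for every format. Only speed and accuracy differ.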

IDocumentContentExtractor interface

Description

This enhancement provides access to the Text Analysis API through the IDocumentContentExtractor interface.

Public API changes

Added IDocumentContentExtractor interface

Added support for IDocumentContentExtractor interface to the following classes:

  • PdfTextExtractor class
  • CellsTextExtractor class
  • SlidesTextExtractor class
  • WordsTextExtractor class

Usage

The IDocumentContentExtractor interface has a single read-only property, which provides access to the document's content.

Usage:

Java
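Since only some extractor classes implement the interface, calling code may need to check for it before asking for the content. The sketch below shows that check with hypothetical stand-ins. `ContentProvider`, `WithContent`, `WithoutContent` and `contentOrDefault` are illustrative names, and representing the content as a `String` is a simplifying assumption. None of these are GroupDocs.Parser types.

```java
// Hypothetical model of an optional, interface-based capability.
final class ContentAccessSketch {

    // Stand-in for an interface that exposes a single read-only accessor.
    interface ContentProvider {
        String getContent();
    }

    // Stand-in extractor that offers the capability.
    static final class WithContent implements ContentProvider {
        @Override
        public String getContent() {
            return "document content";
        }
    }

    // Stand-in extractor that does not offer it.
    static final class WithoutContent {
    }

    // Returns the content when the extractor supports it, else the fallback.
    static String contentOrDefault(Object extractor, String fallback) {
        if (extractor instanceof ContentProvider) {
            return ((ContentProvider) extractor).getContent();
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(contentOrDefault(new WithContent(), "n/a"));    // document content
        System.out.println(contentOrDefault(new WithoutContent(), "n/a")); // n/a
    }
}
```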