GroupDocs.Parser for Java 22.11 Release Notes

Full List of Issues Covering all Changes in this Release

Key | Summary | Category
PARSERNET-1961 | Implement the ability to use OCR for images and PDF documents | New Feature
PARSERNET-1903 | Implement the support for attachment extraction from presentations | New Feature
PARSERNET-1904 | Implement the support for attachment extraction from spreadsheets | New Feature
PARSERNET-1905 | Implement the support for attachment extraction from word processing documents | New Feature

Public API and Backward Incompatible Changes

Implement the support for attachment extraction from presentations, spreadsheets and word processing documents

Description

These features make it possible to extract attachments from presentations, spreadsheets, and word processing documents.

Public API changes

No public API changes.

Usage

The following example shows how to extract text from document attachments:

// Create an instance of the Parser class
try (Parser parser = new Parser(fileName)) {
    // Extract attachments from the document
    Iterable<ContainerItem> attachments = parser.getContainer();
    // Check if container extraction is supported
    if (attachments == null) {
        System.out.println("Container extraction isn't supported");
        // Stop here: iterating over a null result would throw a NullPointerException
        return;
    }
    // Iterate over the attachments
    for (ContainerItem item : attachments) {
        // Print the file path
        System.out.println(item.getFilePath());
        // Print metadata
        for (MetadataItem metadata : item.getMetadata()) {
            System.out.println(String.format("%s: %s", metadata.getName(), metadata.getValue()));
        }
        try {
            // Create a Parser object for the attachment content
            try (Parser attachmentParser = item.openParser()) {
                // Extract the attachment text
                try (TextReader reader = attachmentParser.getText()) {
                    System.out.println(reader == null ? "No text" : reader.readToEnd());
                }
            }
        } catch (UnsupportedDocumentFormatException ex) {
            // The attachment format isn't supported by the parser
            System.out.println("Isn't supported.");
        }
    }
}
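
In the example above, getContainer() and getText() signal an unsupported feature by returning null, so every caller has to remember the check. The sketch below shows one possible way to centralize that check with a small helper that turns a null result into an empty iterable. It uses only the Java standard library; the class name NullSafeExtraction, the methods orEmpty and collectNames, and the sample file names are hypothetical and are not part of the GroupDocs.Parser API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of GroupDocs.Parser: a sketch of null-safe
// iteration over a result that may be null when a feature isn't supported.
public class NullSafeExtraction {

    // Returns the given items, or an empty iterable when the source returned null.
    public static <T> Iterable<T> orEmpty(Iterable<T> items) {
        return items == null ? Collections.<T>emptyList() : items;
    }

    // Collects all names into a list; a null input yields an empty list.
    public static List<String> collectNames(Iterable<String> names) {
        List<String> result = new ArrayList<>();
        for (String name : orEmpty(names)) {
            result.add(name);
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for a null result such as the one returned
        // when container extraction isn't supported.
        Iterable<String> unsupported = null;
        System.out.println(collectNames(unsupported).size()); // prints 0

        // Stand-in for a list of attachment file paths (sample values).
        Iterable<String> paths = Arrays.asList("chart.xlsx", "notes.docx");
        System.out.println(collectNames(paths).size()); // prints 2
    }
}
```

With a helper of this kind, a loop such as for (ContainerItem item : orEmpty(parser.getContainer())) would simply run zero times when extraction isn't supported. The trade-off is that the "isn't supported" case is no longer reported separately from an empty container.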

Implement the ability to use OCR for images and PDF documents

Description

This feature provides the ability to extract text and text areas from images and PDF documents using OCR.

Public API changes

GroupDocs.Parser.Options.Features public class was updated with changes as follows:

GroupDocs.Parser.Options.PageTextAreaOptions public class was updated with changes as follows:

GroupDocs.Parser.Options.TextOptions public class was updated with changes as follows:

GroupDocs.Parser.Options.ParserSettings public class was updated with changes as follows:

GroupDocs.Parser.Parser public class was updated with changes as follows:

  • Added Parser(String, ParserSettings) and Parser(InputStream, ParserSettings) constructors.

The OcrConnectorBase, OcrEventHandler, and OcrOptions classes were added to the GroupDocs.Parser.Options namespace.

Usage

The following example shows how to extract text from an image file:

// Create an instance of the ParserSettings class with an OCR connector
// (AsposeOcrOnPremise is assumed to be a connector class defined elsewhere)
ParserSettings settings = new ParserSettings(new AsposeOcrOnPremise());
// Create an instance of the Parser class with the settings
try (Parser parser = new Parser(Constants.SampleScan, settings)) {
    // Create an instance of TextOptions to use OCR
    TextOptions options = new TextOptions(false, true);
    // Extract text using OCR
    try (TextReader reader = parser.getText(options)) {
        // Print the text or a 'not supported' message
        System.out.println(reader == null ? "Text extraction isn't supported" : reader.readToEnd());
    }
}

See OCR Usage Basics for more details.