GroupDocs.Parser for .NET 22.11 Release Notes

Full List of Issues Covering all Changes in this Release

Key             Summary                                                        Category
PARSERNET-1961  Implement the ability to use OCR for images and PDF documents  New Feature
PARSERNET-1935  Document info has 0 pages during reading JPEG file content     Bug

Public API and Backward Incompatible Changes

Implement the ability to use OCR for images and PDF documents

Description

This feature adds the ability to extract text and text areas from images and PDF documents using OCR.

Public API changes

GroupDocs.Parser.Options.Features public class was updated with changes as follows:

GroupDocs.Parser.Options.PageTextAreaOptions public class was updated with changes as follows:

GroupDocs.Parser.Options.TextOptions public class was updated with changes as follows:

GroupDocs.Parser.Options.ParserSettings public class was updated with changes as follows:

GroupDocs.Parser.Parser public class was updated with changes as follows:

  • Added Parser(string, ParserSettings) and Parser(Stream, ParserSettings) constructors;

The OcrConnectorBase, OcrEventHandler, and OcrOptions classes were added to the GroupDocs.Parser.Options namespace.

Usage

The following example shows how to extract text from an image file using OCR:

using System;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Options;

// Create an instance of the ParserSettings class with an OCR connector
// (AsposeOcrOnPremise is the connector used in this sample; see OCR Usage Basics)
ParserSettings settings = new ParserSettings(new AsposeOcrOnPremise());

// Create an instance of the Parser class with the settings
// (Constants.SampleScan is the sample's path to a scanned image)
using (Parser parser = new Parser(Constants.SampleScan, settings))
{
    // Create an instance of TextOptions to use OCR
    TextOptions options = new TextOptions(false, true);
    // Extract the text using OCR
    using (TextReader reader = parser.GetText(options))
    {
        // Print the text or a 'not supported' message
        Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
    }
}
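
The Parser(Stream, ParserSettings) constructor listed under Public API changes should allow the same flow when the image is available as a stream rather than a file path. The following is a minimal sketch, not a definitive implementation. It assumes the AsposeOcrOnPremise connector from the example above is available in your project (it is not defined here), and "scan.png" is a hypothetical file name used only for illustration.

```csharp
using System;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Options;

class OcrFromStreamExample
{
    static void Main()
    {
        // Assumption: AsposeOcrOnPremise is the OCR connector from the example above;
        // it must be supplied by your own project.
        ParserSettings settings = new ParserSettings(new AsposeOcrOnPremise());

        // Assumption: "scan.png" is a hypothetical path to a scanned image.
        using (Stream stream = File.OpenRead("scan.png"))
        // Parser(Stream, ParserSettings) is the constructor added in this release
        using (Parser parser = new Parser(stream, settings))
        {
            // Same options as in the file-based example above
            TextOptions options = new TextOptions(false, true);
            using (TextReader reader = parser.GetText(options))
            {
                // Print the text or a 'not supported' message
                Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
            }
        }
    }
}
```

The stream is opened in the outer using statement, so it is disposed only after the parser has finished reading from it.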

See OCR Usage Basics for more details.