GroupDocs.Parser doesn’t include OCR functionality in its distributable. Instead, it provides an API for integrating any paid or free OCR solution. See this article for details on how to integrate an OCR solution with GroupDocs.Parser.
To use OCR functionality, the Parser object must be properly initialized:
Instantiate a ParserSettings object with an instance of a class that implements OCR functionality;
The following example shows how to create an instance of the Parser class with the Aspose.OCR on-premise API connector:
// Create an instance of ParserSettings class with OCR Connector
ParserSettings settings = new ParserSettings(new AsposeOcrOnPremise());
// Create an instance of Parser class with settings
Parser parser = new Parser(Constants.SampleScan, settings);
Instantiate a TextOptions object with useOcr = true;
Call the getText(TextOptions) method with the TextOptions parameter and obtain a TextReader object;
Check that the reader isn’t null (a non-null reader means text extraction is supported for the document);
Read the text from the reader.
The following example shows how to extract text from an image file:
// Create an instance of ParserSettings class with OCR Connector
ParserSettings settings = new ParserSettings(new AsposeOcrOnPremise());
// Create an instance of Parser class with settings
try (Parser parser = new Parser(Constants.SampleScan, settings)) {
    // Create an instance of TextOptions to use OCR
    TextOptions options = new TextOptions(false, true);
    // Extract a text using OCR
    try (TextReader reader = parser.getText(options)) {
        // Print a text or 'not supported' message
        System.out.println(reader == null ? "Text extraction isn't supported" : reader.readToEnd());
    }
}
To extract text areas from image files or non-text PDF documents, the getTextAreas method is used:
Instantiate a ParserSettings object with an instance of a class that implements OCR functionality;
Call the getTextAreas(PageTextAreaOptions) method and obtain the collection of PageTextArea objects;
Check that the collection isn’t null (a non-null collection means text area extraction is supported for the document);
Iterate through the collection and get the rectangle and text of each area.
The following example shows how to extract text areas from an image file:
// Create an instance of ParserSettings class with OCR Connector
ParserSettings settings = new ParserSettings(new AsposeOcrOnPremise());
// Create an instance of Parser class with settings
try (Parser parser = new Parser(Constants.SampleScan, settings)) {
    // Create an instance of PageTextAreaOptions to use OCR
    PageTextAreaOptions options = new PageTextAreaOptions(true);
    // Extract text areas
    java.lang.Iterable<PageTextArea> areas = parser.getTextAreas(options);
    // Check if text areas extraction is supported
    if (areas == null) {
        System.out.println("Text areas extraction isn't supported");
        return;
    }
    // Iterate over text areas
    for (PageTextArea a : areas) {
        // Print a text, position and size for an each text area
        System.out.println(a.getText());
        System.out.println(String.format("\tPosition: (%d; %d)", a.getRectangle().getLeft(), a.getRectangle().getTop()));
        System.out.println(String.format("\tSize: (%d; %d)", a.getRectangle().getSize().getWidth(), a.getRectangle().getSize().getHeight()));
    }
}
The OcrOptions class has the following properties:
Rectangle
A rectangular area that restricts the region used for text recognition.
Handler
An instance of the OcrEventHandler class that handles any warnings that occur during text recognition.
The following sections describe how to use these properties.
How to restrict the area of the text recognition
To restrict the area of the image used for text recognition, use the OcrOptions class. Set its Rectangle property to the rectangular area in which text should be recognized.
The following example shows how to restrict text recognition to a rectangular area:
// Create an instance of ParserSettings class with OCR Connector
ParserSettings settings = new ParserSettings(new AsposeOcrOnPremise());
// Create an instance of Parser class with settings
try (Parser parser = new Parser(Constants.SampleScan, settings)) {
    // Create an instance of OcrOptions to set a rectangle
    OcrOptions ocrOptions = new OcrOptions(new Rectangle(0, 0, 400, 200));
    // Create an instance of TextOptions to use OCR
    TextOptions options = new TextOptions(false, true, ocrOptions);
    // Extract a text using OCR
    try (TextReader reader = parser.getText(options)) {
        // Print a text or 'not supported' message
        System.out.println(reader == null ? "Text extraction isn't supported" : reader.readToEnd());
    }
}
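To make the geometry of such a restriction concrete, here is a minimal, library-free sketch. It assumes the four numbers in the example above mean left, top, width and height, which the page does not state explicitly. The Region and TextArea classes and the textInside helper are hypothetical stand-ins, not GroupDocs.Parser types, and the sketch does not show how the OCR engine itself treats text that only partly overlaps the rectangle.

```java
import java.util.ArrayList;
import java.util.List;

class RegionFilterSketch {
    /** Hypothetical stand-in for a rectangle given as left, top, width, height. */
    static final class Region {
        final int left, top, width, height;

        Region(int left, int top, int width, int height) {
            this.left = left;
            this.top = top;
            this.width = width;
            this.height = height;
        }

        /** Returns true when the other region lies entirely inside this one. */
        boolean contains(Region other) {
            return other.left >= left
                && other.top >= top
                && other.left + other.width <= left + width
                && other.top + other.height <= top + height;
        }
    }

    /** Hypothetical stand-in for a recognized piece of text and its bounds. */
    static final class TextArea {
        final String text;
        final Region bounds;

        TextArea(String text, Region bounds) {
            this.text = text;
            this.bounds = bounds;
        }
    }

    /** Keeps only the texts whose bounds fall completely inside the limit. */
    static List<String> textInside(Region limit, List<TextArea> areas) {
        List<String> result = new ArrayList<>();
        for (TextArea area : areas) {
            if (limit.contains(area.bounds)) {
                result.add(area.text);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Same numbers as the example above: left 0, top 0, width 400, height 200.
        Region limit = new Region(0, 0, 400, 200);
        // Made-up sample data for illustration only.
        List<TextArea> areas = List.of(
            new TextArea("Invoice", new Region(20, 10, 120, 30)),   // fully inside
            new TextArea("Total", new Region(300, 150, 150, 30)),   // crosses the right edge
            new TextArea("Footer", new Region(20, 500, 200, 30)));  // below the region
        System.out.println(textInside(limit, areas)); // prints [Invoice]
    }
}
```

Only the area that fits entirely within the 400 by 200 region survives the filter. Whether partially overlapping text is recognized in practice depends on the OCR connector in use.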
How to handle warnings
To handle warnings that occur during text recognition, use the OcrOptions class. Set its Handler property to an OcrEventHandler instance that collects the warning messages. The hasWarnings method of the OcrEventHandler class indicates whether any warnings occurred. Call getWarnings() to get all warnings, or the page-specific getWarnings method to get the warnings for a single page. An empty list is returned if no warnings occurred during text recognition.
The following example shows how to handle warning messages:
// Create an instance of ParserSettings class with OCR Connector
ParserSettings settings = new ParserSettings(new AsposeOcrOnPremise());
// Create an instance of Parser class with settings
try (Parser parser = new Parser(Constants.SampleScan, settings)) {
    // Create an instance of OcrEventHandler to handle warnings
    OcrEventHandler handler = new OcrEventHandler();
    // Create an instance of OcrOptions to set a handler
    OcrOptions ocrOptions = new OcrOptions(null, handler);
    // Create an instance of TextOptions to use OCR
    TextOptions options = new TextOptions(false, true, ocrOptions);
    // Extract a text using OCR
    try (TextReader reader = parser.getText(options)) {
        // Print a text or 'not supported' message
        System.out.println(reader == null ? "Text extraction isn't supported" : reader.readToEnd());
    }
    // Print the collected warnings, if any
    if (handler.hasWarnings()) {
        System.out.println("The following warnings occurred during text recognition:");
        for (String w : handler.getWarnings()) {
            System.out.println("\t* " + w);
        }
    } else {
        System.out.println("Text recognition was performed without any warning.");
    }
}
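The warning contract described in this section (a flag, a list of all warnings, a per-page list, and an empty list when nothing was reported) can be illustrated with a small stand-in class. This is only a sketch: WarningCollector, its onWarning method and the zero-based page index are hypothetical, and the class is not the OcrEventHandler implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WarningCollectorSketch {
    /** Hypothetical collector that mimics the described warning contract. */
    static final class WarningCollector {
        // Warnings grouped by page index, kept in page order.
        private final Map<Integer, List<String>> byPage = new TreeMap<>();

        /** Records a warning for the given page (hypothetical callback). */
        void onWarning(int pageIndex, String message) {
            byPage.computeIfAbsent(pageIndex, k -> new ArrayList<>()).add(message);
        }

        /** Returns true if at least one warning was recorded. */
        boolean hasWarnings() {
            return !byPage.isEmpty();
        }

        /** Returns all warnings in page order; empty if none were recorded. */
        List<String> getWarnings() {
            List<String> all = new ArrayList<>();
            for (List<String> page : byPage.values()) {
                all.addAll(page);
            }
            return all;
        }

        /** Returns the warnings for one page; empty if that page has none. */
        List<String> getWarnings(int pageIndex) {
            return new ArrayList<>(byPage.getOrDefault(pageIndex, Collections.emptyList()));
        }
    }

    public static void main(String[] args) {
        WarningCollector handler = new WarningCollector();
        System.out.println(handler.hasWarnings()); // prints false
        System.out.println(handler.getWarnings()); // prints []

        // Made-up warning messages for illustration only.
        handler.onWarning(2, "Skewed page");
        handler.onWarning(0, "Low image resolution");

        System.out.println(handler.hasWarnings());  // prints true
        System.out.println(handler.getWarnings());  // prints [Low image resolution, Skewed page]
        System.out.println(handler.getWarnings(1)); // prints []
    }
}
```

Returning an empty list instead of null lets callers loop over the result without a null check, which matches the behavior the section describes for the case where no warnings occur.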