GroupDocs.Parser doesn't include OCR functionality in its distributable. Instead, it provides an API for integrating any paid or free OCR solution. See this article for details on how to integrate an OCR solution into GroupDocs.Parser.
To use OCR functionality, the Parser object must be properly initialized:
Instantiate a ParserSettings object with an instance of a class that implements OCR functionality, and pass it to the Parser constructor;
The following example shows how to create an instance of the Parser class with the Aspose.OCR on-premise API connector:
```csharp
// Create an instance of ParserSettings class with the implementation of Aspose.OCR on-premise API connector
ParserSettings settings = new ParserSettings(new AsposeOcrOnPremise());
// Create an instance of Parser class with the parser settings
// (fileName is the path to the document to parse)
Parser parser = new Parser(fileName, settings);
```
To extract text from image files or non-text PDF documents, use the GetText method:
Instantiate a TextOptions object with useOcr = true;
Call the GetText(TextOptions) method and obtain a TextReader object;
Check that the reader isn't null (null means text extraction isn't supported for the document);
Read the text from the reader.
The following example shows how to extract text from an image file:
```csharp
// Create an instance of ParserSettings class with OCR Connector
ParserSettings settings = new ParserSettings(new AsposeOcrOnPremise());
// Create an instance of Parser class with settings
using (Parser parser = new Parser(Constants.SampleScan, settings))
{
    // Create an instance of TextOptions to use OCR (second argument: useOcr = true)
    TextOptions options = new TextOptions(false, true);
    // Extract text using OCR
    using (TextReader reader = parser.GetText(options))
    {
        // Print the text or a 'not supported' message
        Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
    }
}
```
To extract text areas from image files or non-text PDF documents, use the GetTextAreas method:
Instantiate a ParserSettings object with an instance of a class that implements OCR functionality, and pass it to the Parser constructor;
Instantiate a PageTextAreaOptions object with useOcr = true;
Call the GetTextAreas(PageTextAreaOptions) method and obtain a collection of PageTextArea objects;
Check that the collection isn't null (null means text area extraction isn't supported for the document);
Iterate through the collection to get the rectangles and texts.
The following example shows how to extract text areas from an image file:
```csharp
// Create an instance of ParserSettings class with OCR Connector
ParserSettings settings = new ParserSettings(new AsposeOcrOnPremise());
// Create an instance of Parser class with settings
using (Parser parser = new Parser(Constants.SampleScan, settings))
{
    // Create an instance of PageTextAreaOptions to use OCR
    PageTextAreaOptions options = new PageTextAreaOptions(true);
    // Extract text areas
    IEnumerable<PageTextArea> areas = parser.GetTextAreas(options);
    // Check if text area extraction is supported
    if (areas == null)
    {
        Console.WriteLine("Text areas extraction isn't supported");
        return;
    }
    // Iterate over text areas
    foreach (PageTextArea a in areas)
    {
        // Print the text, position and size of each text area
        Console.WriteLine(a.Text);
        Console.WriteLine("\tPosition: ({0}; {1})", a.Rectangle.Left, a.Rectangle.Top);
        Console.WriteLine("\tSize: ({0}; {1})", a.Rectangle.Size.Width, a.Rectangle.Size.Height);
    }
}
```
The OcrOptions class has the following properties:

| Property | Description |
|---|---|
| Rectangle | A rectangular area that restricts the area of text recognition. |
| Handler | An instance of the OcrEventHandler class that handles any warnings that occur during text recognition. |

The following sections describe how to use these properties.
How to restrict the area of text recognition
To restrict the area of the image used for text recognition, use the OcrOptions class. Set the Rectangle property to the rectangular area in which text should be recognized.
The following example shows how to restrict text recognition to a rectangular area:
```csharp
// Create an instance of ParserSettings class with OCR Connector
ParserSettings settings = new ParserSettings(new AsposeOcrOnPremise());
// Create an instance of Parser class with settings
using (Parser parser = new Parser(Constants.SampleScan, settings))
{
    // Create an instance of OcrOptions to set a rectangle
    OcrOptions ocrOptions = new OcrOptions(new Data.Rectangle(0, 0, 400, 200));
    // Create an instance of TextOptions to use OCR with the OCR options
    TextOptions options = new TextOptions(false, true, ocrOptions);
    // Extract text using OCR
    using (TextReader reader = parser.GetText(options))
    {
        // Print the text or a 'not supported' message
        Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
    }
}
```
How to handle warnings
To handle warnings that occur during text recognition, use the OcrOptions class. Set the Handler property to an OcrEventHandler instance to collect warning messages. The HasWarnings property of the OcrEventHandler class indicates whether any warnings occurred. Use the Warnings property to get all warnings, or the GetWarnings method to get the warnings for a specific page. An empty list is returned if no warnings occur during text recognition.
The following example shows how to handle warning messages:
```csharp
// Create an instance of ParserSettings class with OCR Connector
ParserSettings settings = new ParserSettings(new AsposeOcrOnPremise());
// Create an instance of Parser class with settings
using (Parser parser = new Parser(Constants.SampleScan, settings))
{
    // Create an instance of OcrEventHandler to handle warnings
    OcrEventHandler handler = new OcrEventHandler();
    // Create an instance of OcrOptions to set a handler
    OcrOptions ocrOptions = new OcrOptions(handler);
    // Create an instance of TextOptions to use OCR with the OCR options
    TextOptions options = new TextOptions(false, true, ocrOptions);
    // Extract text using OCR
    using (TextReader reader = parser.GetText(options))
    {
        // Print the text or a 'not supported' message
        Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
    }
    // Report any warnings collected by the handler during recognition
    if (handler.HasWarnings)
    {
        Console.WriteLine("The following warnings occurred during text recognition:");
        foreach (string w in handler.Warnings)
        {
            Console.WriteLine("\t* " + w);
        }
    }
    else
    {
        Console.WriteLine("Text recognition was performed without any warnings.");
    }
}
```