OCR support is the ability to plug in an external library that recognizes printed text (optical character recognition, OCR) in images, whether they are separate files or embedded in documents.
To connect an OCR engine, implement the IOcrConnector interface in the client code and assign an instance of the connector to the OCR indexing options.
The following example demonstrates how to implement an OCR connector that uses the Aspose.OCR library to recognize text in images, and how to enable it when indexing.
```csharp
string indexFolder = @"c:\MyIndex";
string documentFolder = @"c:\MyDocuments";

// Creating an index
Index index = new Index(indexFolder);

// Setting the OCR indexing options
IndexingOptions options = new IndexingOptions();
options.OcrIndexingOptions.EnabledForSeparateImages = true;
options.OcrIndexingOptions.EnabledForEmbeddedImages = true;
options.OcrIndexingOptions.OcrConnector = new AsposeOcrConnector();

// Indexing documents in a document folder
index.Add(documentFolder, options);

// Searching in the index
SearchResult result = index.Search("Einstein");

...

// Implementing the OCR connector that uses Aspose.OCR library
// You need to install the following package:
// https://www.nuget.org/packages/Aspose.OCR/
public class AsposeOcrConnector : IOcrConnector
{
    public AsposeOcrConnector()
    {
    }

    public string Recognize(OcrContext context)
    {
        string extension = context.ImageFileExtension.ToLowerInvariant();
        if (context.ImageLocation == ImageLocation.Separate)
        {
            // Separate image files: recognize only the supported formats
            switch (extension)
            {
                case ".gif":
                case ".png":
                case ".jpg":
                case ".jpeg":
                case ".bmp":
                case ".tif":
                case ".tiff":
                case "":
                    return RecognizePrivate(context);
                default:
                    return null;
            }
        }
        else if (context.ImageLocation == ImageLocation.Embedded ||
                 context.ImageLocation == ImageLocation.ContainerItem)
        {
            // Images embedded in documents or stored in containers
            return RecognizePrivate(context);
        }
        else
        {
            throw new NotSupportedException("The image type is not supported: " + context.ImageLocation);
        }
    }

    private string RecognizePrivate(OcrContext context)
    {
        // Copy the image into an in-memory stream for the OCR engine
        using (MemoryStream memoryStream = new MemoryStream((int)context.ImageStream.Length))
        {
            context.ImageStream.CopyTo(memoryStream);
            // Rewind so that recognition reads from the beginning of the image data
            memoryStream.Position = 0;

            AsposeOcr asposeOcr = new AsposeOcr();
            string result = asposeOcr.RecognizeImage(memoryStream);
            return result;
        }
    }
}
```
The next example demonstrates how to implement the OCR connector using the Aspose Cloud OCR API.
```csharp
// Implementing the OCR connector that uses Aspose Cloud OCR
// The full Aspose Cloud OCR API for .NET is available in the repository:
// https://github.com/aspose-ocr-cloud/aspose-ocr-cloud-dotnet
// You can get the SID and key after free registration at
// https://dashboard.aspose.cloud/applications
public class AsposeCloudOcrConnector : IOcrConnector
{
    private readonly Configuration _configuration;

    public AsposeCloudOcrConnector()
    {
        _configuration = new Configuration();
        _configuration.AppSid = "...";
        _configuration.AppKey = "...";
    }

    public string Recognize(OcrContext context)
    {
        // Send the image stream to the cloud service and return the recognized text
        OcrApi api = new OcrApi(_configuration);
        var request = new PostOcrFromUrlOrContentRequest(context.ImageStream);
        OCRResponse response = api.PostOcrFromUrlOrContent(request);
        return response.Text;
    }
}
```
The following example demonstrates how to implement the OCR connector using Tesseract.
```csharp
// Implementing the OCR connector that uses Tesseract
// You need to install the following packages:
// https://www.nuget.org/packages/Tesseract/
// https://www.nuget.org/packages/Tesseract.Data.English/
using System.IO;
using System.Reflection;
using Tesseract;

public class TesseractOcrConnector : IOcrConnector
{
    public TesseractOcrConnector()
    {
    }

    public string Recognize(OcrContext context)
    {
        // Read the whole image into memory; a single Stream.Read call
        // may return fewer bytes than requested
        byte[] buffer;
        using (var memoryStream = new MemoryStream())
        {
            context.ImageStream.CopyTo(memoryStream);
            buffer = memoryStream.ToArray();
        }

        // Locate the "tessdata" folder next to the executing assembly
        var path = Path.GetDirectoryName(Assembly.GetExecutingAssembly().CodeBase);
        path = Path.Combine(path, "tessdata");
        path = path.Replace("file:\\", "");

        using (var engine = new TesseractEngine(path, "eng", EngineMode.Default))
        using (Pix img = Pix.LoadFromMemory(buffer))
        using (Page recognizedPage = engine.Process(img))
        {
            string recognizedText = recognizedPage.GetText();
            return recognizedText;
        }
    }
}
```
More resources
GitHub examples
You may easily run the code from documentation articles and see the features in action in our GitHub examples: