OCR support lets you connect an external library that recognizes printed text (optical character recognition, OCR) in images, whether the images are separate files or embedded in documents.
To connect OCR, you need to implement the IOcrConnector interface in the client code.
The following example demonstrates how to implement the OCR connector using the com.aspose.ocr library to recognize text in images.
const indexFolder = 'c:/MyIndex/';
const documentsFolder = 'c:/MyDocuments/';
const query = 'Einstein';

// Creating an index
const index = new groupdocs.search.Index(indexFolder, true);

// Subscribing to the ErrorOccurred event
index.getEvents().ErrorOccurred.add(
  java.newProxy('com.groupdocs.search.events.EventHandler', {
    invoke: function (sender, args) {
      console.log(args.getMessage());
    },
  }),
);

// Setting the OCR indexing options
const options = new groupdocs.search.IndexingOptions();
options.getOcrIndexingOptions().setEnabledForSeparateImages(true);
options.getOcrIndexingOptions().setEnabledForEmbeddedImages(true);

const ocrConnector = java.newProxy('com.groupdocs.search.options.IOcrConnector', {
  recognize: function (context) {
    switch (String(context.getImageLocation())) {
      case 'Separate':
      case 'Embedded':
      case 'ContainerItem': {
        const image = java.callStaticMethodSync('javax.imageio.ImageIO', 'read', context.getImageStream());
        const asposeOcr = new groupdocs.search.AsposeOcr();
        const result = asposeOcr.RecognizePage(image);
        return result;
      }
      default:
        throw new Error('The image type is not supported: ' + context.getImageLocation());
    }
  },
});
options.getOcrIndexingOptions().setOcrConnector(ocrConnector);

// Indexing documents in a document folder
index.add(documentsFolder, options);

// Searching in the index
const result = index.search(query);
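The dispatch logic inside the recognize callback can be tried out without the Java bridge or a real OCR engine. The following is a minimal sketch under stated assumptions: createRecognizeHandler, fakeContext, and the stub recognition function are hypothetical names introduced for illustration and are not part of GroupDocs.Search. Only the getImageLocation and getImageStream calls and the three location names come from the example above.

```javascript
// Hypothetical helper (not part of GroupDocs.Search): builds the plain object
// whose recognize method mirrors the callback used in the connector above.
const SUPPORTED_LOCATIONS = ['Separate', 'Embedded', 'ContainerItem'];

function createRecognizeHandler(recognizeImage) {
  return {
    recognize: function (context) {
      const location = String(context.getImageLocation());
      if (!SUPPORTED_LOCATIONS.includes(location)) {
        throw new Error('The image type is not supported: ' + location);
      }
      // Delegate the actual recognition to the injected OCR function.
      return recognizeImage(context.getImageStream());
    },
  };
}

// Stub standing in for the context object passed to recognize (assumption:
// only the two getters used in the example are modeled here).
function fakeContext(location, stream) {
  return {
    getImageLocation: () => location,
    getImageStream: () => stream,
  };
}

// Stub OCR engine: returns fixed text instead of calling a real library.
const handler = createRecognizeHandler((stream) => 'recognized:' + stream);

console.log(handler.recognize(fakeContext('Embedded', 'img-1')));
```

Keeping the recognition function as a parameter means the location check can be tested in isolation; in real code, an object of this shape would presumably be wrapped with java.newProxy for 'com.groupdocs.search.options.IOcrConnector', as in the example.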
More resources
GitHub examples
You may easily run the code from documentation articles and see the features in action in our GitHub examples: