OCR Usage Basics

Although GroupDocs.Redaction does not ship an OCR engine as part of its distributable, it lets you integrate any paid or free OCR solution. To do so, implement the IOcrConnector interface and its recognize() method, which takes a stream containing an image as its argument and returns a structured representation of the recognized text, including bounding rectangles.

Java

public class MyOwnOcrConnector implements IOcrConnector
{
    public MyOwnOcrConnector()
    {
    }

    public RecognizedImage recognize(InputStream imageStream)
    {
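        // TODO: Create an instance of the RecognizedImage class from the result returned by your OCR toolkit.
        // Until then, fail loudly so that the class compiles and the missing implementation is obvious.
        throw new UnsupportedOperationException("OCR connector is not implemented yet");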
    }
}
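To make "a structured representation of the text, including bounding rectangles" more concrete, here is a minimal sketch of the kind of data an OCR toolkit typically returns: words with pixel rectangles, grouped into lines. The OcrWord, OcrLine and OcrLayoutSketch types are hypothetical stand-ins invented for this illustration. They are not the GroupDocs.Redaction classes, and the actual shape of RecognizedImage should be taken from the API reference.

```java
import java.awt.Rectangle;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in types for illustration only; NOT the GroupDocs.Redaction API.
final class OcrWord {
    final String text;
    final Rectangle bounds; // position of the word on the image, in pixels

    OcrWord(String text, Rectangle bounds) {
        this.text = text;
        this.bounds = bounds;
    }
}

final class OcrLine {
    final List<OcrWord> words = new ArrayList<>();

    // Appends a word with its bounding rectangle and returns this line for chaining.
    OcrLine add(String text, int x, int y, int width, int height) {
        words.add(new OcrWord(text, new Rectangle(x, y, width, height)));
        return this;
    }

    // Text of the line: the words joined with single spaces.
    String text() {
        StringBuilder sb = new StringBuilder();
        for (OcrWord w : words) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(w.text);
        }
        return sb.toString();
    }

    // Smallest rectangle that covers every word in the line (empty if there are no words).
    Rectangle bounds() {
        Rectangle result = null;
        for (OcrWord w : words) {
            result = (result == null) ? new Rectangle(w.bounds) : result.union(w.bounds);
        }
        return (result == null) ? new Rectangle() : result;
    }
}

final class OcrLayoutSketch {
    public static void main(String[] args) {
        OcrLine line = new OcrLine()
                .add("John", 10, 20, 40, 12)
                .add("Doe", 55, 18, 30, 14);
        System.out.println(line.text());   // John Doe
        System.out.println(line.bounds()); // x=10, y=18, width=75, height=14
    }
}
```

A textual redaction needs both parts of this structure: the text is what gets matched against a phrase such as "John Doe", and the rectangles tell the redactor which area of the image to cover.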

Once an instance of your connector is passed to the RedactorSettings constructor, GroupDocs.Redaction uses it for image files and embedded images during an ordinary textual redaction process.

Java

try (Redactor redactor = new Redactor("\\Sample.docx", new LoadOptions(), new RedactorSettings(new MyOwnOcrConnector())))
{
    // The OCR connector was assigned through RedactorSettings above, so text inside images is redacted too
    redactor.apply(new ExactPhraseRedaction("John Doe", new ReplacementOptions(java.awt.Color.BLACK)));
    redactor.save();
}

GroupDocs.Redaction provides two example implementations of IOcrConnector, which are free to use and customize for your needs. The first is based on the Aspose.OCR for Cloud SDK. The second uses the Microsoft Azure Cognitive Services API. Both services offer a trial subscription plan, but you can use any other free or paid OCR solution, web-based or on-premises, by creating your own implementation of IOcrConnector.
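When the OCR engine is a web service, a connector usually has to buffer the incoming image stream into memory before it can send the image as a request body. The sketch below shows only that preparatory step, using the standard library. The ImageStreams class and its method names are hypothetical helpers for this illustration, and the actual request format depends on the OCR service you choose. Because recognize() as shown above declares no checked exceptions, the helper wraps IOException in an unchecked one.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration; not part of GroupDocs.Redaction.
final class ImageStreams {
    private ImageStreams() {
    }

    // Reads the whole stream into a byte array so it can be used as a request body.
    static byte[] readFully(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Guesses a Content-Type from the leading signature bytes (PNG and JPEG only).
    static String guessContentType(byte[] data) {
        if (data.length >= 4 && (data[0] & 0xFF) == 0x89
                && data[1] == 'P' && data[2] == 'N' && data[3] == 'G') {
            return "image/png";
        }
        if (data.length >= 3 && (data[0] & 0xFF) == 0xFF
                && (data[1] & 0xFF) == 0xD8 && (data[2] & 0xFF) == 0xFF) {
            return "image/jpeg";
        }
        return "application/octet-stream"; // unknown format: send as raw bytes
    }
}
```

Inside a connector, the first lines of recognize() could then call ImageStreams.readFully(imageStream) and pass the resulting bytes, together with the guessed content type, to whatever HTTP client the chosen OCR service requires.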

More resources

GitHub examples

You can run the code above and see the feature in action in our GitHub examples.

Free online document redaction App

Along with the full-featured Java library, we provide simple but powerful free apps.

You can redact documents in various formats, such as PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX, emails, and more, with our Free Online Document Redaction App.