OCR Usage Basics

GroupDocs.Parser for .NET can extract text from images and from PDFs that don’t contain plain text. Text recognition (OCR) is supported for the English language.

The GetText and GetTextAreas methods of the Parser class are used to recognize text in images.

To extract text from image files or non-text PDF documents, use the GetText method:

Instantiate a Parser object;
Instantiate a TextOptions object with useOcr = true;
Call the GetText(TextOptions) method with the TextOptions parameter and obtain a TextReader object;
Check that the reader isn’t null (null means text extraction isn’t supported for the document);
Read the text from the reader.

The following example shows how to extract text from an image file:

// Create an instance of Parser class
using (Parser parser = new Parser(Constants.SampleScan))
{
    // Create an instance of TextOptions to use OCR (the second argument is useOcr)
    TextOptions options = new TextOptions(false, true);
    // Extract text using OCR
    using(TextReader reader = parser.GetText(options))
    {
        // Print the text or a 'not supported' message
        Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
    }
}

To extract text areas from image files or non-text PDF documents, use the GetTextAreas method:

Instantiate a Parser object;
Instantiate a PageTextAreaOptions object with useOcr = true;
Call the GetTextAreas(PageTextAreaOptions) method and obtain a collection of PageTextArea objects;
Check that the collection isn’t null (null means text area extraction isn’t supported for the document);
Iterate through the collection and get the rectangle and text of each area.

The following example shows how to extract text areas from an image file:

// Create an instance of Parser class
using (Parser parser = new Parser(Constants.SampleScan))
{
    // Create an instance of PageTextAreaOptions to use OCR
    PageTextAreaOptions options = new PageTextAreaOptions(true);
  
    // Extract text areas
    IEnumerable<PageTextArea> areas = parser.GetTextAreas(options);
    
    // Check if text areas extraction is supported
    if (areas == null)
    {
        Console.WriteLine("Text areas extraction isn't supported");
        return;
    }

    // Iterate over text areas
    foreach (PageTextArea a in areas)
    {
        // Print the text, position, and size of each text area
        Console.WriteLine(a.Text);
        Console.WriteLine("\tPosition: ({0}; {1})", a.Rectangle.Left, a.Rectangle.Top);
        Console.WriteLine("\tSize: ({0}; {1})", a.Rectangle.Size.Width, a.Rectangle.Size.Height);
    }
}

The TextOptions and PageTextAreaOptions classes have an OcrOptions property. The OcrOptions class has the following members:

Member	Description
Rectangle	The rectangular area to which text recognition is restricted.
Handler	An instance of the OcrEventHandler class that collects any warnings that occur during text recognition.

The following sections describe how to use this property.

How to restrict the area of the text recognition

To limit text recognition to part of an image, use the OcrOptions class: set its Rectangle property to the rectangular area that should be recognized.

The following example shows how to restrict text recognition to a rectangular area:

// Create an instance of Parser class
using (Parser parser = new Parser(Constants.SampleScan))
{
    // Create an instance of OcrOptions to set a rectangle
    OcrOptions ocrOptions = new OcrOptions(new Data.Rectangle(0, 0, 400, 200));

    // Create an instance of TextOptions to use OCR
    TextOptions options = new TextOptions(false, true, ocrOptions);
    // Extract text using OCR
    using (TextReader reader = parser.GetText(options))
    {
        // Print the text or a 'not supported' message
        Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
    }
}
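
The same restriction can presumably be applied when extracting text areas, since PageTextAreaOptions also exposes an OcrOptions property. The sketch below is an illustration under stated assumptions, not a confirmed usage: it assumes PageTextAreaOptions has a constructor that accepts the useOcr flag together with an OcrOptions instance, and the class name, method name, and filePath parameter are hypothetical. Check the constructor signature against the API reference before relying on it.

```csharp
using System;
using System.Collections.Generic;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;
using GroupDocs.Parser.Options;

static class OcrRegionSketch
{
    // filePath is a hypothetical path to a scanned image or a non-text PDF
    public static void PrintTextAreasInRegion(string filePath)
    {
        using (Parser parser = new Parser(filePath))
        {
            // Restrict recognition to the same rectangle used in the GetText example
            OcrOptions ocrOptions = new OcrOptions(new GroupDocs.Parser.Data.Rectangle(0, 0, 400, 200));

            // ASSUMPTION: a (bool useOcr, OcrOptions ocrOptions) constructor exists
            PageTextAreaOptions options = new PageTextAreaOptions(true, ocrOptions);

            IEnumerable<PageTextArea> areas = parser.GetTextAreas(options);

            // null means text area extraction isn't supported for the document
            if (areas == null)
            {
                Console.WriteLine("Text areas extraction isn't supported");
                return;
            }

            // Print the text and position of each recognized area
            foreach (PageTextArea a in areas)
            {
                Console.WriteLine(a.Text);
                Console.WriteLine("\tPosition: ({0}; {1})", a.Rectangle.Left, a.Rectangle.Top);
            }
        }
    }
}
```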

How to handle warnings

To handle warnings that occur during text recognition, use the OcrOptions class: set its Handler property to an instance of OcrEventHandler. The HasWarnings property of the OcrEventHandler class indicates whether any warnings occurred. Use the Warnings property to get all warnings, or the GetWarnings method to get the warnings for a specific page. An empty list is returned if no warnings occurred during text recognition.

The following example shows how to handle warning messages:

// Create an instance of Parser class
using (Parser parser = new Parser(Constants.SampleScan))
{
    // Create an instance of OcrEventHandler to handle warnings
    OcrEventHandler handler = new OcrEventHandler();

    // Create an instance of OcrOptions to set a handler
    OcrOptions ocrOptions = new OcrOptions(handler);

    // Create an instance of TextOptions to use OCR
    TextOptions options = new TextOptions(false, true, ocrOptions);
    // Extract text using OCR
    using (TextReader reader = parser.GetText(options))
    {
        // Print the text or a 'not supported' message
        Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
    }

    if (handler.HasWarnings)
    {
        Console.WriteLine("The following warnings occurred during text recognition:");

        foreach (string w in handler.Warnings)
        {
            Console.WriteLine("\t* " + w);
        }
    }
    else
    {
        Console.WriteLine("Text recognition was performed without any warnings.");
    }
}
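
The example above reads all warnings through the Warnings property. For the per-page GetWarnings method, a rough sketch might look like the following. It rests on several assumptions that are not confirmed by this page: that GetWarnings takes a zero-based page index and returns a collection of strings, and that the page count can be read from parser.GetDocumentInfo().PageCount. The class name, method name, and filePath parameter are hypothetical.

```csharp
using System;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Options;

static class OcrWarningsSketch
{
    // filePath is a hypothetical path to a scanned image or a non-text PDF
    public static void PrintWarningsByPage(string filePath)
    {
        using (Parser parser = new Parser(filePath))
        {
            // Attach a handler so warnings are collected during recognition
            OcrEventHandler handler = new OcrEventHandler();
            TextOptions options = new TextOptions(false, true, new OcrOptions(handler));

            using (TextReader reader = parser.GetText(options))
            {
                if (reader == null)
                {
                    Console.WriteLine("Text extraction isn't supported");
                    return;
                }

                // Read the whole text so recognition has run before warnings are inspected
                reader.ReadToEnd();
            }

            // ASSUMPTION: the document info exposes the number of pages
            var info = parser.GetDocumentInfo();
            int pageCount = info == null ? 0 : info.PageCount;

            for (int p = 0; p < pageCount; p++)
            {
                // ASSUMPTION: GetWarnings accepts a zero-based page index
                foreach (var w in handler.GetWarnings(p))
                {
                    Console.WriteLine("Page {0}: {1}", p, w);
                }
            }
        }
    }
}
```

Grouping warnings by page can make it easier to see which scans in a multi-page document need attention.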
