Extract text from images and PDFs

GroupDocs.Parser for .NET 24.6 can extract English-language text from images and from PDFs that don’t contain plain text (for example, scanned documents).

Note
To use the OCR functionality in .NET Framework, set PlatformTarget to x64. If you use the downloadable (MSI or ZIP) version of GroupDocs.Parser, see the readme.txt file for additional information.

The following example shows how to extract text from a scanned PDF using OCR:

// Create an instance of the Parser class
using (Parser parser = new Parser("scanned.pdf"))
{
    // Create an instance of TextOptions; the second argument (true) enables OCR
    TextOptions options = new TextOptions(false, true);
    // Extract text using OCR
    using (TextReader reader = parser.GetText(options))
    {
        // Print the extracted text, or a 'not supported' message if the reader is null
        Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
    }
}

TextOptions can be omitted if the file is an image:

// Create an instance of the Parser class
using (Parser parser = new Parser("scanned.jpg"))
{
    // Extract text using OCR; no TextOptions are needed for an image
    using (TextReader reader = parser.GetText())
    {
        // Print the extracted text, or a 'not supported' message if the reader is null
        Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
    }
}
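
The two cases above can be combined into a single helper. The following is a minimal sketch, not part of the GroupDocs.Parser API: the class name OcrTextExtractor and the method ExtractText are hypothetical, and detecting PDFs by file extension is an assumption made only for illustration. It uses only the calls shown in the examples above.

```csharp
using System;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Options;

// Hypothetical helper for illustration; not part of GroupDocs.Parser
static class OcrTextExtractor
{
    // Returns the extracted text, or null if text extraction isn't supported
    public static string ExtractText(string filePath)
    {
        // Assumption: a ".pdf" extension means the file may be a scanned PDF,
        // so OCR is requested explicitly
        bool isPdf = string.Equals(
            Path.GetExtension(filePath), ".pdf", StringComparison.OrdinalIgnoreCase);

        using (Parser parser = new Parser(filePath))
        {
            // PDFs get TextOptions(false, true) as in the first example;
            // images are parsed without options as in the second example
            using (TextReader reader = isPdf
                ? parser.GetText(new TextOptions(false, true))
                : parser.GetText())
            {
                return reader == null ? null : reader.ReadToEnd();
            }
        }
    }
}
```

A caller could then print the result with Console.WriteLine(OcrTextExtractor.ExtractText("scanned.pdf") ?? "Text extraction isn't supported");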