Extract a text from images and PDFs Leave feedback

GroupDocs.Parser for .NET provides the ability to extract a text from images and PDFs (which don’t contain a plain text) for English language.

Note
To use the OCR functionality in .NET Framework set PlatformTarget to x64. If downloadable (msi or zip) version of GroupDocs.Parser is used, see readme.txt file for the additional information.

The following example shows how to extract a text from images and PDFs:

// Create an instance of Parser class
using (Parser parser = new Parser("scanned.pdf"))
{
    // Create an instance of TextOptions to use OCR
    TextOptions options = new TextOptions(false, true);
    // Extract a text using OCR
    using(TextReader reader = parser.GetText(options))
    {
        // Print a text or 'not supported' message
        Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
    }
}

TextOptions can be omitted if the file is an image:

// Create an instance of Parser class
using (Parser parser = new Parser("scanned.jpg"))
{
    // Extract a text using OCR
    using(TextReader reader = parser.GetText())
    {
        // Print a text or 'not supported' message
        Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
    }
}

We value your opinion. Your feedback will help us improve our documentation.

Extract a text from images and PDFs Leave feedback

Was this page helpful?

Any additional feedback you'd like to share with us?

Please tell us how we can improve this page.

Thank you for your feedback!