GroupDocs.Parser for .NET 24.6 can extract English-language text from images and from PDFs that contain no plain text, such as scanned documents.
Note
To use the OCR functionality in .NET Framework, set PlatformTarget to x64. If you use the downloadable (MSI or ZIP) version of GroupDocs.Parser, see the readme.txt file for additional information.
The following example shows how to extract text from images and PDFs:
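In an SDK-style project, the PlatformTarget setting mentioned in the note is usually made in the project file. The fragment below is a minimal sketch, not a required layout: the target framework and output type are assumptions chosen for illustration, and only the PlatformTarget line relates to the note.

```xml
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <!-- Assumed values for illustration; keep your project's own settings -->
    <OutputType>Exe</OutputType>
    <TargetFramework>net48</TargetFramework>
    <!-- Build for 64-bit so the OCR functionality can be used -->
    <PlatformTarget>x64</PlatformTarget>
  </PropertyGroup>
</Project>
```

In Visual Studio, the same property can be changed under the project's Build settings (Platform target).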
    // Create an instance of the Parser class
    using (Parser parser = new Parser("scanned.pdf"))
    {
        // Create an instance of TextOptions to use OCR
        TextOptions options = new TextOptions(false, true);
        // Extract text using OCR
        using (TextReader reader = parser.GetText(options))
        {
            // Print the text or a 'not supported' message
            Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
        }
    }
TextOptions can be omitted if the file is an image:
    // Create an instance of the Parser class
    using (Parser parser = new Parser("scanned.jpg"))
    {
        // Extract text using OCR
        using (TextReader reader = parser.GetText())
        {
            // Print the text or a 'not supported' message
            Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
        }
    }