GroupDocs.Parser for .NET provides the ability to extract text from image files and PDFs composed of images.
Note
To use the OCR functionality in .NET Framework, set PlatformTarget to x64. If the downloadable (MSI or ZIP) version of GroupDocs.Parser is used, see the readme.txt file for additional information.
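The platform target is usually set in the project file. The fragment below is a minimal sketch, assuming the consuming application is built from a .csproj file that you can edit by hand; the same setting is also exposed in Visual Studio in the project's Build properties.

```xml
<!-- Assumption: this PropertyGroup is added to the .csproj of the application that calls GroupDocs.Parser -->
<PropertyGroup>
  <!-- Build the application as a 64-bit process -->
  <PlatformTarget>x64</PlatformTarget>
</PropertyGroup>
```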
Not all languages represented by the Language class can currently be recognized without downloading additional resources from the internet. If internet access is available, the necessary resources are downloaded automatically when a recognition language is selected. Languages currently supported without additional downloads: English, Chinese, Japanese, Korean, Arabic.
The following example shows how to extract text from images and PDFs:
// Create an instance of Parser class
using (Parser parser = new Parser("scanned.pdf"))
{
    // Set OCR options
    TextOptions options = new TextOptions(false, true);
    options.OcrOptions = new OcrOptions();
    options.OcrOptions.Language = Language.Chinese;
    options.OcrOptions.PagePreviewOptions = new PagePreviewOptions();
    options.OcrOptions.PagePreviewOptions.Dpi = 144;

    // Extract text using OCR
    using (TextReader reader = parser.GetText(options))
    {
        // Print text or 'not supported' message
        Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
    }
}
TextOptions can be omitted if the file is an image:
// Create an instance of Parser class
using (Parser parser = new Parser("scanned.jpg"))
{
    // Extract text using OCR
    using (TextReader reader = parser.GetText())
    {
        // Print text or 'not supported' message
        Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
    }
}