Both GetText methods return an instance of the TextReader class containing the extracted text. The first method, called without arguments, extracts text from the whole document. The second method takes a zero-based page index and extracts text from that page only. To retrieve the total number of document pages, use the GetDocumentInfo method (see below).
Extract text
Here are the steps to extract text from the document:
Instantiate a Parser object for the initial document;
Call the GetText method to obtain a TextReader object;
Check that the reader isn't null (a null reader means text extraction isn't supported for the document);
Read the text from the reader.
The following example shows how to extract text from a document:
    // Create an instance of Parser class
    using (Parser parser = new Parser(filePath))
    {
        // Extract a text into the reader
        using (TextReader reader = parser.GetText())
        {
            // Print a text from the document
            // If text extraction isn't supported, a reader is null
            Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
        }
    }
Extract text from a document page
Here are the steps to extract text from a document page:
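Instantiate a Parser object for the initial document;
Check the Features.Text property to confirm that text extraction is supported for the document;
Call the GetDocumentInfo method to obtain an IDocumentInfo object and read its PageCount property;
Call the GetText method with a zero-based page index to obtain a TextReader object;
Read the text from the reader.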
The following example shows how to extract text from each page of a document:
    // Create an instance of Parser class
    using (Parser parser = new Parser(filePath))
    {
        // Check if the document supports text extraction
        if (!parser.Features.Text)
        {
            Console.WriteLine("Document doesn't support text extraction.");
            return;
        }

        // Get the document info
        IDocumentInfo documentInfo = parser.GetDocumentInfo();

        // Check if the document has pages
        if (documentInfo.PageCount == 0)
        {
            Console.WriteLine("Document has no pages.");
            return;
        }

        // Iterate over pages (page indexes are zero-based)
        for (int p = 0; p < documentInfo.PageCount; p++)
        {
            // Print a one-based page number
            Console.WriteLine(string.Format("Page {0}/{1}", p + 1, documentInfo.PageCount));

            // Extract a text into the reader
            using (TextReader reader = parser.GetText(p))
            {
                // Print a text from the page
                // The null check is skipped because text extraction support was verified above
                Console.WriteLine(reader.ReadToEnd());
            }
        }
    }
More resources
GitHub examples
You can run the code above and see the feature in action in our GitHub examples.
Along with the full-featured .NET library, we provide simple but powerful free apps.
You are welcome to parse documents and extract data from PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX, emails and more with our Free Online Document Parser App.