The GetText(TextOptions) and GetText(int pageIndex, TextOptions) methods return an instance of the TextReader class containing the extracted text. The first method extracts text from the whole document; the second extracts text from a single page, where pageIndex is zero-based. To retrieve the total number of document pages, use the GetDocumentInfo method (see below).
Warning
In raw mode, the page count is read from the RawPageCount property of the IDocumentInfo class instead of being calculated in accurate mode. This avoids extra calculations.
Extract text
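Here are the steps to extract raw text from a document:
Instantiate a Parser object for the initial document;
Instantiate a TextOptions object with the true parameter to enable raw mode;
Call the GetText(TextOptions) method to obtain a TextReader object;
Check that the reader isn't null (a null reader means text extraction isn't supported for the document);
Read the text from the reader.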
The following example shows how to extract raw text from a document:
using System;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Options;

// Create an instance of the Parser class (filePath is the path to the source document)
using (Parser parser = new Parser(filePath))
{
    // Extract raw text into the reader
    using (TextReader reader = parser.GetText(new TextOptions(true)))
    {
        // Print the text from the document
        // If text extraction isn't supported, the reader is null
        Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
    }
}
Extract text from a page
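Here are the steps to extract raw text from a document page:
Instantiate a Parser object for the initial document;
Check that the document supports text extraction (the Features.Text property of the Parser object);
Call the GetDocumentInfo method and use the RawPageCount property to get the number of pages;
Instantiate a TextOptions object with the true parameter to enable raw mode;
For each page, call the GetText(int pageIndex, TextOptions) method and read the text from the returned TextReader object.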
The following example shows how to extract raw text from each page of a document:
using System;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Options;

// Create an instance of the Parser class (filePath is the path to the source document)
using (Parser parser = new Parser(filePath))
{
    // Check if the document supports text extraction
    if (!parser.Features.Text)
    {
        Console.WriteLine("Document doesn't support text extraction.");
        return;
    }
    // Get the document info
    IDocumentInfo documentInfo = parser.GetDocumentInfo();
    // Check if the document has pages
    if (documentInfo == null || documentInfo.RawPageCount == 0)
    {
        Console.WriteLine("Document has no pages.");
        return;
    }
    // Iterate over pages (page indices are zero-based)
    for (int p = 0; p < documentInfo.RawPageCount; p++)
    {
        // Print the page number
        Console.WriteLine(string.Format("Page {0}/{1}", p + 1, documentInfo.RawPageCount));
        // Extract raw text from the page into the reader
        using (TextReader reader = parser.GetText(p, new TextOptions(true)))
        {
            // Print the text from the page
            // Null-checking is skipped because text extraction support was checked earlier
            Console.WriteLine(reader.ReadToEnd());
        }
    }
}
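If you need the page texts as data rather than console output, the page loop can be wrapped in a helper. The sketch below is hypothetical: PageTextHelper and ReadAllPages are illustrative names, not part of GroupDocs.Parser, and the per-page call is abstracted behind a Func<int, TextReader> delegate (an assumption) so the pattern can run without the library. It keeps the same structure as the loop: zero-based page indices, one disposed reader per page, and a null check for pages that return no reader.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class PageTextHelper
{
    // Hypothetical helper (not a GroupDocs.Parser API): reads the text of pages
    // 0..pageCount-1 through a caller-supplied function that returns a TextReader
    // for a page index, or null when no text is available for that page.
    public static List<string> ReadAllPages(int pageCount, Func<int, TextReader> getPageReader)
    {
        var pages = new List<string>(pageCount);
        for (int p = 0; p < pageCount; p++)
        {
            // The using statement accepts a null resource and disposes non-null readers
            using (TextReader reader = getPageReader(p))
            {
                // Assumption: a page without a reader is stored as an empty string
                pages.Add(reader == null ? string.Empty : reader.ReadToEnd());
            }
        }
        return pages;
    }
}
```

With GroupDocs.Parser, you would presumably call it as PageTextHelper.ReadAllPages(documentInfo.RawPageCount, p => parser.GetText(p, new TextOptions(true))) inside the using block for the parser. Storing an empty string for a null reader is a design choice of this sketch; throwing an exception instead may suit documents where every page is expected to contain text.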
More resources
GitHub examples
You may easily run the code above and see the feature in action in our GitHub examples:
Along with the full-featured .NET library, we provide simple but powerful free apps.
You are welcome to parse documents and extract data from PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX, emails and more with our Free Online Document Parser App.