The following example shows how to extract the text of each document page as Markdown:
using System;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Options;

// Create an instance of the Parser class
using (Parser parser = new Parser(filePath))
{
    // Check if the document supports formatted text extraction
    if (!parser.Features.FormattedText)
    {
        Console.WriteLine("Document doesn't support formatted text extraction.");
        return;
    }

    // Get the document info
    IDocumentInfo documentInfo = parser.GetDocumentInfo();

    // Check if the document has pages
    if (documentInfo.PageCount == 0)
    {
        Console.WriteLine("Document has no pages.");
        return;
    }

    // Iterate over pages (page indexes are zero-based)
    for (int p = 0; p < documentInfo.PageCount; p++)
    {
        // Print the page number
        Console.WriteLine(string.Format("Page {0}/{1}", p + 1, documentInfo.PageCount));

        // Extract the page text as Markdown into the reader
        using (TextReader reader = parser.GetFormattedText(p, new FormattedTextOptions(FormattedTextMode.Markdown)))
        {
            // Print the formatted text from the page
            // We skip the null check because formatted text extraction support was verified above
            Console.WriteLine(reader.ReadToEnd());
        }
    }
}
More resources
GitHub examples
You may easily run the code above and see the feature in action in our GitHub examples:
Along with the full-featured .NET library, we provide simple but powerful free apps.
You are welcome to parse documents and extract data from PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX, emails, and more with our Free Online Document Parser App.