Check that the reader isn't null (a non-null reader means formatted text extraction is supported for the document);
Read the text from the reader.
The following example shows how to extract the text of each document page as Markdown:
// Create an instance of Parser class
try (Parser parser = new Parser(Constants.SampleDocx)) {
    // Check if the document supports formatted text extraction
    if (!parser.getFeatures().isFormattedText()) {
        System.out.println("Document doesn't support formatted text extraction.");
        return;
    }
    // Get the document info
    IDocumentInfo documentInfo = parser.getDocumentInfo();
    // Check if the document has pages
    if (documentInfo.getPageCount() == 0) {
        System.out.println("Document has no pages.");
        return;
    }
    // Iterate over pages
    for (int p = 0; p < documentInfo.getPageCount(); p++) {
        // Print a page number
        System.out.println(String.format("Page %d/%d", p + 1, documentInfo.getPageCount()));
        // Extract a formatted text into the reader
        try (TextReader reader = parser.getFormattedText(p, new FormattedTextOptions(FormattedTextMode.Markdown))) {
            // Print a formatted text from the document
            // We ignore null-checking as we have checked formatted text extraction feature support earlier
            System.out.println(reader.readToEnd());
        }
    }
}
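The example above skips the null check because feature support was verified first. If you would rather check the reader itself, the pattern can be sketched as follows. This is a minimal, library-free illustration: `openFormattedText` is a hypothetical stand-in for a call that returns null when formatted text isn't supported, and `java.io.Reader` stands in for the parser's reader type. It relies on the fact that Java's try-with-resources accepts a null resource and simply skips closing it.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class NullCheckedReaderSketch {
    // Hypothetical stand-in (not a library API): returns null when
    // formatted text extraction is "unsupported", otherwise a reader
    // over some sample Markdown.
    static Reader openFormattedText(boolean supported) {
        return supported ? new StringReader("# Title\n\nSome **bold** text.") : null;
    }

    // Reads all text from the reader, or returns null if no reader was provided.
    static String readAllOrNull(boolean supported) throws IOException {
        // A null resource is allowed here; close() is only called on non-null resources.
        try (Reader reader = openFormattedText(supported)) {
            if (reader == null) {
                return null;
            }
            StringBuilder text = new StringBuilder();
            char[] buffer = new char[1024];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                text.append(buffer, 0, read);
            }
            return text.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        for (boolean supported : new boolean[] {true, false}) {
            String text = readAllOrNull(supported);
            System.out.println(text != null ? text : "Formatted text extraction isn't supported.");
        }
    }
}
```

Checking the reader directly keeps the support test and the extraction in one place, at the cost of one extra branch per page.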
More resources
GitHub examples
You may easily run the code above and see the feature in action in our GitHub examples: