The methods return an instance of the TextReader class with the extracted text. The first method extracts text from the whole document; the second extracts text from a single document page. To retrieve the total number of document pages, use the getDocumentInfo method (see below).
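The following minimal sketch shows how the two overloads relate. It uses a hypothetical in-memory SketchDocument class, which is not part of GroupDocs.Parser. It assumes 0-based page indexes, as in the page example, and returns String instead of TextReader for brevity. Joining pages with a line break in getText() is also an assumption of the sketch, not documented library behavior.

```java
import java.util.List;

// Hypothetical stand-in for a parsed document; NOT the GroupDocs.Parser API.
// Unlike the library, which returns a TextReader, this sketch returns String.
class SketchDocument {
    private final List<String> pages;

    SketchDocument(List<String> pages) {
        this.pages = List.copyOf(pages);
    }

    // Analogue of getText(): the text of the whole document.
    // Assumption of this sketch: pages are joined with a line break.
    String getText() {
        return String.join("\n", pages);
    }

    // Analogue of getText(int pageIndex): the text of a single page (0-based).
    String getText(int pageIndex) {
        if (pageIndex < 0 || pageIndex >= pages.size()) {
            throw new IndexOutOfBoundsException("pageIndex out of range: " + pageIndex);
        }
        return pages.get(pageIndex);
    }

    // Analogue of getDocumentInfo().getPageCount().
    int getPageCount() {
        return pages.size();
    }
}
```

For example, with pages "first" and "second", getText(1) yields only the second page, while getText() yields both.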
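Extract text from document
Here are the steps to extract text from a document:
Instantiate Parser object for the initial document;
Call getText method and obtain TextReader object;
Check that the reader isn't null (a non-null reader means text extraction is supported for the document);
Read the text from the reader.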
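The null check and the read step can be sketched with the standard java.io.Reader in place of TextReader. The helper name readAllOrNull is hypothetical. It assumes, as described above, that a null reader means text extraction is unsupported. It then reads until read returns -1, which is the usual way to drain a java.io.Reader.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class ReaderSteps {
    // Step 1: a null reader signals that text extraction isn't supported.
    // Step 2: otherwise, read all characters until end of stream.
    // The caller still owns the reader and is responsible for closing it.
    static String readAllOrNull(Reader reader) {
        if (reader == null) {
            return null;
        }
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[1024];
        try {
            int count;
            while ((count = reader.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String text = readAllOrNull(new StringReader("Hello, world"));
        System.out.println(text == null ? "Text extraction isn't supported" : text);
    }
}
```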
The following example shows how to extract text from a document:
// Create an instance of Parser class
try (Parser parser = new Parser(Constants.SamplePdf)) {
    // Extract text into the reader
    try (TextReader reader = parser.getText()) {
        // Print the text from the document
        // If text extraction isn't supported, the reader is null
        System.out.println(reader == null ? "Text extraction isn't supported" : reader.readToEnd());
    }
}
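The example opens the reader in a try-with-resources statement even though the reader may be null. This is safe in Java: a resource initialized to null is simply not closed, so no NullPointerException is thrown. The sketch below demonstrates this with a hypothetical FakeReader class that records calls to close(). It stands in for TextReader and is not part of the library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a closable reader; not the GroupDocs TextReader.
class FakeReader implements AutoCloseable {
    static final List<String> log = new ArrayList<>();
    private final String text;

    FakeReader(String text) {
        this.text = text;
    }

    String readToEnd() {
        return text;
    }

    // Records each close so the demo can show when close() actually runs.
    @Override
    public void close() {
        log.add("closed");
    }
}

class NullResourceDemo {
    // Returns the reader's text, or a fallback message when the reader is null.
    // try-with-resources skips close() for a null resource.
    static String describe(FakeReader source) {
        try (FakeReader reader = source) {
            return reader == null ? "Text extraction isn't supported" : reader.readToEnd();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(null));
        System.out.println(describe(new FakeReader("sample text")));
        System.out.println(FakeReader.log);
    }
}
```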
Extract text from page
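Here are the steps to extract text from a document page:
Instantiate Parser object for the initial document;
Call getFeatures().isText() to check if text extraction is supported for the document;
Call getDocumentInfo method to obtain the total number of document pages;
Iterate over the pages and call getText(int pageIndex) to extract text from each page.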
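The early-exit checks can be isolated into a small guard: feature support is checked first, then the page count, as in the page example. FeatureFlags and precheck are hypothetical names that mimic getFeatures().isText() and getPageCount(). In this sketch, precheck returns the message to print, or null when extraction can proceed.

```java
// Hypothetical feature flags mimicking getFeatures().isText(); not the library API.
final class FeatureFlags {
    private final boolean text;

    FeatureFlags(boolean text) {
        this.text = text;
    }

    boolean isText() {
        return text;
    }
}

class PageTextGuard {
    // Check feature support first, then the page count.
    // Returns a message when extraction should stop, or null when it can proceed.
    static String precheck(FeatureFlags features, int pageCount) {
        if (!features.isText()) {
            return "Document doesn't support text extraction.";
        }
        if (pageCount == 0) {
            return "Document has no pages.";
        }
        return null;
    }
}
```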
The following example shows how to extract text from a document page:
// Create an instance of Parser class
try (Parser parser = new Parser(Constants.SamplePdf)) {
    // Check if the document supports text extraction
    if (!parser.getFeatures().isText()) {
        System.out.println("Document doesn't support text extraction.");
        return;
    }
    // Get the document info
    IDocumentInfo documentInfo = parser.getDocumentInfo();
    // Check if the document has pages
    if (documentInfo.getPageCount() == 0) {
        System.out.println("Document has no pages.");
        return;
    }
    // Iterate over pages
    for (int p = 0; p < documentInfo.getPageCount(); p++) {
        // Print the page number
        System.out.println(String.format("Page %d/%d", p + 1, documentInfo.getPageCount()));
        // Extract text into the reader
        try (TextReader reader = parser.getText(p)) {
            // Print the text from the page
            // Null-checking is skipped because text extraction support was checked earlier
            System.out.println(reader.readToEnd());
        }
    }
}
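The loop passes 0-based page indexes to getText(p) but prints 1-based page numbers. The following sketch isolates that formatting with a hypothetical render method. It takes page texts as a list in place of the parser and reads the page count once rather than on every iteration; this is a small design choice, not a requirement of the library.

```java
import java.util.ArrayList;
import java.util.List;

class PageLoop {
    // Hypothetical page source: each list element stands in for getText(p).
    static List<String> render(List<String> pages) {
        List<String> out = new ArrayList<>();
        int pageCount = pages.size();
        for (int p = 0; p < pageCount; p++) {
            // Indexes are 0-based; add 1 for the human-readable header.
            out.add(String.format("Page %d/%d", p + 1, pageCount));
            out.add(pages.get(p));
        }
        return out;
    }
}
```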
More resources
GitHub examples
You may easily run the code above and see the feature in action in our GitHub examples: