Check that the collection isn't null (a null collection means container extraction isn't supported for the document);
Iterate through the collection to get each container item's name and size and to obtain its content.
The following example shows how to extract text from PDF Portfolios:
// Create an instance of Parser class
try (Parser parser = new Parser(Constants.SamplePdfPortfolio)) {
    // Extract attachments from the container
    Iterable<ContainerItem> attachments = parser.getContainer();
    // Check if container extraction is supported
    if (attachments == null) {
        System.out.println("Container extraction isn't supported");
        // Stop here: iterating over a null collection would throw NullPointerException
        return;
    }
    // Iterate over the portfolio attachments
    for (ContainerItem item : attachments) {
        // Print the file path
        System.out.println(item.getFilePath());
        // Print metadata
        for (MetadataItem metadata : item.getMetadata()) {
            System.out.println(String.format("%s: %s", metadata.getName(), metadata.getValue()));
        }
        try {
            // Create a Parser object for the attachment content
            try (Parser attachmentParser = item.openParser()) {
                // Extract the attachment text
                try (TextReader reader = attachmentParser.getText()) {
                    System.out.println(reader == null ? "No text" : reader.readToEnd());
                }
            }
        } catch (UnsupportedDocumentFormatException ex) {
            System.out.println("Isn't supported.");
        }
    }
}
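The null check on the collection is easy to forget, and a for-each loop over a null collection throws NullPointerException. As a minimal sketch, the check can be folded into a small helper. ContainerGuard and orEmpty are hypothetical names used for illustration only and are not part of GroupDocs.Parser; the string lists stand in for the value returned by getContainer().

```java
import java.util.Arrays;
import java.util.Collections;

// Hypothetical helper for illustration; NOT part of GroupDocs.Parser.
final class ContainerGuard {
    private ContainerGuard() {}

    // Returns the given collection, or an empty one when it is null,
    // so a for-each loop over the result never throws NullPointerException.
    static <T> Iterable<T> orEmpty(Iterable<T> items) {
        return items != null ? items : Collections.<T>emptyList();
    }

    public static void main(String[] args) {
        // Stand-in for a document that supports container extraction.
        Iterable<String> supported = Arrays.asList("report.pdf", "notes.txt");
        // Stand-in for a document where the container collection is null.
        Iterable<String> unsupported = null;

        for (String name : orEmpty(supported)) {
            System.out.println(name);
        }
        for (String name : orEmpty(unsupported)) {
            System.out.println(name); // loop body is never entered
        }
    }
}
```

With the library, the loop header would presumably become for (ContainerItem item : ContainerGuard.orEmpty(parser.getContainer())). This variant silently skips documents that don't support container extraction, so keep the explicit null check when that case needs to be reported.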
More resources
GitHub examples
You may easily run the code above and see the feature in action in our GitHub examples:
Along with the full-featured Java library, we provide simple but powerful free apps.
You are welcome to parse documents and extract data from PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX, emails and more with our Free Online Document Parser App.