GroupDocs.Parser lets you extract basic metadata from documents in various formats: PDF, emails, ebooks, Microsoft Office (Word: DOC, DOCX; PowerPoint: PPT, PPTX; Excel: XLS, XLSX), LibreOffice formats and many others (see the full list in the supported document formats article).
Extract metadata from documents
To extract metadata from a document, call the getMetadata method:
Iterable<MetadataItem> getMetadata();
This method returns a collection of MetadataItem objects. Each item has a name and a value, available through the getName and getValue methods.
Here are the steps to extract metadata from the document:
Instantiate a Parser object for the document;
Call the getMetadata method to obtain the collection of document metadata objects;
Check that the collection isn't null (null means metadata extraction isn't supported for the document);
Iterate through the collection and read the metadata names and values.
The following example shows how to extract metadata from a document:
// Create an instance of Parser class
try (Parser parser = new Parser(Constants.SampleDocx)) {
    // Extract metadata from the document
    Iterable<MetadataItem> metadata = parser.getMetadata();
    // Check if metadata extraction is supported; stop here if it is not
    if (metadata == null) {
        System.out.println("Metadata extraction isn't supported");
        return;
    }
    // Iterate over metadata items and print each name and value
    for (MetadataItem item : metadata) {
        System.out.println(String.format("%s: %s", item.getName(), item.getValue()));
    }
}
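Because getMetadata can return null, it may be convenient to wrap the null check and the iteration in one helper that returns a map of names to values. The following is only a minimal sketch of that idea, not part of the GroupDocs.Parser API: to stay self-contained it uses a hypothetical stand-in class, Item, that mimics just the getName and getValue methods of MetadataItem, and the class name MetadataMapSketch, the toMap helper and the sample data are all invented for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Self-contained sketch; nothing here is part of the GroupDocs.Parser API.
class MetadataMapSketch {

    // Hypothetical stand-in for the library's MetadataItem, reduced to the
    // two getters used in the example above.
    static final class Item {
        private final String name;
        private final String value;

        Item(String name, String value) {
            this.name = name;
            this.value = value;
        }

        String getName() { return name; }

        String getValue() { return value; }
    }

    // Copies the items into an insertion-ordered map. A null collection
    // (the "extraction isn't supported" case) yields an empty map, so the
    // caller never iterates over null.
    static Map<String, String> toMap(Iterable<Item> metadata) {
        Map<String, String> result = new LinkedHashMap<>();
        if (metadata == null) {
            return result;
        }
        for (Item item : metadata) {
            // If a name repeats, the later value replaces the earlier one.
            result.put(item.getName(), item.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample data invented for the demo.
        List<Item> items = new ArrayList<>();
        items.add(new Item("author", "Jane Doe"));
        items.add(new Item("title", "Quarterly report"));

        System.out.println(toMap(items));
        System.out.println(toMap(null));
    }
}
```

With the real library, the same pattern would take the Iterable<MetadataItem> returned by parser.getMetadata() instead of the stand-in type. Note that mapping null to an empty map hides the difference between "extraction isn't supported" and "the document has no metadata"; keep the explicit null check if that distinction matters.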
More resources
Advanced usage topics
To learn more about document data extraction features and how to extract text, images, forms and more, please refer to the advanced usage section.
GitHub examples
You can run the code above and see the feature in action in our GitHub examples.