Extract metadata from documents

GroupDocs.Parser lets you extract basic metadata from documents in a variety of formats: PDF, emails, ebooks, Microsoft Office (Word DOC/DOCX, PowerPoint PPT/PPTX, Excel XLS/XLSX), LibreOffice formats and many others (see the full list in the supported document formats article).

To extract metadata from a document, call the getMetadata method:

Iterable<MetadataItem> getMetadata();

This method returns a collection of MetadataItem objects with the following members:

Member      Description
getName     The name of the metadata item
getValue    The value of the metadata item
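
Because each item exposes only a name and a value, finding one particular property means scanning the collection. The following is a minimal sketch rather than library code: the nested MetadataItem class is a stand-in that mirrors only the two getters listed above (and assumes both return String), findValue is a hypothetical helper, and the metadata names "author" and "title" are illustrative only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class MetadataLookup {
    // Stand-in for the library's MetadataItem, mirroring only getName/getValue.
    static final class MetadataItem {
        private final String name;
        private final String value;

        MetadataItem(String name, String value) {
            this.name = name;
            this.value = value;
        }

        String getName() { return name; }
        String getValue() { return value; }
    }

    // Hypothetical helper: returns the value of the first item whose name
    // matches (case-insensitively), or an empty Optional if there is none.
    static Optional<String> findValue(Iterable<MetadataItem> metadata, String name) {
        if (metadata == null) {
            // A null collection is treated the same as "not found".
            return Optional.empty();
        }
        for (MetadataItem item : metadata) {
            if (name.equalsIgnoreCase(item.getName())) {
                return Optional.ofNullable(item.getValue());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Illustrative data only; real names depend on the document.
        List<MetadataItem> sample = Arrays.asList(
                new MetadataItem("author", "Jane Doe"),
                new MetadataItem("title", "Quarterly report"));

        System.out.println(findValue(sample, "author").orElse("(not set)"));
        System.out.println(findValue(sample, "keywords").orElse("(not set)"));
    }
}
```

Returning an Optional keeps the "item missing" case explicit for the caller instead of handing back a bare null.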

Here are the steps to extract metadata from the document:

  • Instantiate a Parser object for the initial document;
  • Call the getMetadata method to obtain the collection of document metadata objects;
  • Check that the collection isn’t null (a null result means metadata extraction isn’t supported for the document);
  • Iterate through the collection and read the metadata names and values.

The following example shows how to extract metadata from a document:

import com.groupdocs.parser.Parser;
import com.groupdocs.parser.data.MetadataItem;

// Create an instance of the Parser class
try (Parser parser = new Parser(Constants.SampleDocx)) {
    // Extract metadata from the document
    Iterable<MetadataItem> metadata = parser.getMetadata();
    // getMetadata returns null when metadata extraction isn't supported
    if (metadata == null) {
        System.out.println("Metadata extraction isn't supported");
        // Stop here; iterating over a null collection would throw
        return;
    }
    // Iterate over metadata items
    for (MetadataItem item : metadata) {
        // Print an item name and value
        System.out.println(String.format("%s: %s", item.getName(), item.getValue()));
    }
}
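
The loop above prints each item as it is read. If the values are needed later, one option is to copy them into a map first. This is a sketch under stated assumptions, not part of the library: the nested MetadataItem class is a stand-in exposing only getName and getValue (assumed to return String), toMap is a hypothetical helper, and the sample names and values are made up.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class MetadataToMap {
    // Stand-in for the library's MetadataItem, mirroring only getName/getValue.
    static final class MetadataItem {
        private final String name;
        private final String value;

        MetadataItem(String name, String value) {
            this.name = name;
            this.value = value;
        }

        String getName() { return name; }
        String getValue() { return value; }
    }

    // Hypothetical helper: copies items into a map, preserving iteration order.
    // A null collection (extraction not supported) yields an empty map;
    // if a name repeats, the later value replaces the earlier one.
    static Map<String, String> toMap(Iterable<MetadataItem> metadata) {
        Map<String, String> result = new LinkedHashMap<>();
        if (metadata == null) {
            return result;
        }
        for (MetadataItem item : metadata) {
            result.put(item.getName(), item.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative data only; real names depend on the document.
        Map<String, String> metadata = toMap(Arrays.asList(
                new MetadataItem("author", "Jane Doe"),
                new MetadataItem("title", "Quarterly report")));

        metadata.forEach((name, value) -> System.out.println(name + ": " + value));
    }
}
```

With the real library, the result of parser.getMetadata() would be passed in place of the sample list. Note that returning an empty map hides the difference between "not supported" and "no metadata", so keep the explicit null check where that distinction matters.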

More resources

Advanced usage topics

To learn more about document data extraction features and how to extract text, images, forms and more, please refer to the advanced usage section.

GitHub examples

You can run the code above and see the feature in action in our GitHub examples.

Free online document parser App

Along with the full-featured Java library, we provide simple but powerful free apps.

You are welcome to extract data from PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX, emails and more with our Free Online Document Parser App.