...

Use the Extractor class to extract metadata from any supported document format. It lets you extract metadata without using a concrete metadata extractor class such as WordsMetadataExtractor. Since version 18.12, GroupDocs.Parser can extract metadata from the following text and presentation template formats using the Extractor class:

...

  •     dotx (Template)
  •     dotm (Macro-enabled template)
  •     ott (OpenDocument Text Template)
  •     potx (Template)
  •     potm (Macro-enabled template)
  •     ppsm (Macro-enabled slideshow)
  •     pptm (Macro-enabled presentation)

The following code sample shows how to extract metadata from documents using the Extractor class.

 

<script src="https://gist.github.com/GroupDocsGists/ea14da20df6908943201c73d872c85c9.js?file=MetadataExtraction-extractMetadataUsingExtractorClass.java"></script>
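
The idea of one entry point that hides the concrete extractor classes can be illustrated with a minimal sketch. It does not use the GroupDocs.Parser API: the class MetadataFacadeSketch, its extractMetadata method, and the returned placeholder values are all hypothetical and exist only to show dispatch by file extension for the template formats listed above.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: none of these types belong to GroupDocs.Parser.
final class MetadataFacadeSketch {

    // Maps a lower-case file extension to a format-specific extraction function.
    private final Map<String, Function<String, Map<String, String>>> extractors = new LinkedHashMap<>();

    MetadataFacadeSketch() {
        // Stand-ins for concrete extractors; a real one would read the file.
        Function<String, Map<String, String>> text = name -> describe("text", name);
        Function<String, Map<String, String>> slides = name -> describe("presentation", name);
        for (String ext : new String[] {"dotx", "dotm", "ott"}) {
            extractors.put(ext, text);
        }
        for (String ext : new String[] {"potx", "potm", "ppsm", "pptm"}) {
            extractors.put(ext, slides);
        }
    }

    // Single entry point: the caller never names a concrete extractor.
    Map<String, String> extractMetadata(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        Function<String, Map<String, String>> extractor = extractors.get(ext);
        if (extractor == null) {
            throw new IllegalArgumentException("Unsupported format: " + fileName);
        }
        return extractor.apply(fileName);
    }

    // Builds placeholder values; these are not real document metadata.
    private static Map<String, String> describe(String family, String fileName) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("family", family);
        metadata.put("source", fileName);
        return metadata;
    }

    public static void main(String[] args) {
        MetadataFacadeSketch facade = new MetadataFacadeSketch();
        System.out.println(facade.extractMetadata("report.DOTX"));
        System.out.println(facade.extractMetadata("deck.pptm"));
    }
}
```

The extension lookup is case-insensitive, and an unknown extension fails fast with an exception instead of returning an empty result. How the real Extractor class selects a format is not described here, so treat the lookup table as an assumption.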

ComplexMetadataExtractor Class

An EPUB document can contain one or more packages, and each package has its own metadata collection. To work with such documents, use the ComplexMetadataExtractor class. Its extractComplexMetadata methods return an enumerator over all metadata collections. EpubMetadataExtractor inherits from ComplexMetadataExtractor.

The following code snippet shows how to extract complex metadata.

<script src="https://gist.github.com/GroupDocsGists/ea14da20df6908943201c73d872c85c9.js?file=MetadataExtraction-extractMetadataUsingComplexMetadataExtractor.java"></script>
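
To make the "one document, several metadata collections" shape concrete, here is a small self-contained sketch. It models the data structure only and does not call GroupDocs.Parser: PackageMetadata, MultiPackageDocument, and flatten are hypothetical names, and the sample keys and values are placeholders.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of complex metadata; not the GroupDocs.Parser API.
final class ComplexMetadataSketch {

    // Stand-in for the metadata collection of a single package.
    static final class PackageMetadata {
        final String packageName;
        final Map<String, String> values = new LinkedHashMap<>();

        PackageMetadata(String packageName) {
            this.packageName = packageName;
        }
    }

    // Stand-in for a document that exposes one collection per package.
    static final class MultiPackageDocument implements Iterable<PackageMetadata> {
        private final List<PackageMetadata> packages = new ArrayList<>();

        PackageMetadata addPackage(String name) {
            PackageMetadata metadata = new PackageMetadata(name);
            packages.add(metadata);
            return metadata;
        }

        @Override
        public Iterator<PackageMetadata> iterator() {
            return packages.iterator();
        }
    }

    // Walks every collection, then every key/value pair inside it.
    static List<String> flatten(Iterable<PackageMetadata> collections) {
        List<String> lines = new ArrayList<>();
        for (PackageMetadata metadata : collections) {
            for (Map.Entry<String, String> entry : metadata.values.entrySet()) {
                lines.add(metadata.packageName + ": " + entry.getKey() + " = " + entry.getValue());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        MultiPackageDocument document = new MultiPackageDocument();
        document.addPackage("package-1").values.put("title", "Sample Title");
        document.addPackage("package-2").values.put("language", "en");
        for (String line : flatten(document)) {
            System.out.println(line);
        }
    }
}
```

The nested loop is the point of the sketch: the outer loop visits each package's collection and the inner loop reads its entries, so values from different packages stay distinguishable even when they share a key.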