
The code in the examples below uses some methods defined in Common Utilities.

Working with Metadata Extractors

GroupDocs.Parser also provides a simplified way to extract the metadata attached to documents in the supported file formats. The following classes are used for extracting metadata:

Class | Description
CellsMetadataExtractor | Provides the functionality to extract metadata from spreadsheets.
SlidesMetadataExtractor | Provides the functionality to extract metadata from presentations.
WordsMetadataExtractor | Provides the functionality to extract metadata from text documents.
PdfMetadataExtractor | Provides the functionality to extract metadata from PDF documents.
EmailMetadataExtractor | Provides the functionality to extract metadata from email messages.
EpubMetadataExtractor | Provides the functionality to extract metadata from EPUB documents.
FictionBookMetadataExtractor | Provides the functionality to extract metadata from FictionBook (fb2) documents.

All of these classes inherit from the abstract MetadataExtractor class, which provides the common interface for extracting metadata from documents.

The following methods are used to extract metadata from documents:

Method | Description
extractMetadata(Stream stream) | Extracts metadata from the stream
extractMetadata(string fileName) | Extracts metadata from the file
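The split between a stream overload and a file-name overload can be illustrated with a minimal sketch. It assumes hypothetical names (ExtractorSketch, SketchMetadataExtractor, KeyValueMetadataExtractor) and a toy "Key: Value" text format, and returns a plain Map instead of MetadataCollection; it is not the GroupDocs.Parser API. Java's InputStream and String stand in for the Stream and string parameters listed above. Only the stream overload is abstract; the file-name overload opens the file and delegates to it.

```java
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class ExtractorSketch {

    // Hypothetical stand-in for the abstract MetadataExtractor class (not the library type).
    abstract static class SketchMetadataExtractor {
        // Each concrete extractor implements only the stream overload.
        public abstract Map<String, String> extractMetadata(InputStream stream) throws IOException;

        // The file-name overload opens the file and delegates to the stream overload.
        public Map<String, String> extractMetadata(String fileName) throws IOException {
            try (InputStream stream = new FileInputStream(fileName)) {
                return extractMetadata(stream);
            }
        }
    }

    // Hypothetical concrete extractor for a toy format made of "Key: Value" lines.
    static class KeyValueMetadataExtractor extends SketchMetadataExtractor {
        @Override
        public Map<String, String> extractMetadata(InputStream stream) throws IOException {
            Map<String, String> metadata = new LinkedHashMap<>();
            String text = new String(stream.readAllBytes(), StandardCharsets.UTF_8);
            for (String line : text.split("\\R")) {
                int colon = line.indexOf(':');
                if (colon > 0) {
                    metadata.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
                }
            }
            return metadata;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] doc = "Author: Jane Doe\nTitle: Report\n".getBytes(StandardCharsets.UTF_8);
        Map<String, String> metadata =
            new KeyValueMetadataExtractor().extractMetadata(new ByteArrayInputStream(doc));
        metadata.forEach((key, value) -> System.out.println(key + ": " + value));
    }
}
```

Keeping the file overload in the base class means every concrete extractor gets file support for free and only has to know how to read a stream.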

Both methods return an instance of the MetadataCollection class, which provides a dictionary-style collection of metadata. The MetadataNames class contains all supported metadata keys. It's recommended to use the MetadataNames constants instead of string literals to retrieve values from a MetadataCollection.
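To see why constants are preferable to string literals, consider this minimal sketch. The Keys class is a hypothetical stand-in for MetadataNames, and a TreeMap plays the role of the dictionary-style MetadataCollection; the key values "Author" and "Title" are assumptions for illustration, not the library's actual constants.

```java
import java.util.Map;
import java.util.TreeMap;

public class MetadataKeysSketch {

    // Hypothetical stand-in for MetadataNames: one constant per supported key.
    static final class Keys {
        static final String AUTHOR = "Author";
        static final String TITLE = "Title";

        private Keys() {
        }
    }

    public static void main(String[] args) {
        // Dictionary-style collection, standing in for MetadataCollection.
        Map<String, String> metadata = new TreeMap<>();
        metadata.put(Keys.AUTHOR, "Jane Doe");
        metadata.put(Keys.TITLE, "Quarterly Report");

        // Lookup through a constant: the key is checked by the compiler.
        System.out.println(metadata.get(Keys.AUTHOR)); // prints Jane Doe

        // Lookup through a misspelled literal: compiles, but silently returns null.
        System.out.println(metadata.get("Autor")); // prints null
    }
}
```

A misspelled constant such as Keys.AUTHR is rejected at compile time, while the misspelled literal "Autor" only shows up at run time as a missing value.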

Extracting Metadata

The following code sample shows how to extract metadata from a text document. The same technique can be used to extract metadata from the other document formats by using the corresponding metadata extractor class.

Extracting Metadata using Extractor Class

The Extractor class can be used to extract metadata from any supported document format. It allows you to extract metadata without using concrete metadata extractor classes such as WordsMetadataExtractor. Since version 18.12, GroupDocs.Parser also allows you to extract metadata from the following text and presentation formats (templates and macro-enabled files) using the Extractor class:

  •     dotx (Template)
  •     dotm (Macro-enabled template)
  •     ott (OpenDocument Text Template)
  •     potx (Template)
  •     potm (Macro-enabled template)
  •     ppsm (Macro-enabled slideshow)
  •     pptm (Macro-enabled presentation)
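The formats above fall into two families, text and presentation, which a format-agnostic entry point can tell apart from the file extension. The following minimal sketch illustrates that routing idea only; ExtractorDispatchSketch, Family and familyFor are hypothetical names, and the sketch makes no claim about how the real Extractor class works internally.

```java
import java.util.Locale;
import java.util.Map;

public class ExtractorDispatchSketch {

    // Hypothetical document families; not GroupDocs.Parser types.
    enum Family { TEXT, PRESENTATION }

    // The template and macro-enabled formats listed above, grouped by family.
    static final Map<String, Family> BY_EXTENSION = Map.of(
        "dotx", Family.TEXT,
        "dotm", Family.TEXT,
        "ott", Family.TEXT,
        "potx", Family.PRESENTATION,
        "potm", Family.PRESENTATION,
        "ppsm", Family.PRESENTATION,
        "pptm", Family.PRESENTATION);

    // Picks the family from the file extension, so callers never name a concrete extractor.
    static Family familyFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            throw new IllegalArgumentException("No extension: " + fileName);
        }
        Family family = BY_EXTENSION.get(fileName.substring(dot + 1).toLowerCase(Locale.ROOT));
        if (family == null) {
            throw new IllegalArgumentException("Unsupported format: " + fileName);
        }
        return family;
    }

    public static void main(String[] args) {
        System.out.println(familyFor("Letter.DOTX")); // TEXT
        System.out.println(familyFor("deck.ppsm")); // PRESENTATION
    }
}
```

The extension check is case-insensitive, and unknown formats fail loudly rather than falling back to a default extractor.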

The following code sample shows how to extract metadata from documents using the Extractor class.

ComplexMetadataExtractor Class

An EPUB document can contain one or more packages, and each package has its own metadata collection. The ComplexMetadataExtractor class is used for working with such documents. Its extractComplexMetadata methods extract the complex metadata and return an enumerator over all of the document's metadata collections. EpubMetadataExtractor inherits from the ComplexMetadataExtractor class.
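Consuming complex metadata means walking one collection per package instead of reading a single collection. A minimal sketch of that loop follows; SketchComplexExtractor and printAll are hypothetical names, the enumerator is modelled with java.util.Iterator, each collection is a plain Map, and a fake extractor with two packages stands in for a real EPUB file.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class ComplexMetadataSketch {

    // Hypothetical stand-in for ComplexMetadataExtractor: one metadata map per package.
    interface SketchComplexExtractor {
        Iterator<Map<String, String>> extractComplexMetadata(String fileName);
    }

    // Prints every package's collection and returns the total number of entries seen.
    static int printAll(SketchComplexExtractor extractor, String fileName) {
        int entries = 0;
        int index = 0;
        Iterator<Map<String, String>> packages = extractor.extractComplexMetadata(fileName);
        while (packages.hasNext()) {
            Map<String, String> metadata = packages.next();
            System.out.println("Package " + index++ + ":");
            for (Map.Entry<String, String> entry : metadata.entrySet()) {
                System.out.println("  " + entry.getKey() + " = " + entry.getValue());
                entries++;
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        // Fake extractor returning two packages, standing in for a multi-package EPUB.
        SketchComplexExtractor fake = fileName -> List.of(
            Map.of("title", "Volume 1"),
            Map.of("title", "Volume 2", "creator", "Jane Doe")).iterator();
        System.out.println(printAll(fake, "book.epub") + " entries");
    }
}
```

Because each package keeps its own collection, the same key (here "title") can legitimately appear once per package.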

The following code snippet shows how to extract complex metadata.
