Extract metadata from Microsoft Office Word documents

To extract metadata from Microsoft Office Word documents, use the GetMetadata method. It can extract the following metadata:

  • title: The title of the document.
  • subject: The subject of the document.
  • keywords: The keywords of the document.
  • comments: The comments of the document.
  • content-status: The content status of the document.
  • category: The category of the document.
  • company: The company associated with the document.
  • manager: The manager associated with the document.
  • author: The name of the document’s author.
  • last-author: The name of the author who last modified the document.
  • hyperlink-base: The base string used to evaluate relative hyperlinks in the document.
  • application: The name of the application that created the document.
  • application-version: The version number of the application that created the document.
  • template: The informational name of the document template.
  • created-time: The time the document was created.
  • last-saved-time: The time the document was last saved.
  • last-printed-time: The time the document was last printed.
  • revision-number: The revision number of the document.
  • total-editing-time: The total editing time, in minutes.
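Assuming the item names in the returned collection match the names listed above, individual values can be read by name. The following minimal C# sketch (top-level statements, C# 9 or later) shows one way to do that. It builds a case-insensitive lookup and converts total-editing-time, documented in minutes, into a TimeSpan. Several parts are assumptions for illustration only. The (Name, Value) tuples are a hypothetical stand-in for MetadataItem objects, so the snippet runs without the library. BuildLookup and EditingTime are hypothetical helpers, and the sample values are invented. The snippet also assumes the editing time is stored as a plain number, which may not match the real value format.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical stand-in for the collection returned by GetMetadata:
// each tuple mirrors a MetadataItem's Name and Value (sample values are invented).
var metadata = new List<(string Name, string Value)>
{
    ("title", "Quarterly report"),
    ("author", "Jane Doe"),
    ("total-editing-time", "95"),
};

// Hypothetical helper: build a case-insensitive name -> value lookup.
// If a name occurs more than once, the first occurrence is kept.
Dictionary<string, string> BuildLookup(IEnumerable<(string Name, string Value)> items)
{
    var lookup = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
    foreach (var (name, value) in items)
        lookup.TryAdd(name, value);
    return lookup;
}

// Hypothetical helper: read "total-editing-time" (minutes) as a TimeSpan.
// Returns null when the item is missing or its value is not a number.
TimeSpan? EditingTime(IReadOnlyDictionary<string, string> lookup) =>
    lookup.TryGetValue("total-editing-time", out var raw)
    && double.TryParse(raw, NumberStyles.Float, CultureInfo.InvariantCulture, out var minutes)
        ? TimeSpan.FromMinutes(minutes)
        : (TimeSpan?)null;

var byName = BuildLookup(metadata);
Console.WriteLine(byName.TryGetValue("author", out var author) ? author : "(no author)"); // Jane Doe
Console.WriteLine(EditingTime(byName)); // 01:35:00
```

The lookup is case-insensitive so that small differences in how item names are spelled do not cause a miss. Returning null for a missing or non-numeric value leaves it to the caller to decide how to report it.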

Here are the steps to extract metadata from a Microsoft Office Word document:

  • Create an instance of the Parser class for the source document;
  • Call the GetMetadata method to obtain a collection of document metadata objects;
  • Iterate through the collection to get the metadata names and values.

The GetMetadata method returns null if metadata extraction isn’t supported for the document format. For example, metadata extraction isn’t supported for ZIP archives, so GetMetadata returns null for them. If a Microsoft Office Word document has no metadata, GetMetadata returns an empty collection.
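Because null and an empty collection mean different things, calling code can handle the two cases separately. The following minimal C# sketch (top-level statements, C# 9 or later) illustrates one way to do that. Describe is a hypothetical helper, not part of the library. The (Name, Value) tuples stand in for the MetadataItem objects that GetMetadata returns, so the snippet runs on its own.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: turns the result of a GetMetadata-style call into printable
// lines, distinguishing the three outcomes: null, empty collection, and items.
// Tuples stand in for MetadataItem (Name, Value).
List<string> Describe(IEnumerable<(string Name, string Value)>? metadata)
{
    var lines = new List<string>();

    // null: metadata extraction is not supported for this format (e.g. ZIP archives).
    if (metadata == null)
    {
        lines.Add("Metadata extraction isn't supported.");
        return lines;
    }

    foreach (var (name, value) in metadata)
        lines.Add($"{name}: {value}");

    // Empty collection: the format is supported, but the document has no metadata.
    if (lines.Count == 0)
        lines.Add("The document has no metadata.");

    return lines;
}

foreach (var line in Describe(new[] { ("author", "Jane Doe") }))
    Console.WriteLine(line);                                      // author: Jane Doe
Console.WriteLine(Describe(Array.Empty<(string, string)>())[0]); // The document has no metadata.
Console.WriteLine(Describe(null)[0]);                            // Metadata extraction isn't supported.
```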

The following example demonstrates how to extract metadata from a Microsoft Office Word document:

using System;
using System.Collections.Generic;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;

// Create an instance of the Parser class (filePath is the path to the Word document)
using (Parser parser = new Parser(filePath))
{
    // Extract metadata from the document
    IEnumerable<MetadataItem> metadata = parser.GetMetadata();
    // Iterate over the metadata items
    foreach (MetadataItem item in metadata)
        // Print the item name and value
        Console.WriteLine($"{item.Name}: {item.Value}");
}

