Extract metadata from Microsoft Office Word documents

What is Word Document Metadata?

Word document metadata is hidden information stored inside your .doc and .docx files. It’s like a “digital fingerprint” that contains details about the document such as:

  • Who wrote the document
  • When it was created and last modified
  • How much time was spent writing it
  • The document’s title, subject, and keywords
  • Which version of Word was used

You can see some of this information by right-clicking a Word file and selecting “Properties” - but with GroupDocs.Parser, you can extract this data programmatically using C# code.

Why Extract Word Metadata?

Here are common scenarios where extracting Word metadata is useful:

Document Management:

  • Automatically organize documents by author or creation date
  • Build searchable document libraries
  • Track document versions and changes

Business Applications:

  • Find all documents created by a specific employee
  • Monitor document editing activity
  • Ensure proper document attribution

Compliance & Auditing:

  • Track document history for legal requirements
  • Monitor sensitive document access
  • Maintain document lifecycle records

What Information Can You Extract?

The GetMetadata method can extract these details from Word documents:

Information            What it tells you
title                  The document title
author                 Who created the document
subject                What the document is about
keywords               Tags or keywords for the document
comments               Any comments added to the document
company                The company name associated with the file
manager                The manager associated with the document
created-time           When the document was first created
last-saved-time        When someone last saved changes
last-printed-time      When the document was last printed
total-editing-time     How many minutes were spent editing
revision-number        How many times the file has been revised
template               Which Word template was used
application            Which application created the document
application-version    The version of Word that created it

Plus additional technical details like hyperlink-base, content-status, category, and last-author.

How to Extract Word Metadata

It’s simple! Just follow these 3 easy steps:

Step 1: Create a Parser

Point the Parser to your Word document:

using(Parser parser = new Parser("your-document.docx"))
{
    // Your code goes here
}

Step 2: Get the Metadata

Extract all the metadata information:

IEnumerable<MetadataItem> metadata = parser.GetMetadata();

Step 3: Read the Results

Loop through and display what you found:

foreach(MetadataItem item in metadata)
{
    Console.WriteLine($"{item.Name}: {item.Value}");
}

Complete Working Example

Here’s the full code that extracts and displays all metadata from a Word document:

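// Namespaces used by the example
using System;
using System.Collections.Generic;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;

// Path to the Word document (replace with your own file)
string filePath = "your-document.docx";

// Create an instance of Parser class
using(Parser parser = new Parser(filePath))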
{
    // Extract metadata from the document
    IEnumerable<MetadataItem> metadata = parser.GetMetadata();
    
    // Iterate over metadata items
    foreach(MetadataItem item in metadata)
    {
        // Print the item name and value
        Console.WriteLine(string.Format("{0}: {1}", item.Name, item.Value));
    }
}

Note: If a Word document has no metadata properties set, or the file format isn’t supported, you might get no results. This is normal: not all documents carry complete metadata.
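
The "no results" case can be handled explicitly rather than silently printing nothing. The sketch below assumes that GetMetadata returns null when metadata extraction isn't supported for the file (worth confirming against the GroupDocs.Parser API reference for your version) and treats an empty sequence as "no properties set". The file name is a placeholder.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;

class MetadataCheck
{
    static void Main()
    {
        // Placeholder path; replace with a real document.
        using (Parser parser = new Parser("your-document.docx"))
        {
            IEnumerable<MetadataItem> metadata = parser.GetMetadata();

            // Assumption: null signals that metadata extraction isn't supported.
            if (metadata == null)
            {
                Console.WriteLine("Metadata extraction isn't supported for this file.");
                return;
            }

            // Materialize once so the sequence isn't enumerated twice.
            List<MetadataItem> items = metadata.ToList();
            if (items.Count == 0)
            {
                Console.WriteLine("The document has no metadata properties set.");
                return;
            }

            foreach (MetadataItem item in items)
            {
                Console.WriteLine($"{item.Name}: {item.Value}");
            }
        }
    }
}
```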

Supported Word File Formats

This works with the following Microsoft Word formats:

  • .doc - Older Word documents (Word 97-2003)
  • .docx - Newer Word documents (Word 2007 and later)
  • .dot - Older Word templates (Word 97-2003)
  • .dotx - Newer Word templates (Word 2007 and later)
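
To pick up every format in this list when scanning a folder, rather than a single extension, the files can be filtered before they are handed to the Parser. This is a minimal sketch using only the .NET base library; the WordFiles class and its IsWordFile and Find members are hypothetical names introduced here, not part of GroupDocs.Parser.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class WordFiles
{
    // The four extensions listed above; comparison ignores case.
    private static readonly HashSet<string> Extensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            ".doc", ".docx", ".dot", ".dotx"
        };

    // Hypothetical helper: true when the path ends with a listed extension.
    public static bool IsWordFile(string path) =>
        Extensions.Contains(Path.GetExtension(path));

    // Returns the Word files found directly inside the folder (no subfolders).
    public static IEnumerable<string> Find(string folder) =>
        Directory.EnumerateFiles(folder).Where(IsWordFile);
}
```

Each path returned by WordFiles.Find can then be passed to new Parser(path) in a foreach loop.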

Practical Examples

Example 1: Find Documents by Author

Let’s say you want to find all Word documents in a folder that were created by a specific person:

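using System;
using System.IO;
using System.Linq;
using GroupDocs.Parser;

// Collect the .docx files in the folder (other extensions need their own pattern)
string[] documents = Directory.GetFiles(@"C:\MyDocuments", "*.docx");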

foreach (string document in documents)
{
    using (Parser parser = new Parser(document))
    {
        var metadata = parser.GetMetadata();
        var author = metadata.FirstOrDefault(m => m.Name == "author");
        
        if (author?.Value?.ToString() == "John Smith")
        {
            Console.WriteLine($"Found document by John Smith: {Path.GetFileName(document)}");
        }
    }
}

Example 2: Document Summary Report

Create a summary report showing key information about your Word documents:

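using System;
using System.Collections.Generic;
using System.Linq;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;

using (Parser parser = new Parser("report.docx"))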
{
    var metadata = parser.GetMetadata();
    
    Console.WriteLine("=== Document Summary ===");
    Console.WriteLine($"Title: {GetValue(metadata, "title")}");
    Console.WriteLine($"Author: {GetValue(metadata, "author")}");
    Console.WriteLine($"Created: {GetValue(metadata, "created-time")}");
    Console.WriteLine($"Last Modified: {GetValue(metadata, "last-saved-time")}");
    Console.WriteLine($"Total Editing Time: {GetValue(metadata, "total-editing-time")} minutes");
}

string GetValue(IEnumerable<MetadataItem> metadata, string name)
{
    return metadata.FirstOrDefault(m => m.Name == name)?.Value?.ToString() ?? "Not available";
}

Common Use Cases

For Businesses:

  • Track employee document creation and editing activity
  • Organize documents by department or project
  • Monitor document compliance and approval workflows

For Developers:

  • Build document management systems
  • Create automated file organization tools
  • Add document search and filtering capabilities

For Content Managers:

  • Maintain document libraries and archives
  • Track document versions and revision history
  • Ensure proper document metadata standards

Troubleshooting Common Issues

Problem: Not getting any metadata back?

  • Solution: The document might not have metadata properties set, or the file format might not be supported

Problem: Getting a “file not found” error?

  • Solution: Check that the file path is correct and the file exists

Problem: Can’t read password-protected documents?

  • Solution: Remove password protection first, as encrypted documents require special handling
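
If removing the password isn't an option, the password can instead be supplied when the document is opened. The sketch below assumes the LoadOptions(string password) constructor from GroupDocs.Parser.Options and the InvalidPasswordException type from GroupDocs.Parser.Exceptions; check both against the API reference for your library version. The file name and password are placeholders.

```csharp
using System;
using System.Collections.Generic;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;
using GroupDocs.Parser.Exceptions;
using GroupDocs.Parser.Options;

class ProtectedDocument
{
    static void Main()
    {
        string password = "your-password"; // placeholder

        try
        {
            // Assumption: LoadOptions carries the password used to open the file.
            using (Parser parser = new Parser("protected.docx", new LoadOptions(password)))
            {
                IEnumerable<MetadataItem> metadata = parser.GetMetadata();

                // Assumption: null means metadata extraction isn't supported.
                if (metadata == null)
                {
                    Console.WriteLine("Metadata extraction isn't supported for this file.");
                    return;
                }

                foreach (MetadataItem item in metadata)
                {
                    Console.WriteLine($"{item.Name}: {item.Value}");
                }
            }
        }
        catch (InvalidPasswordException)
        {
            // Assumed to be thrown when the password is missing or wrong.
            Console.WriteLine("The password is missing or incorrect.");
        }
    }
}
```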

Problem: Some metadata fields are empty?

  • Solution: This is normal - not all Word documents have all metadata fields populated

Best Practices

  1. Always use using statements to properly dispose of Parser objects
  2. Check for null values when accessing specific metadata fields
  3. Handle exceptions for corrupted or inaccessible files
  4. Test with different Word versions to ensure compatibility
  5. Validate file formats before attempting to extract metadata
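
Several of these practices can be combined in one small wrapper. The following is a sketch, not the library's recommended pattern: SafeMetadataReader and its Read method are hypothetical names, the null check assumes GetMetadata returns null for unsupported files, and the broad catch is a placeholder for whichever exception types you expect in your environment.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;

static class SafeMetadataReader
{
    // Hypothetical helper: returns name/value pairs, or an empty dictionary on failure.
    public static Dictionary<string, string> Read(string path)
    {
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

        // Fail early with a clear message when the file is missing.
        if (!File.Exists(path))
        {
            Console.WriteLine($"File not found: {path}");
            return result;
        }

        try
        {
            // The using statement disposes the Parser even if an exception is thrown.
            using (Parser parser = new Parser(path))
            {
                IEnumerable<MetadataItem> metadata = parser.GetMetadata();

                // Assumption: null means metadata extraction isn't supported.
                if (metadata == null)
                {
                    return result;
                }

                foreach (MetadataItem item in metadata)
                {
                    if (item.Name == null)
                    {
                        continue; // skip unnamed entries
                    }

                    // Indexer assignment tolerates repeated names; null values become "".
                    result[item.Name] = item.Value?.ToString() ?? string.Empty;
                }
            }
        }
        catch (Exception ex)
        {
            // Broad catch for the sketch; narrow it to the exceptions you expect.
            Console.WriteLine($"Could not read {path}: {ex.Message}");
        }

        return result;
    }
}
```

A caller can then use result.TryGetValue("author", out string author) to read a field without risking a null reference.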

More resources

GitHub examples

You can run the code above and see the feature in action in our GitHub examples.

Free online document parser App

Along with the full-featured .NET library, we provide simple but powerful free apps.

You are welcome to parse documents and extract data from PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX, emails, and more with our Free Online Document Parser App.
