Extract metadata from Microsoft Office Word documents
What is Word Document Metadata?
Word document metadata is hidden information stored inside your .doc and .docx files. It’s like a “digital fingerprint” that contains details about the document such as:
Who wrote the document
When it was created and last modified
How much time was spent writing it
The document’s title, subject, and keywords
Which version of Word was used
You can see some of this information by right-clicking a Word file and selecting “Properties” - but with GroupDocs.Parser, you can extract this data programmatically using C# code.
Why Extract Word Metadata?
Here are common scenarios where extracting Word metadata is useful:
Document Management:
Automatically organize documents by author or creation date
Build searchable document libraries
Track document versions and changes
Business Applications:
Find all documents created by a specific employee
Monitor document editing activity
Ensure proper document attribution
Compliance & Auditing:
Track document history for legal requirements
Monitor sensitive document access
Maintain document lifecycle records
What Information Can You Extract?
The GetMetadata method can extract these details from Word documents:
Information          What it tells you
title                The document title
author               Who created the document
subject              What the document is about
keywords             Tags or keywords for the document
comments             Any comments added to the document
company              The company name associated with the file
manager              The manager associated with the document
created-time         When the document was first created
last-saved-time      When someone last saved changes
last-printed-time    When the document was last printed
total-editing-time   How many minutes were spent editing
revision-number      How many times the file has been revised
template             Which Word template was used
application          Which application created the document
application-version  The version of Word that created it
Plus additional technical details like hyperlink-base, content-status, category, and last-author.
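Because every item carries a name and a value, one convenient way to read individual fields is to copy the items into a dictionary keyed by name. The following is a minimal sketch, not part of the library: the class MetadataLookup and its methods are hypothetical helpers, the GroupDocs.Parser.Data namespace for MetadataItem is an assumption, and the sketch also assumes that a null result simply means nothing was extracted.

```csharp
using System;
using System.Collections.Generic;
using GroupDocs.Parser;
using GroupDocs.Parser.Data; // assumed namespace of MetadataItem

public static class MetadataLookup
{
    // Hypothetical helper: copies metadata items into a dictionary keyed by name.
    // Keys are compared case-insensitively; if a name repeats, the last value wins.
    public static Dictionary<string, string> ToDictionary(IEnumerable<MetadataItem> items)
    {
        var fields = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        if (items == null)
        {
            return fields; // assumption: null means no metadata was extracted
        }
        foreach (MetadataItem item in items)
        {
            if (item == null || item.Name == null)
            {
                continue; // skip entries that cannot be used as a key
            }
            fields[item.Name] = item.Value?.ToString();
        }
        return fields;
    }

    // Prints two of the fields listed in the table above.
    public static void PrintTitleAndAuthor(string filePath)
    {
        using (Parser parser = new Parser(filePath))
        {
            Dictionary<string, string> fields = ToDictionary(parser.GetMetadata());
            foreach (string name in new[] { "title", "author" })
            {
                string value;
                Console.WriteLine(fields.TryGetValue(name, out value)
                    ? $"{name}: {value}"
                    : $"{name}: (not present)");
            }
        }
    }
}
```

Calling MetadataLookup.PrintTitleAndAuthor("your-document.docx") would print one line per field, falling back to "(not present)" when the document does not define it.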
How to Extract Word Metadata
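Extraction takes three steps: create a Parser, call GetMetadata, and loop over the returned items.
Step 1: Create a Parser
Point the Parser to your Word document:
    using (Parser parser = new Parser("your-document.docx"))
    {
        // Your code goes here
    }
Step 2: Extract the Metadata
Call GetMetadata on the parser. It returns a collection of MetadataItem objects.
Step 3: Read Each Item
Loop over the collection and read the Name and Value of each item.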
Here’s the full code that extracts and displays all metadata from a Word document:
    // Create an instance of Parser class
    using (Parser parser = new Parser(filePath))
    {
        // Extract metadata from the document
        IEnumerable<MetadataItem> metadata = parser.GetMetadata();
        // Iterate over metadata items
        foreach (MetadataItem item in metadata)
        {
            // Print the item name and value
            Console.WriteLine(string.Format("{0}: {1}", item.Name, item.Value));
        }
    }
Warning: If a Word document has no metadata, or the file format isn’t supported, you might get no results. This is normal: not all documents carry complete metadata information.
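One way to guard against the "no results" case is to normalize the result before looping over it. The sketch below assumes that GetMetadata may return null when extraction is not supported, which this page does not state explicitly. SafeMetadata and its methods are hypothetical names, and the GroupDocs.Parser.Data namespace is also an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using GroupDocs.Parser;
using GroupDocs.Parser.Data; // assumed namespace of MetadataItem

public static class SafeMetadata
{
    // Hypothetical helper: turns a possibly-null result into a list,
    // so callers can always enumerate it and check Count.
    public static List<MetadataItem> OrEmpty(IEnumerable<MetadataItem> metadata)
    {
        return metadata == null ? new List<MetadataItem>() : metadata.ToList();
    }

    // Prints every metadata item, or a short message when there is none.
    public static void Print(string filePath)
    {
        using (Parser parser = new Parser(filePath))
        {
            // ToList() inside OrEmpty reads all items while the parser is still open.
            List<MetadataItem> items = OrEmpty(parser.GetMetadata());
            if (items.Count == 0)
            {
                Console.WriteLine("No metadata found in " + filePath);
                return;
            }
            foreach (MetadataItem item in items)
            {
                Console.WriteLine($"{item.Name}: {item.Value}");
            }
        }
    }
}
```

Copying the items into a list before the using block ends is a deliberate choice here: it avoids relying on whether the returned sequence can still be enumerated after the parser is disposed.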
Supported Word File Formats
This works with all common Microsoft Word formats:
.doc - Older Word documents (Word 97-2003)
.docx - Newer Word documents (Word 2007 and later)
.dot - Older Word templates (Word 97-2003)
.dotx - Newer Word templates (Word 2007 and later)
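When scanning a folder, it can help to keep only files with one of these four extensions before opening them. The sketch below uses only the .NET base class library. WordFileFilter and its method names are hypothetical, and the check looks at the file extension only, not at the file contents.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class WordFileFilter
{
    // The four extensions listed above, compared case-insensitively.
    private static readonly HashSet<string> WordExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { ".doc", ".docx", ".dot", ".dotx" };

    // Returns true when the path ends with one of the listed Word extensions.
    public static bool IsWordFile(string path)
    {
        string extension = Path.GetExtension(path) ?? string.Empty;
        return WordExtensions.Contains(extension);
    }

    // Lazily enumerates the Word files directly inside a folder (no subfolders).
    public static IEnumerable<string> FindWordFiles(string folder)
    {
        return Directory.EnumerateFiles(folder).Where(IsWordFile);
    }
}
```

For example, WordFileFilter.FindWordFiles(@"C:\MyDocuments") could replace a "*.docx" search pattern when older documents and templates should be included as well.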
Practical Examples
Example 1: Find Documents by Author
Let’s say you want to find all Word documents in a folder that were created by a specific person:
    string[] documents = Directory.GetFiles(@"C:\MyDocuments", "*.docx");
    foreach (string document in documents)
    {
        using (Parser parser = new Parser(document))
        {
            var metadata = parser.GetMetadata();
            var author = metadata.FirstOrDefault(m => m.Name == "author");
            if (author?.Value?.ToString() == "John Smith")
            {
                Console.WriteLine($"Found document by John Smith: {Path.GetFileName(document)}");
            }
        }
    }
Example 2: Document Summary Report
Create a summary report showing key information about your Word documents:
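The original code for this example is not included here, so the following is only a sketch of what such a report could look like. It reuses the calls shown earlier on this page (Parser, GetMetadata, Name, Value) and the field names from the table above. SummaryReport, ValueOrDefault, and PrintReport are hypothetical names, the folder path is a placeholder, and the GroupDocs.Parser.Data namespace is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using GroupDocs.Parser;
using GroupDocs.Parser.Data; // assumed namespace of MetadataItem

public static class SummaryReport
{
    // Field names taken from the table earlier on this page.
    private static readonly string[] ReportFields =
        { "title", "author", "created-time", "last-saved-time" };

    // Hypothetical helper: returns the value of the named item, or "(not set)"
    // when the item is missing, empty, or no metadata was extracted at all.
    public static string ValueOrDefault(IEnumerable<MetadataItem> metadata, string name)
    {
        MetadataItem item = metadata == null
            ? null
            : metadata.FirstOrDefault(m => m.Name == name);
        string value = item?.Value?.ToString();
        return string.IsNullOrEmpty(value) ? "(not set)" : value;
    }

    // Prints a short block of key fields for every .docx file in the folder.
    public static void PrintReport(string folder)
    {
        foreach (string document in Directory.GetFiles(folder, "*.docx"))
        {
            using (Parser parser = new Parser(document))
            {
                // Read the items once so each field lookup does not re-run extraction.
                List<MetadataItem> metadata = parser.GetMetadata()?.ToList();
                Console.WriteLine(Path.GetFileName(document));
                foreach (string field in ReportFields)
                {
                    Console.WriteLine($"  {field}: {ValueOrDefault(metadata, field)}");
                }
            }
        }
    }
}
```

Calling SummaryReport.PrintReport(@"C:\MyDocuments") would print the file name followed by one indented line per field for each document in the folder.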
Along with the full-featured .NET library, we provide simple but powerful free apps.
You are welcome to parse documents and extract data from PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX, emails, and more with our Free Online Document Parser App.