Extract hyperlinks from Microsoft Office Word documents
To extract hyperlinks from a Microsoft Office Word document, use the GetStructure method. This method returns an XML representation of the document. Each hyperlink is represented by a “hyperlink” tag whose “link” attribute contains the hyperlink’s URL. For more details, see Extract text structure. A hyperlink element can also contain text.
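As an illustration of a hyperlink that contains text, here is a minimal sketch that reads both the URL and the enclosed text with the standard XmlReader. The XML fragment is hypothetical: only the “hyperlink” tag and the “link” attribute come from the description above, while the “document” and “p” elements, the sample URL, and the names HyperlinkTextSample and ReadLinks are assumptions made for the example. It parses an in-memory string, so it runs without the parser library.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

public static class HyperlinkTextSample
{
    // Returns one "url -> text" entry per "hyperlink" element found by the reader.
    public static List<string> ReadLinks(XmlReader reader)
    {
        var results = new List<string>();
        var text = new StringBuilder();
        string url = null;
        bool inside = false;

        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == "hyperlink")
            {
                // Start of a hyperlink: remember its URL and reset the text buffer
                url = reader.GetAttribute("link");
                text.Clear();
                if (reader.IsEmptyElement)
                    results.Add(url + " -> (no text)");
                else
                    inside = true;
            }
            else if (inside && reader.NodeType == XmlNodeType.Text)
            {
                // Text node inside the current hyperlink
                text.Append(reader.Value);
            }
            else if (inside && reader.NodeType == XmlNodeType.EndElement && reader.Name == "hyperlink")
            {
                // End of the hyperlink: store the URL together with its text
                results.Add(url + " -> " + text);
                inside = false;
            }
        }
        return results;
    }

    public static void Main()
    {
        // Hypothetical fragment; the real document structure may use other surrounding tags
        string xml = "<document><p>Visit <hyperlink link=\"https://example.com\">our site</hyperlink> today.</p></document>";

        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            foreach (string entry in ReadLinks(reader))
                Console.WriteLine(entry);
        }
    }
}
```

In real use, the reader returned by GetStructure would presumably be passed to ReadLinks in place of the in-memory one. Text outside the hyperlink element ("Visit" and "today." in the sample) is ignored.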
The GetStructure method returns null if text structure extraction isn’t supported for the document. For example, text structure extraction isn’t supported for TXT files, so GetStructure returns null for a TXT file. If a Microsoft Office Word document has no text, the GetStructure method returns an empty XmlReader object.
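The two cases above (a null reader and a reader with nothing to report) can be handled in one place. The following is a minimal sketch, not part of the library: TryCollectLinks and HyperlinkGuardSample are hypothetical names, and the in-memory XML merely stands in for a document that has no hyperlinks.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class HyperlinkGuardSample
{
    // Returns false when reader is null (structure extraction unsupported).
    // Otherwise fills links with every "link" attribute found, possibly none.
    public static bool TryCollectLinks(XmlReader reader, out List<string> links)
    {
        links = new List<string>();
        if (reader == null)
            return false;

        while (reader.Read())
        {
            if (reader.IsStartElement() && reader.Name == "hyperlink")
                links.Add(reader.GetAttribute("link"));
        }
        return true;
    }

    public static void Main()
    {
        List<string> links;

        // A null reader simulates an unsupported format such as TXT
        if (!TryCollectLinks(null, out links))
            Console.WriteLine("Text structure extraction isn't supported.");

        // Hypothetical stand-in for a document without hyperlinks
        using (XmlReader reader = XmlReader.Create(new StringReader("<document><p>No links here.</p></document>")))
        {
            if (TryCollectLinks(reader, out links))
                Console.WriteLine("Hyperlinks found: " + links.Count);
        }
    }
}
```

Returning a flag alongside the list keeps "unsupported format" distinct from "supported, but no hyperlinks", which a bare empty list would blur.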
Here are the steps to extract hyperlinks from Microsoft Office Word documents:
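Instantiate a Parser object for the initial document;
Call the GetStructure method to obtain an XmlReader object;
Check that the reader isn’t null (that is, text structure extraction is supported for the document);
Read the XML and get the “link” attribute of each “hyperlink” element.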
The following example demonstrates how to extract hyperlinks from a Microsoft Office Word document:
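// Create an instance of the Parser class
using (Parser parser = new Parser(filePath))
{
    // Get the reader object for the document's XML representation
    using (XmlReader reader = parser.GetStructure())
    {
        if (reader == null)
        {
            // GetStructure returns null if text structure extraction isn't supported
            Console.WriteLine("Text structure extraction isn't supported for this document.");
        }
        else
        {
            // Iterate over the document
            while (reader.Read())
            {
                // If it is the start tag of a hyperlink
                if (reader.IsStartElement() && reader.Name == "hyperlink")
                {
                    // Print the link attribute
                    Console.WriteLine(reader.GetAttribute("link"));
                }
            }
        }
    }
}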
More resources
GitHub examples
You may easily run the code above and see the feature in action in our GitHub examples:
Along with the full-featured .NET library, we provide simple but powerful free apps.
You are welcome to parse documents and extract data from PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX, emails, and more with our Free Online Document Parser App.