To extract hyperlinks from a Microsoft Office Word document, use the GetStructure method. This method returns an XML representation of the document, in which hyperlinks are represented by the “hyperlink” tag and the “link” attribute contains the hyperlink’s URL. For more details, see Extract text structure. A hyperlink element can also contain text:

<hyperlink link="www.google.com">google.com</hyperlink>

Here are the steps to extract hyperlinks from a Microsoft Office Word document:

  • Instantiate the Parser object for the initial document;
  • Call the GetStructure method and obtain an XmlReader object;
  • Iterate through the XML document and read the “link” attribute of each “hyperlink” element.

The following example demonstrates how to extract hyperlinks from a Microsoft Office Word document:

using System;
using System.Xml;
using GroupDocs.Parser;

// Create an instance of Parser class
using (Parser parser = new Parser(filePath))
    // Get the reader object for the document XML representation
    using (XmlReader reader = parser.GetStructure())
        // Iterate over the document
        while (reader.Read())
            // If it is the start tag of the hyperlink
            if (reader.IsStartElement() && reader.Name == "hyperlink")
                // Print the link attribute
                Console.WriteLine(reader.GetAttribute("link"));
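The iteration step relies only on System.Xml, so it can be tried without GroupDocs.Parser or a Word file. The sketch below is an illustration, not part of the library's API: HyperlinkSketch and ExtractHyperlinks are hypothetical names, and the sample markup (including the “document” root element) is an assumed shape modeled on the hyperlink fragment shown earlier. Besides the “link” attribute, it also collects each hyperlink's text.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

public static class HyperlinkSketch
{
    // Hypothetical helper: walks any XmlReader and collects the "link" attribute
    // and the inner text of every <hyperlink> element it encounters.
    public static List<(string Url, string Text)> ExtractHyperlinks(XmlReader reader)
    {
        var links = new List<(string Url, string Text)>();
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == "hyperlink")
            {
                string url = reader.GetAttribute("link");
                var text = new StringBuilder();
                // Read only this element's subtree; disposing it leaves the outer
                // reader on the element's end tag, so the next Read() continues normally.
                using (XmlReader subtree = reader.ReadSubtree())
                {
                    while (subtree.Read())
                    {
                        if (subtree.NodeType == XmlNodeType.Text || subtree.NodeType == XmlNodeType.CDATA)
                            text.Append(subtree.Value);
                    }
                }
                links.Add((url, text.ToString()));
            }
        }
        return links;
    }

    public static void Main()
    {
        // Assumed sample markup; the root element name is a placeholder.
        string xml = "<document><hyperlink link=\"www.google.com\">google.com</hyperlink></document>";
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            foreach (var (url, text) in ExtractHyperlinks(reader))
                Console.WriteLine($"{url} -> {text}");
        }
    }
}
```

Running Main prints “www.google.com -> google.com”. Because the helper accepts an XmlReader rather than a string, the reader returned by parser.GetStructure() could be passed to it in the same way.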

