Extract hyperlinks from document

GroupDocs.Parser extracts hyperlinks from documents with the getHyperlinks method:

Iterable<PageHyperlinkArea> getHyperlinks();

This method returns a collection of PageHyperlinkArea objects:

Member          Description
getPage         The page that contains the text area.
getRectangle    The rectangular area on the page that contains the text area.
getText         The hyperlink text.
getUrl          The hyperlink URL.

Here are the steps to extract all hyperlinks from the whole document:

  • Instantiate a Parser object for the initial document;
  • Check whether the document supports hyperlink extraction;
  • Call the getHyperlinks method to obtain a collection of PageHyperlinkArea objects;
  • Iterate through the collection and read the text and URL of each hyperlink.

The following example shows how to extract all hyperlinks from the whole document:

// Create an instance of Parser class
try (Parser parser = new Parser(Constants.HyperlinksPdf)) {
    // Check if the document supports hyperlink extraction
    if (!parser.getFeatures().isHyperlinks()) {
        System.out.println("Document doesn't support hyperlink extraction.");
        return;
    }
    // Extract hyperlinks from the document
    Iterable<PageHyperlinkArea> hyperlinks = parser.getHyperlinks();
    // Iterate over hyperlinks
    for (PageHyperlinkArea h : hyperlinks) {
        // Print the hyperlink text
        System.out.println(h.getText());
        // Print the hyperlink URL
        System.out.println(h.getUrl());
        System.out.println();
    }
}
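
Extracted hyperlinks often contain repeated or non-web targets (for example, mailto links), so a small post-processing step can be useful. The following is a minimal sketch, not part of the GroupDocs.Parser API: the HyperlinkFilter class, its Link type, and the uniqueWebUrls method are hypothetical names introduced here for illustration. Link stands in for the getText and getUrl values of a PageHyperlinkArea, so the sketch runs without the library.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class HyperlinkFilter {

    // Hypothetical stand-in for the text and URL of a PageHyperlinkArea
    // (the values returned by getText and getUrl).
    static final class Link {
        final String text;
        final String url;

        Link(String text, String url) {
            this.text = text;
            this.url = url;
        }
    }

    // Returns the distinct http/https URLs in first-seen order.
    // Null, empty, and syntactically invalid URLs are skipped.
    static List<String> uniqueWebUrls(Iterable<Link> links) {
        Set<String> seen = new LinkedHashSet<>();
        for (Link link : links) {
            if (link.url == null) {
                continue;
            }
            String url = link.url.trim();
            if (url.isEmpty()) {
                continue;
            }
            try {
                String scheme = new URI(url).getScheme();
                if ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme)) {
                    seen.add(url);
                }
            } catch (URISyntaxException e) {
                // Not a valid URI: ignore this hyperlink.
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // Sample data; in real use, build Link objects from h.getText() and h.getUrl().
        List<Link> sample = Arrays.asList(
                new Link("Home", "https://example.com/"),
                new Link("Home again", "https://example.com/"),
                new Link("Mail", "mailto:someone@example.com"),
                new Link("Docs", "http://example.com/docs"));
        for (String url : uniqueWebUrls(sample)) {
            System.out.println(url);
        }
    }
}
```

To use the sketch with the example above, create one Link per PageHyperlinkArea inside the loop, collect them in a list, and pass that list to uniqueWebUrls.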

More resources

GitHub examples

You can run the code above and see the feature in action in our GitHub examples.

Free online image extractor App

Along with the full-featured .NET library, we provide simple but powerful free apps.

You are welcome to extract images from PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX, Emails and more with our free online GroupDocs Parser App.