Extract highlights

GroupDocs.Parser can extract a highlight from a document. A highlight is a fragment of text, typically used in search functionality to show the context around the found text. Highlights are extracted with the getHighlight(int, boolean, HighlightOptions) method:

HighlightItem getHighlight(int position, boolean isDirect, HighlightOptions options);

The position parameter defines the start position from which the highlight is extracted. The isDirect parameter sets the extraction direction: true if the highlight is extracted to the right of the position; false if it is extracted to the left. The options parameter (a HighlightOptions object) defines where the highlight ends.
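
To make the position and isDirect parameters concrete, here is a minimal sketch in plain Java that mimics the two extraction directions on an ordinary string. It does not call GroupDocs.Parser: the HighlightDirectionSketch class and its highlight helper are hypothetical names introduced for illustration, and the library's exact handling of the boundary character at the position may differ.

```java
// Plain-Java illustration only; this is NOT GroupDocs.Parser code.
final class HighlightDirectionSketch {

    // Hypothetical helper: returns up to maxLength characters to the right of
    // position (isDirect == true) or to the left of it (isDirect == false).
    static String highlight(String text, int position, boolean isDirect, int maxLength) {
        if (position < 0 || position > text.length()) {
            throw new IllegalArgumentException("position is outside the text");
        }
        if (isDirect) {
            int end = Math.min(text.length(), position + maxLength);
            return text.substring(position, end);
        }
        int start = Math.max(0, position - maxLength);
        return text.substring(start, position);
    }

    public static void main(String[] args) {
        String text = "The quick brown fox jumps";
        int position = text.indexOf("brown"); // 10, e.g. a search hit

        System.out.println(highlight(text, position, true, 9));  // prints "brown fox"
        System.out.println(highlight(text, position, false, 6)); // prints "quick "
    }
}
```

In a search scenario, the position would typically be where the search term was found, and the two directions give the text after and before that point.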

The HighlightOptions class has the following constructors:

// Highlight is limited to maxLength text length.
HighlightOptions(int maxLength);
// Highlight is limited to the start (or the end) of a text line (or maxLength text length - if set).
HighlightOptions(Integer maxLength, boolean isLineLimited);
// Highlight is limited to word count (or maxLength text length - if set).
HighlightOptions(Integer maxLength, int wordCount);
// General constructor
HighlightOptions(Integer maxLength, Integer wordCount, boolean isLineLimited);
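
The following sketch models, in plain Java, one plausible way the three limits of the general constructor could combine for a rightward highlight: stop at the line end, then at the word count, then at the maximum length. It is an assumption made for illustration, not the library's implementation; HighlightLimitsSketch, highlightRight, and firstWords are hypothetical names, and null stands for "not set", mirroring the Integer parameters above.

```java
// Plain-Java illustration only; this is NOT GroupDocs.Parser code.
final class HighlightLimitsSketch {

    // Hypothetical model of a rightward highlight.
    // A null maxLength or wordCount means the limit is not set.
    static String highlightRight(String text, int position,
                                 Integer maxLength, Integer wordCount, boolean isLineLimited) {
        String rest = text.substring(position);

        // Stop at the end of the current line if requested.
        if (isLineLimited) {
            int lineEnd = rest.indexOf('\n');
            if (lineEnd >= 0) {
                rest = rest.substring(0, lineEnd);
            }
        }

        // Keep at most wordCount whitespace-separated words.
        if (wordCount != null) {
            rest = firstWords(rest, wordCount);
        }

        // Finally, cap the character length.
        if (maxLength != null && rest.length() > maxLength) {
            rest = rest.substring(0, maxLength);
        }
        return rest;
    }

    // Returns the prefix of s that ends right after its n-th word
    // (or all of s if it has fewer than n words).
    static String firstWords(String s, int n) {
        int i = 0;
        int words = 0;
        while (i < s.length() && words < n) {
            while (i < s.length() && Character.isWhitespace(s.charAt(i))) {
                i++; // skip the gap before the next word
            }
            if (i == s.length()) {
                break;
            }
            while (i < s.length() && !Character.isWhitespace(s.charAt(i))) {
                i++; // consume the word
            }
            words++;
        }
        return s.substring(0, i);
    }

    public static void main(String[] args) {
        String text = "one two three four\nfive six";
        System.out.println(highlightRight(text, 4, null, 2, false));   // prints "two three"
        System.out.println(highlightRight(text, 4, null, null, true)); // prints "two three four"
        System.out.println(highlightRight(text, 4, 5, 2, false));      // prints "two t"
    }
}
```

In this model maxLength acts as a final cap on whatever the other limits produce, which matches the "or maxLength text length - if set" wording in the constructor comments.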

The HighlightItem class has the following members:

Member	Description
getPosition	The position in the document text.
getText	The highlight text.

Here are the steps to extract a highlight from a document:

Instantiate a Parser object for the initial document;
Instantiate a HighlightOptions object with the extraction parameters;
Call the getHighlight(int, boolean, HighlightOptions) method and obtain the HighlightItem object;
Check that the HighlightItem isn't null (null means highlight extraction isn't supported for the document);
Call the getPosition and getText methods to read the highlight's position and text.

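The following example shows how to extract a highlight to the right of position 2. The HighlightOptions(3) constructor limits the highlight to a text length of 3; to limit it by word count instead, use a constructor that takes wordCount: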

// Create an instance of Parser class
try (Parser parser = new Parser(Constants.SamplePdf)) {
    // Extract a highlight:
    HighlightItem hl = parser.getHighlight(2, true, new HighlightOptions(3));
    // Check if highlight extraction is supported
    if (hl == null) {
        System.out.println("Highlight extraction isn't supported");
        return;
    }
    // Print an extracted highlight
    System.out.println(String.format("At %d: %s", hl.getPosition(), hl.getText()));
}

More resources

GitHub examples

You can run the code above and see the feature in action in our GitHub examples.

Free online document parser App

Along with the full-featured Java library, we provide simple but powerful free apps.

You are welcome to extract data from PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX, emails, and more with our Free Online Document Parser App.
