Extract highlights

GroupDocs.Parser can extract a highlight from a document. A highlight is a fragment of text that search features typically use to show the context around a found term. To extract one, use the getHighlight(int, boolean, HighlightOptions) method:

HighlightItem getHighlight(int position, boolean isDirect, HighlightOptions options);

The position parameter defines the start position from which the highlight is extracted. The isDirect parameter sets the direction of extraction: true extracts the text to the right of the position; false extracts the text to the left of it. The options parameter (HighlightOptions) defines where the highlight ends.
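
To make the role of position and isDirect concrete, here is a minimal sketch in plain Java that models the behavior on an ordinary string. It does not call GroupDocs.Parser: the class DirectionSketch and its highlight method are hypothetical names, and the exact boundary handling (for example, whether the character at position belongs to a leftward highlight) is an assumption rather than documented library behavior.

```java
// Illustrative model only: this is NOT GroupDocs.Parser code.
final class DirectionSketch {

    // Returns at most maxLength characters that start at 'position' (isDirect == true)
    // or end just before 'position' (isDirect == false).
    static String highlight(String text, int position, boolean isDirect, int maxLength) {
        if (position < 0 || position > text.length() || maxLength < 0) {
            throw new IllegalArgumentException("position or maxLength out of range");
        }
        if (isDirect) {
            int end = Math.min(text.length(), position + maxLength);
            return text.substring(position, end);
        }
        int start = Math.max(0, position - maxLength);
        return text.substring(start, position);
    }

    public static void main(String[] args) {
        String text = "The quick brown fox jumps";
        // Position 10 is the 'b' of "brown".
        System.out.println(highlight(text, 10, true, 9));  // text to the right of the position
        System.out.println(highlight(text, 10, false, 6)); // text to the left of the position
    }
}
```

In this model the same position yields two different fragments depending only on the direction flag, which is the distinction isDirect expresses.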

The HighlightOptions class has the following constructors:

// Highlight is limited to maxLength text length.
HighlightOptions(int maxLength);
// Highlight is limited to the start (or the end) of a text line (or maxLength text length - if set).
HighlightOptions(Integer maxLength, boolean isLineLimited);
// Highlight is limited to word count (or maxLength text length - if set).
HighlightOptions(Integer maxLength, int wordCount);
// General constructor
HighlightOptions(Integer maxLength, Integer wordCount, boolean isLineLimited);
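
The constructor comments suggest that several limits can apply at once and that the highlight stops at whichever limit is reached first. The sketch below models that reading in plain Java for the rightward (direct) case only. LimitsSketch and its methods are hypothetical names, words are assumed to be separated by whitespace, and the "earliest limit wins" rule is an assumption, not confirmed library behavior.

```java
// Illustrative model only: this is NOT GroupDocs.Parser code.
final class LimitsSketch {

    // Extracts text to the right of 'position', stopping at the earliest active limit.
    // A null maxLength or wordCount means "limit not set".
    static String highlight(String text, int position,
                            Integer maxLength, Integer wordCount, boolean isLineLimited) {
        int end = text.length();
        if (isLineLimited) {
            int lineBreak = text.indexOf('\n', position);
            if (lineBreak >= 0) {
                end = Math.min(end, lineBreak); // stop at the end of the current line
            }
        }
        if (wordCount != null) {
            end = Math.min(end, endOfWords(text, position, wordCount));
        }
        if (maxLength != null) {
            end = Math.min(end, position + maxLength);
        }
        return text.substring(position, end);
    }

    // Returns the index just past the n-th whitespace-separated word at or after 'from'.
    static int endOfWords(String text, int from, int n) {
        int i = from;
        for (int w = 0; w < n && i < text.length(); w++) {
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) i++;  // skip separators
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) i++; // consume one word
        }
        return i;
    }

    public static void main(String[] args) {
        String text = "alpha beta gamma\ndelta epsilon";
        System.out.println(highlight(text, 0, null, 2, false));    // word limit only
        System.out.println(highlight(text, 6, null, null, true));  // line limit only
        System.out.println(highlight(text, 0, 8, 2, false));       // maxLength cuts in first
    }
}
```

Under these assumptions, passing null for maxLength corresponds to the "if set" wording in the constructor comments: the length limit is simply skipped.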

The HighlightItem class has the following members:

Member | Description
getPosition | The position in the document text.
getText | The highlight text.

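To extract a highlight from a document, create an instance of the Parser class, call the getHighlight method, check the result for null (null means highlight extraction isn't supported for the document), and then read the position and text from the returned HighlightItem.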

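The following example shows how to extract a highlight to the right of position 2. Note that, according to the constructor list above, HighlightOptions(3) limits the highlight to a maxLength of 3; to limit the highlight to 3 words instead, use the HighlightOptions(Integer maxLength, int wordCount) constructor: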

// Create an instance of Parser class
try (Parser parser = new Parser(Constants.SamplePdf)) {
    // Extract a highlight:
    HighlightItem hl = parser.getHighlight(2, true, new HighlightOptions(3));
    // Check if highlight extraction is supported
    if (hl == null) {
        System.out.println("Highlight extraction isn't supported");
        return;
    }
    // Print an extracted highlight
    System.out.println(String.format("At %d: %s", hl.getPosition(), hl.getText()));
}
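
Because highlights are usually used to show the context of a found term, a search snippet can be thought of as a leftward highlight, the term itself, and a rightward highlight joined together. The following plain-Java sketch models that idea on an ordinary string. ContextSketch, snippet, and the radius parameter are hypothetical names; with the library, the two sides would presumably come from two getHighlight calls with isDirect set to false and true, but that mapping is an assumption.

```java
// Illustrative model only: this is NOT GroupDocs.Parser code.
final class ContextSketch {

    // Builds "left[term]right" around the first occurrence of 'term',
    // taking at most 'radius' characters of context on each side.
    // Returns null when the term does not occur in the text.
    static String snippet(String text, String term, int radius) {
        int start = text.indexOf(term);
        if (start < 0) {
            return null;
        }
        int end = start + term.length();
        // Leftward context: ends where the term starts.
        String left = text.substring(Math.max(0, start - radius), start);
        // Rightward context: starts where the term ends.
        String right = text.substring(end, Math.min(text.length(), end + radius));
        return left + "[" + term + "]" + right;
    }

    public static void main(String[] args) {
        String text = "The quick brown fox jumps over the lazy dog";
        System.out.println(snippet(text, "fox", 6)); // prints "brown [fox] jumps"
    }
}
```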

More resources

GitHub examples

You can run the code above and see the feature in action in our GitHub examples.

Free online document parser App

Along with the full-featured Java library, we provide simple but powerful free apps.

You are welcome to extract data from PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX, emails, and more with our Free Online Document Parser App.