GroupDocs.Parser can extract a highlight from a document with the getHighlight(int, boolean, HighlightOptions) method. A highlight is a fragment of text that is typically shown next to a search hit to give the found text some context.
The position parameter defines the start position from which the highlight is extracted. The isDirect parameter sets the direction of extraction: true if the highlight is taken from the text to the right of the position; false if it is taken from the text to the left. The HighlightOptions parameter defines where the highlight ends. It has the following constructors:
// Highlight is limited to maxLength text length.
HighlightOptions(int maxLength);

// Highlight is limited to the start (or the end) of a text line (or maxLength text length - if set).
HighlightOptions(Integer maxLength, boolean isLineLimited);

// Highlight is limited to word count (or maxLength text length - if set).
HighlightOptions(Integer maxLength, int wordCount);

// General constructor
HighlightOptions(Integer maxLength, Integer wordCount, boolean isLineLimited);
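To make the interaction of position, isDirect, and the three limits easier to picture, here is a minimal plain-Java sketch that applies the same kind of rules to an ordinary string. It is a toy model, not the library's implementation: the HighlightSketch class and its highlight helper are hypothetical, the order in which the limits are applied is an assumption, and the real getHighlight method may handle whitespace and edge cases differently.

```java
import java.util.Arrays;

// Toy model only: it does NOT call GroupDocs.Parser and may not match the library's exact behavior.
class HighlightSketch {

    // Hypothetical helper that mirrors the parameters described above.
    // A null maxLength or wordCount means "this limit is not set".
    static String highlight(String text, int position, boolean isDirect,
                            Integer maxLength, Integer wordCount, boolean isLineLimited) {
        // Direct: take the text to the right of position; otherwise the text to the left.
        String s = isDirect ? text.substring(position) : text.substring(0, position);

        // Line limit: stop at the end of the line (direct) or at its start (reverse).
        if (isLineLimited) {
            if (isDirect) {
                int end = s.indexOf('\n');
                if (end >= 0) {
                    s = s.substring(0, end);
                }
            } else {
                s = s.substring(s.lastIndexOf('\n') + 1);
            }
        }

        // Word limit: keep the words nearest to position.
        // Assumption: whitespace between kept words is normalized to single spaces.
        if (wordCount != null) {
            String trimmed = s.trim();
            String[] words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
            int n = Math.min(wordCount, words.length);
            String[] kept = isDirect
                    ? Arrays.copyOfRange(words, 0, n)
                    : Arrays.copyOfRange(words, words.length - n, words.length);
            s = String.join(" ", kept);
        }

        // Length limit: keep the characters nearest to position.
        if (maxLength != null && s.length() > maxLength) {
            s = isDirect ? s.substring(0, maxLength) : s.substring(s.length() - maxLength);
        }
        return s;
    }

    public static void main(String[] args) {
        String text = "alpha beta gamma\ndelta epsilon";
        int position = 11; // index of the 'g' in "gamma"

        // To the right of position, limited to the current line.
        System.out.println(highlight(text, position, true, null, null, true));
        // To the left of position, limited to one word.
        System.out.println(highlight(text, position, false, null, 1, false));
        // To the right of position, limited to two words.
        System.out.println(highlight(text, position, true, null, 2, false));
    }
}
```

In this model a reverse highlight keeps the text closest to the position, so the length and word limits trim from the far end in both directions.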
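The following example shows how to extract a highlight to the right of position 2. It uses the single-argument HighlightOptions constructor, so the highlight is limited to a maxLength of 3: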
// Create an instance of Parser class
try (Parser parser = new Parser(Constants.SamplePdf)) {
    // Extract a highlight:
    HighlightItem hl = parser.getHighlight(2, true, new HighlightOptions(3));
    // Check if highlight extraction is supported
    if (hl == null) {
        System.out.println("Highlight extraction isn't supported");
        return;
    }
    // Print an extracted highlight
    System.out.println(String.format("At %d: %s", hl.getPosition(), hl.getText()));
}
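Since highlights are mainly used to show the context of found text, it can help to see how a left highlight and a right highlight combine around a search hit. The sketch below is a plain-Java illustration under stated assumptions: the SearchContextSketch class and its left, right, and context helpers are hypothetical, they work on an ordinary string instead of a parsed document, and they apply only a maximum-length limit.

```java
// Toy model only: a plain string stands in for document text; no GroupDocs.Parser calls are made.
class SearchContextSketch {

    // Hypothetical helper: up to maxLength characters to the left of position.
    static String left(String text, int position, int maxLength) {
        return text.substring(Math.max(0, position - maxLength), position);
    }

    // Hypothetical helper: up to maxLength characters to the right of position.
    static String right(String text, int position, int maxLength) {
        return text.substring(position, Math.min(text.length(), position + maxLength));
    }

    // Returns "left[term]right" for the first occurrence of term, or null if term is absent.
    static String context(String text, String term, int maxLength) {
        int start = text.indexOf(term);
        if (start < 0) {
            return null;
        }
        int end = start + term.length();
        return left(text, start, maxLength) + "[" + term + "]" + right(text, end, maxLength);
    }

    public static void main(String[] args) {
        String text = "The quick brown fox jumps over the lazy dog";
        // Prints: brown [fox] jumps
        System.out.println(context(text, "fox", 6));
    }
}
```

With the real API, the analogous step would be two getHighlight calls for the same hit: one with isDirect set to false at the start of the found text and one with isDirect set to true at its end.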
More resources
GitHub examples
You may easily run the code above and see the feature in action in our GitHub examples: