Extract highlights
GroupDocs.Parser can extract a highlight from a document with the GetHighlight method. A highlight is a fragment of text, typically shown next to a search hit to give the context in which the found text appears:
HighlightItem GetHighlight(int position, bool isDirect, HighlightOptions options);
The position parameter defines the start position from which the highlight is extracted. The isDirect parameter sets the direction of extraction: true if the highlight is extracted to the right of the position; false if it is extracted to the left. The options parameter defines where the highlight ends.
The HighlightOptions class has the following constructors:
// Highlight is limited to maxLength text length.
HighlightOptions(int maxLength);
// Highlight is limited to the start (or the end) of a text line (or maxLength text length - if set).
HighlightOptions(int? maxLength, bool isLineLimited);
// Highlight is limited to word count (or maxLength text length - if set).
HighlightOptions(int? maxLength, int wordCount);
// General constructor
HighlightOptions(int? maxLength, int? wordCount, bool isLineLimited);
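As a rough illustration of how these overloads are called, the sketch below builds one options object per constructor. It assumes the GroupDocs.Parser package is referenced and that HighlightOptions lives in the GroupDocs.Parser.Options namespace; the class name and the numeric limits are arbitrary values chosen for illustration.

```csharp
using GroupDocs.Parser.Options; // assumed namespace of HighlightOptions

static class HighlightOptionsSketch
{
    static void Main()
    {
        // Limited to 20 characters of text.
        HighlightOptions byLength = new HighlightOptions(20);

        // Limited to the start (or the end) of the text line; no length limit set.
        HighlightOptions byLine = new HighlightOptions(null, true);

        // Limited to 3 words; no length limit set.
        HighlightOptions byWords = new HighlightOptions(null, 3);

        // General constructor with all three limits set:
        // 50 characters, 5 words, and the text line boundary.
        HighlightOptions combined = new HighlightOptions(50, 5, true);
    }
}
```

Passing null for maxLength leaves the length limit unset, so only the word-count or line limit applies.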
The HighlightItem class has the following members:
Member | Description |
---|---|
Position | The position in the document text. |
Text | The highlight text. |
Here are the steps to extract a highlight from a document:
- Instantiate a Parser object for the document;
- Instantiate a HighlightOptions object with the extraction parameters;
- Call the GetHighlight method and obtain a HighlightItem object;
- Check that the HighlightItem isn’t null (a null result means highlight extraction isn’t supported for the document);
- Read its properties, such as Position and Text.
The following example shows how to extract a highlight that contains 3 words:
// Create an instance of Parser class
using (Parser parser = new Parser(filePath))
{
    // Extract a highlight of at most 3 words to the right of position 2
    // (maxLength is null, so only the word count limits the highlight)
    HighlightItem hl = parser.GetHighlight(2, true, new HighlightOptions(null, 3));
    // Check if highlight extraction is supported
    if (hl == null)
    {
        Console.WriteLine("Highlight extraction isn't supported");
        return;
    }
    // Print the extracted highlight
    Console.WriteLine(string.Format("At {0}: {1}", hl.Position, hl.Text));
}
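Because isDirect selects the direction of extraction, the same method can collect context on both sides of a found term. The following is a minimal sketch under stated assumptions, not a definitive implementation: the file path, the hit position (10), and the hit length (5) are hypothetical placeholders, and the namespaces for HighlightItem and HighlightOptions are assumed. It also assumes the right-hand context should start at the hit position plus the hit length.

```csharp
using System;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;    // assumed namespace of HighlightItem
using GroupDocs.Parser.Options; // assumed namespace of HighlightOptions

static class HighlightContextSketch
{
    static void Main()
    {
        string filePath = "sample.pdf"; // hypothetical input file
        int hitPosition = 10;           // hypothetical start position of a found term
        int hitLength = 5;              // hypothetical length of the found term

        using (Parser parser = new Parser(filePath))
        {
            // Limit the context on each side to 15 characters
            HighlightOptions options = new HighlightOptions(15);

            // isDirect = false: extract the text to the left of the hit
            HighlightItem left = parser.GetHighlight(hitPosition, false, options);
            // isDirect = true: extract the text to the right, starting after the hit
            HighlightItem right = parser.GetHighlight(hitPosition + hitLength, true, options);

            // A null result means highlight extraction isn't supported for the document
            if (left == null || right == null)
            {
                Console.WriteLine("Highlight extraction isn't supported");
                return;
            }

            Console.WriteLine(string.Format("Left context: {0}", left.Text));
            Console.WriteLine(string.Format("Right context: {0}", right.Text));
        }
    }
}
```

In a real search scenario the position and length would come from the search result rather than from fixed values.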