GroupDocs.Search for .NET 22.11 Release Notes

Major Features

This release includes the following enhancements:

  • Implement text extraction for each field separately
  • Implement text extraction in plain text format

Full List of Issues Covering all Changes in this Release

Key             Summary                                              Category
SEARCHNET-2759  Implement text extraction for each field separately  Enhancement
SEARCHNET-2783  Implement text extraction in plain text format       Enhancement

Public API and Backward Incompatible Changes

Implement text extraction for each field separately

This enhancement adds a new StructureOutputAdapter class that collects the extracted text separately for each document field. For more information on output adapters, see the Output adapters documentation page.

Public API changes

Class StructureOutputAdapter has been added to GroupDocs.Search.Common namespace.
Method GroupDocs.Search.Common.DocumentField[] GetResult() has been added to GroupDocs.Search.Common.StructureOutputAdapter class.
Constructor StructureOutputAdapter(GroupDocs.Search.Options.OutputFormat) has been added to GroupDocs.Search.Common.StructureOutputAdapter class.

Use cases

The following example demonstrates how to output the extracted text with different output adapters.

using System;
using System.IO;
using GroupDocs.Search;
using GroupDocs.Search.Common;
using GroupDocs.Search.Options;
using GroupDocs.Search.Results;
using Index = GroupDocs.Search.Index; // Alias avoids a name clash with System.Index

string indexFolder = @"c:\MyIndex\";
string documentsFolder = @"c:\MyDocuments\";

// Creating an index settings instance
IndexSettings settings = new IndexSettings();
settings.TextStorageSettings = new TextStorageSettings(Compression.High); // Enabling the storage of extracted text in the index

// Creating an index in the specified folder
Index index = new Index(indexFolder, settings);

// Indexing documents from the specified folder
index.Add(documentsFolder);

// Getting list of indexed documents
DocumentInfo[] documents = index.GetIndexedDocuments();

// Getting a document text
if (documents.Length > 0)
{
    DocumentInfo document = documents[0];

    // Output to a file
    FileOutputAdapter fileOutputAdapter = new FileOutputAdapter(OutputFormat.Html, @"C:\Text.html");
    index.GetDocumentText(document, fileOutputAdapter);

    // Output to a stream
    using (Stream stream = new MemoryStream())
    {
        StreamOutputAdapter streamOutputAdapter = new StreamOutputAdapter(OutputFormat.Html, stream);
        index.GetDocumentText(document, streamOutputAdapter);
    }

    // Output to a string
    StringOutputAdapter stringOutputAdapter = new StringOutputAdapter(OutputFormat.Html);
    index.GetDocumentText(document, stringOutputAdapter);
    string htmlText = stringOutputAdapter.GetResult();

    // Output to a structure
    StructureOutputAdapter structureOutputAdapter = new StructureOutputAdapter(OutputFormat.PlainText);
    index.GetDocumentText(document, structureOutputAdapter);
    DocumentField[] fields = structureOutputAdapter.GetResult();
    for (int i = 0; i < fields.Length; i++)
    {
        // Printing field names of the document
        DocumentField field = fields[i];
        Console.WriteLine("\t" + field.Name);
    }
}
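
Once the fields have been collected, a common next step is to look up the text of a particular field by name. The following is a minimal, self-contained sketch that does not reference the library: the FieldText record is a hypothetical stand-in for DocumentField, its Text property is an assumption (the example above only uses Name), and the field names and values are made-up sample data.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for DocumentField; the Text property is an assumption.
public record FieldText(string Name, string Text);

public static class FieldLookup
{
    // Builds a case-insensitive map from field name to field text.
    // Texts of fields that share a name are joined with a newline.
    public static Dictionary<string, string> ByName(IEnumerable<FieldText> fields)
    {
        var map = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (FieldText field in fields)
        {
            map[field.Name] = map.TryGetValue(field.Name, out var existing)
                ? existing + "\n" + field.Text
                : field.Text;
        }
        return map;
    }

    public static void Main()
    {
        // Made-up sample data standing in for the adapter result.
        var fields = new[]
        {
            new FieldText("filename", "report.docx"),
            new FieldText("content", "First paragraph."),
            new FieldText("content", "Second paragraph."),
        };

        Dictionary<string, string> map = ByName(fields);

        // The lookup ignores case, so "Content" finds the "content" entries.
        Console.WriteLine(map["Content"]);
    }
}
```

Joining repeated names with a newline is only one possible choice; keeping a list of texts per name would work equally well if the separate values matter.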

Implement text extraction in plain text format

This enhancement implements the ability to output the text of the document in plain text format, both when obtaining the text of the document from the index and when highlighting the found words in the text. For more information, see the Getting indexed documents documentation page and the Highlighting search results documentation page.

Public API changes

Enum OutputFormat has been added to GroupDocs.Search.Options namespace.
Field GroupDocs.Search.Options.OutputFormat Html has been added to GroupDocs.Search.Options.OutputFormat enum.
Field GroupDocs.Search.Options.OutputFormat PlainText has been added to GroupDocs.Search.Options.OutputFormat enum.

Class DocumentHighlighter has been added to GroupDocs.Search.Highlighters namespace.
Constructor DocumentHighlighter(GroupDocs.Search.Common.OutputAdapter) has been added to GroupDocs.Search.Highlighters.DocumentHighlighter class.
Property GroupDocs.Search.Common.OutputAdapter OutputAdapter has been added to GroupDocs.Search.Highlighters.DocumentHighlighter class.

Class FragmentHighlighter has been added to GroupDocs.Search.Highlighters namespace.
Constructor FragmentHighlighter(GroupDocs.Search.Options.OutputFormat) has been added to GroupDocs.Search.Highlighters.FragmentHighlighter class.
Method GroupDocs.Search.Common.FragmentContainer[] GetResult() has been added to GroupDocs.Search.Highlighters.FragmentHighlighter class.

Property System.String TermHighlightEndTag has been added to GroupDocs.Search.Options.HighlightOptions class.
Property System.String TermHighlightStartTag has been added to GroupDocs.Search.Options.HighlightOptions class.

Property GroupDocs.Search.Options.OutputFormat OutputFormat has been added to GroupDocs.Search.Common.ResultBuilderFactory class.

Constructor FileOutputAdapter(GroupDocs.Search.Options.OutputFormat, System.String) has been added to GroupDocs.Search.Common.FileOutputAdapter class.
Constructor StreamOutputAdapter(GroupDocs.Search.Options.OutputFormat, System.IO.Stream) has been added to GroupDocs.Search.Common.StreamOutputAdapter class.
Constructor StringOutputAdapter(GroupDocs.Search.Options.OutputFormat) has been added to GroupDocs.Search.Common.StringOutputAdapter class.

Use cases

The following example demonstrates how to highlight search results in plain text format.

using System;
using GroupDocs.Search.Common;
using GroupDocs.Search.Highlighters;
using GroupDocs.Search.Options;
using GroupDocs.Search.Results;
using Index = GroupDocs.Search.Index; // Alias avoids a name clash with System.Index

string indexFolder = @"c:\MyIndex\";
string documentsFolder = @"c:\MyDocuments\";

// Creating an index
Index index = new Index(indexFolder);

// Indexing documents from the specified folder
index.Add(documentsFolder);

// Search for the word 'Universe'
SearchResult result = index.Search("Universe");

// Highlighting occurrences in the text
if (result.DocumentCount > 0)
{
    FoundDocument document = result.GetFoundDocument(0); // Getting the first found document
    StructureOutputAdapter outputAdapter = new StructureOutputAdapter(OutputFormat.PlainText); // Creating the output adapter
    Highlighter highlighter = new DocumentHighlighter(outputAdapter); // Creating the highlighter instance
    HighlightOptions options = new HighlightOptions(); // Creating the highlight options
    options.TermHighlightStartTag = "<Term>"; // Setting the start tag for the found word
    options.TermHighlightEndTag = "</Term>"; // Setting the end tag for the found word
    index.Highlight(document, highlighter, options); // Generating plain text with highlighted occurrences

    DocumentField[] fields = outputAdapter.GetResult();
    for (int i = 0; i < fields.Length; i++)
    {
        // Printing field names of the found document
        DocumentField field = fields[i];
        Console.WriteLine("\t" + field.Name);
    }
}
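
Because the start and end tags are ordinary strings in plain text output, the highlighted text can be post-processed with standard string tools. The following is a minimal, self-contained sketch that uses only System namespaces: the HighlightTags helper and its methods are hypothetical names, and the sample string is made-up data standing in for the text of one highlighted field.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; not part of the GroupDocs.Search API.
public static class HighlightTags
{
    // Returns the text found between each startTag/endTag pair, in order of appearance.
    public static List<string> ExtractTerms(string text, string startTag, string endTag)
    {
        // Escape the tags so that characters such as '<' and '/' are matched literally.
        string pattern = Regex.Escape(startTag) + "(.*?)" + Regex.Escape(endTag);
        var terms = new List<string>();
        foreach (Match match in Regex.Matches(text, pattern, RegexOptions.Singleline))
        {
            terms.Add(match.Groups[1].Value);
        }
        return terms;
    }

    // Removes the tags and keeps the highlighted words in place.
    public static string StripTags(string text, string startTag, string endTag)
    {
        return text.Replace(startTag, "").Replace(endTag, "");
    }

    public static void Main()
    {
        // Made-up sample standing in for the text of one highlighted field.
        string sample = "The <Term>Universe</Term> is vast. Our <Term>universe</Term> expands.";

        List<string> terms = ExtractTerms(sample, "<Term>", "</Term>");
        Console.WriteLine(string.Join(", ", terms)); // prints "Universe, universe"

        Console.WriteLine(StripTags(sample, "<Term>", "</Term>")); // prints "The Universe is vast. Our universe expands."
    }
}
```

Choosing tags that cannot occur in the document text itself keeps this kind of post-processing unambiguous.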