GroupDocs.Search for .NET 19.10 Release Notes

Major Features

Other notable features and improvements:

  • Implement highlighting search results in short fragments
  • Enhance document metadata indexing with new formats
  • Implement indexing each letter as a separate word
  • Implement ability to remove paths from index

Full List of Issues Covering all Changes in this Release

KeySummaryCategory
SEARCHNET-1967Implement highlighting search results in short fragmentsImprovement
SEARCHNET-1970Enhance document metadata indexing with new formatsImprovement
SEARCHNET-2110Implement new public APIImprovement
SEARCHNET-2035Implement indexing each letter as a separate wordNew Feature
SEARCHNET-2108Implement ability to remove paths from indexNew Feature

Public API and Backward Incompatible Changes

Implement highlighting search results in short fragments

This improvement allows highlighting the search results in separate short fragments of the text, and not in the whole document. A detailed description of the feature is presented in the documentation on the Highlighting search results page.

Usecases

This example shows how to generate short HTML snippets with highlighted found terms:

C#

string indexFolder = @"c:\MyIndex";
string documentFolder = @"c:\MyDocuments";
 
// Creating index
Index index = new Index(indexFolder);
 
// Adding documents to index
index.Add(documentFolder);
 
// Searching
SearchResult result = index.Search("hobbit");
 
// Highlighting found terms in short HTML snippets
if (result.DocumentCount > 0)
{
    FoundDocument document = result.GetFoundDocument(0);
    HtmlFragmentHighlighter highlighter = new HtmlFragmentHighlighter();
    index.Highlight(document, highlighter);
 
    // Getting the result
    FragmentContainer[] fragmentContainers = highlighter.GetResult();
    for (int i = 0; i < fragmentContainers.Length; i++)
    {
        FragmentContainer container = fragmentContainers[i];
        string[] fragments = container.GetFragments();
        if (fragments.Length > 0)
        {
            Console.WriteLine(container.FieldName);
            Console.WriteLine();
            for (int j = 0; j < fragments.Length; j++)
            {
                // Printing HTML markup to console
                Console.WriteLine(fragments[j]);
                Console.WriteLine();
            }
        }
    }
} 

Enhance document metadata indexing with new formats

This improvement adds support for new document formats. These are mostly documents, the main content of which is not textual, therefore only the metadata of these documents is indexed:

  • MP3 – MPEG-2 Audio Layer III;
  • WAV – Waveform Audio File Format;
  • BMP – Bitmap Picture;
  • GIF – Graphical Interchange Format File;
  • JP2 – JPEG 2000 Core Image File;
  • PNG – Portable Network Graphics;
  • WEBP – WebP Image Format File;
  • TIFF – Tagged Image File Format;
  • EMF – Enhanced Windows Metafile;
  • WMF – Windows Metafile;
  • JPG – JPEG Image;
  • PSD – Adobe Photoshop Document;
  • DJVU – DjVu Image;
  • MPP – Microsoft Project File;
  • TORRENT – BitTorrent File;
  • VSD – Visio Drawing File;
  • VSS – Visio Stencil File;
  • DCM – DICOM Image;
  • AVI – Audio Video Interleave File;
  • MOV – Apple QuickTime Movie;
  • QT – Apple QuickTime Movie;
  • FLV – Animate Video File;
  • ASF – Advanced Systems Format File.

A complete list of supported formats is provided on the Supported Document Formats page.

Implement new public API

Implemented a new convenient intuitive public API. Full documentation for the new API is presented here.

All public types from the legacy GroupDocs.Search namespace have been moved to the GroupDocs.Search.Legacy namespace and marked obsolete with the message: “This interface / class / enumeration is deprecated and will be available until January 2020 (version 20.1).”

Implement indexing each letter as a separate word

This feature is designed to work with hieroglyphic languages and allows you to index each character in the text as a separate word, regardless of the presence of separators.

Usecases

The example shows how to perform indexing and search for Chinese characters:

C#

string indexFolder = @"c:\MyIndex";
string documentFolder = @"c:\MyDocuments";
 
// Creating index
Index index = new Index(indexFolder);
 
// Setting blended character type for Chinese characters
HashSet<char> hashSet = new HashSet<char>();
for (char character = (char)0x4E00; character <= 0x9FFF; character++) // Common
{
    hashSet.Add(character);
}
for (char character = (char)0x3400; character <= 0x4DBF; character++) // Rare
{
    hashSet.Add(character);
}
char[] characters = new char[hashSet.Count];
hashSet.CopyTo(characters);
index.Dictionaries.Alphabet.SetRange(characters, CharacterType.SeparateWord); // Setting character type
 
// Adding documents to index
index.Add(documentFolder);
 
// Searching for the Unicode character U+4E50
SearchResult result = index.Search("\u4E50");

Implement ability to remove paths from index

This feature allows you to remove from an index paths added for indexing. When indexed paths are removed from an index, the index is updated and all removed documents and folders become inaccessible for search. Detailed information about this feature is presented on the Delete indexed paths page.

Usecases

The example shows how to remove indexed paths from an index:

C#

string indexFolder = @"c:\MyIndex\";
string documentsFolder1 = @"c:\MyDocuments\";
string documentsFolder2 = @"c:\MyDocuments2\";
 
// Creating an index in the specified folder
Index index = new Index(indexFolder);
 
// Indexing documents from the specified folders
index.Add(documentsFolder1);
index.Add(documentsFolder2);
 
// Getting indexed paths from the index
string[] indexedPaths1 = index.GetIndexedPaths();
 
// Writing indexed paths to the console
Console.WriteLine("Indexed paths:");
foreach (string path in indexedPaths1)
{
    Console.WriteLine("\t" + path);
}
 
// Deleting index path from the index
DeleteResult deleteResult = index.Delete(new string[] { documentsFolder1 }, new UpdateOptions());
 
// Getting indexed paths after deletion
string[] indexedPaths2 = index.GetIndexedPaths();
Console.WriteLine("\nDeleted paths: " + deleteResult.SuccessCount);
 
Console.WriteLine("\nIndexed paths:");
foreach (string path in indexedPaths2)
{
    Console.WriteLine("\t" + path);
}