GroupDocs.Search for .NET 17.09 Release Notes

Major Features

There are 10 features and enhancements in this regular monthly release. The most notable are:

SEARCHNET-1087 Add public constants with field names
SEARCHNET-1191 Remove obsolete properties from IndexingSettings
SEARCHNET-563 Implement functionality for storing document text in index
SEARCHNET-575 Add DocumentFilter property to IndexingSetting for filtering files
SEARCHNET-1150 Implement automatic encoding detection for text documents
SEARCHNET-1159 Implement support of CHM files
SEARCHNET-1161 Implement feature ‘Only best results range’ for fuzzy search
SEARCHNET-1162 Implement feature ‘Only best results range’ for spelling corrector
SEARCHNET-1196 Implement option for fuzzy search to consider transposition as a single mistake or not
SEARCHNET-1197 Implement option for spelling corrector to consider transposition as a single mistake or not

All Changes

| Key | Summary | Category |
|---|---|---|
| SEARCHNET-1087 | Add public constants with field names | Enhancement |
| SEARCHNET-1191 | Remove obsolete properties from IndexingSettings | Enhancement |
| SEARCHNET-563 | Implement functionality for storing document text in index | New Feature |
| SEARCHNET-575 | Add DocumentFilter property to IndexingSetting for filtering files | New Feature |
| SEARCHNET-1150 | Implement automatic encoding detection for text documents | New Feature |
| SEARCHNET-1159 | Implement support of CHM files | New Feature |
| SEARCHNET-1161 | Implement feature ‘Only best results range’ for fuzzy search | New Feature |
| SEARCHNET-1162 | Implement feature ‘Only best results range’ for spelling corrector | New Feature |
| SEARCHNET-1196 | Implement option for fuzzy search to consider transposition as a single mistake or not | New Feature |
| SEARCHNET-1197 | Implement option for spelling corrector to consider transposition as a single mistake or not | New Feature |

Public API and Backward Incompatible Changes

Add public constants with field names

In this enhancement, public constants that hold field names have been added.

Public API Changes
Class DocumentTypes has been added to GroupDocs.Search namespace.
Class EpubFieldNames has been added to GroupDocs.Search namespace.
Class ExcelFieldNames has been added to GroupDocs.Search namespace.
Class FictionBookFieldNames has been added to GroupDocs.Search namespace.
Class FieldNames has been added to GroupDocs.Search namespace.
Class PresentationFieldNames has been added to GroupDocs.Search namespace.
Class WordFieldNames has been added to GroupDocs.Search namespace.

List of fields added to classes

Field Excel has been added to GroupDocs.Search.DocumentTypes class.
Field Pdf has been added to GroupDocs.Search.DocumentTypes class.
Field Presentation has been added to GroupDocs.Search.DocumentTypes class.
Field Word has been added to GroupDocs.Search.DocumentTypes class.
Field Txt has been added to GroupDocs.Search.DocumentTypes class.
Field OutlookStorage has been added to GroupDocs.Search.DocumentTypes class.
Field EmailMessage has been added to GroupDocs.Search.DocumentTypes class.
Field OneNote has been added to GroupDocs.Search.DocumentTypes class.
Field Epub has been added to GroupDocs.Search.DocumentTypes class.
Field FictionBook has been added to GroupDocs.Search.DocumentTypes class.
Field Zip has been added to GroupDocs.Search.DocumentTypes class.
Field Chm has been added to GroupDocs.Search.DocumentTypes class.

Field Title has been added to GroupDocs.Search.EpubFieldNames class.
Field Subject has been added to GroupDocs.Search.EpubFieldNames class.
Field Author has been added to GroupDocs.Search.EpubFieldNames class.
Field Description has been added to GroupDocs.Search.EpubFieldNames class.
Field Language has been added to GroupDocs.Search.EpubFieldNames class.
Field Copyrights has been added to GroupDocs.Search.EpubFieldNames class.
Field Publisher has been added to GroupDocs.Search.EpubFieldNames class.
Field PublishedDate has been added to GroupDocs.Search.EpubFieldNames class.

Field Application has been added to GroupDocs.Search.ExcelFieldNames class.
Field ApplicationVersion has been added to GroupDocs.Search.ExcelFieldNames class.
Field Title has been added to GroupDocs.Search.ExcelFieldNames class.
Field Subject has been added to GroupDocs.Search.ExcelFieldNames class.
Field Comments has been added to GroupDocs.Search.ExcelFieldNames class.
Field Keywords has been added to GroupDocs.Search.ExcelFieldNames class.
Field ContentStatus has been added to GroupDocs.Search.ExcelFieldNames class.
Field Category has been added to GroupDocs.Search.ExcelFieldNames class.
Field Manager has been added to GroupDocs.Search.ExcelFieldNames class.
Field Author has been added to GroupDocs.Search.ExcelFieldNames class.
Field LastAuthor has been added to GroupDocs.Search.ExcelFieldNames class.
Field Company has been added to GroupDocs.Search.ExcelFieldNames class.
Field HyperlinkBase has been added to GroupDocs.Search.ExcelFieldNames class.
Field CreatedTime has been added to GroupDocs.Search.ExcelFieldNames class.
Field LastSavedTime has been added to GroupDocs.Search.ExcelFieldNames class.
Field LastPrintedTime has been added to GroupDocs.Search.ExcelFieldNames class.

Field Title has been added to GroupDocs.Search.FictionBookFieldNames class.
Field Subject has been added to GroupDocs.Search.FictionBookFieldNames class.
Field Keywords has been added to GroupDocs.Search.FictionBookFieldNames class.
Field Author has been added to GroupDocs.Search.FictionBookFieldNames class.
Field Description has been added to GroupDocs.Search.FictionBookFieldNames class.
Field Language has been added to GroupDocs.Search.FictionBookFieldNames class.
Field Publisher has been added to GroupDocs.Search.FictionBookFieldNames class.
Field PublishedDate has been added to GroupDocs.Search.FictionBookFieldNames class.

Field Content has been added to GroupDocs.Search.FieldNames class.
Field FileName has been added to GroupDocs.Search.FieldNames class.
Field DocumentType has been added to GroupDocs.Search.FieldNames class.
Field CreationDate has been added to GroupDocs.Search.FieldNames class.
Field ModificationDate has been added to GroupDocs.Search.FieldNames class.

Field Application has been added to GroupDocs.Search.PresentationFieldNames class.
Field ApplicationVersion has been added to GroupDocs.Search.PresentationFieldNames class.
Field Title has been added to GroupDocs.Search.PresentationFieldNames class.
Field Subject has been added to GroupDocs.Search.PresentationFieldNames class.
Field Comments has been added to GroupDocs.Search.PresentationFieldNames class.
Field Keywords has been added to GroupDocs.Search.PresentationFieldNames class.
Field ContentStatus has been added to GroupDocs.Search.PresentationFieldNames class.
Field Category has been added to GroupDocs.Search.PresentationFieldNames class.
Field Manager has been added to GroupDocs.Search.PresentationFieldNames class.
Field Author has been added to GroupDocs.Search.PresentationFieldNames class.
Field LastAuthor has been added to GroupDocs.Search.PresentationFieldNames class.
Field Company has been added to GroupDocs.Search.PresentationFieldNames class.
Field HyperlinkBase has been added to GroupDocs.Search.PresentationFieldNames class.
Field CreatedTime has been added to GroupDocs.Search.PresentationFieldNames class.
Field LastSavedTime has been added to GroupDocs.Search.PresentationFieldNames class.
Field LastPrintedTime has been added to GroupDocs.Search.PresentationFieldNames class.
Field RevisionNumber has been added to GroupDocs.Search.PresentationFieldNames class.
Field TotalEditingTime has been added to GroupDocs.Search.PresentationFieldNames class.

Field Application has been added to GroupDocs.Search.WordFieldNames class.
Field ApplicationVersion has been added to GroupDocs.Search.WordFieldNames class.
Field Template has been added to GroupDocs.Search.WordFieldNames class.
Field Title has been added to GroupDocs.Search.WordFieldNames class.
Field Subject has been added to GroupDocs.Search.WordFieldNames class.
Field Comments has been added to GroupDocs.Search.WordFieldNames class.
Field Keywords has been added to GroupDocs.Search.WordFieldNames class.
Field ContentStatus has been added to GroupDocs.Search.WordFieldNames class.
Field Category has been added to GroupDocs.Search.WordFieldNames class.
Field Manager has been added to GroupDocs.Search.WordFieldNames class.
Field Author has been added to GroupDocs.Search.WordFieldNames class.
Field LastAuthor has been added to GroupDocs.Search.WordFieldNames class.
Field Company has been added to GroupDocs.Search.WordFieldNames class.
Field HyperlinkBase has been added to GroupDocs.Search.WordFieldNames class.
Field CreatedTime has been added to GroupDocs.Search.WordFieldNames class.
Field LastSavedTime has been added to GroupDocs.Search.WordFieldNames class.
Field LastPrintedTime has been added to GroupDocs.Search.WordFieldNames class.
Field RevisionNumber has been added to GroupDocs.Search.WordFieldNames class.
Field TotalEditingTime has been added to GroupDocs.Search.WordFieldNames class.

Usage:

C#

string indexFolder = @"c:\MyIndex";
string documentsFolder = @"c:\MyDocuments";

// creating index.
Index index = new Index(indexFolder);
index.AddToIndex(documentsFolder);

// searching using public constants as field names.
SearchResults results1 = index.Search(string.Format("{0}:{1}", FieldNames.Content, "query1"));
SearchResults results2 = index.Search(string.Format("{0}:{1}", ExcelFieldNames.Subject, "query2"));

Remove obsolete properties from IndexingSettings

Removed obsolete properties and constructors from indexing settings.

Public API Changes
Constructor IndexingSettings(bool quickIndexing) has been removed from GroupDocs.Search.IndexingSettings class.
Constructor IndexingSettings(bool quickIndexing, bool caseSensitive) has been removed from GroupDocs.Search.IndexingSettings class.
Property bool QuickIndexing has been removed from GroupDocs.Search.IndexingSettings class.
Property bool CaseSensitive has been removed from GroupDocs.Search.IndexingSettings class.

Implement functionality for storing document text in index

This feature allows caching the text of indexed documents in the index. The cached text is used to generate HTML markup with highlighted search results.
Generating HTML markup from the cached text is faster than extracting the text from the source documents again, and it works even if the source documents are no longer available.
The default value of the TextStorageSettings property is null, which means that document text is not cached in the index.
The TextStorageSettings class has a Compression parameter.
The Compression.Normal value caches text with a good balance between compression ratio and indexing speed.
The Compression.None value caches text at maximum speed, but the index will be larger.

Public API Changes
Class GroupDocs.Search.TextStorageSettings has been added to GroupDocs.Search namespace.
Property GroupDocs.Search.TextStorageSettings TextStorageSettings has been added to GroupDocs.Search.IndexingSettings class.
Enumeration Compression has been added to GroupDocs.Search namespace.
Value None has been added to GroupDocs.Search.Compression enumeration.
Value Normal has been added to GroupDocs.Search.Compression enumeration.

This example shows how to cache text of indexed documents in the index:

C#

string indexFolder = @"c:\MyIndex";
string documentsFolder = @"c:\MyDocuments";

// Creating indexing settings object
IndexingSettings settings = new IndexingSettings();
// Enabling source document text caching with normal compression level
settings.TextStorageSettings = new TextStorageSettings(Compression.Normal);

// Creating index
Index index = new Index(indexFolder, settings);

// Indexing
index.AddToIndex(documentsFolder);
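
The trade-off between the two Compression values can be illustrated outside the library. The sketch below is not how GroupDocs.Search stores text: it assumes GZip from System.IO.Compression as a stand-in codec, and TextCacheSketch with its methods is a hypothetical helper introduced only for illustration.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper, not part of GroupDocs.Search.
public static class TextCacheSketch
{
    // Rough analogue of Compression.None: keep the UTF-8 bytes as they are.
    public static byte[] StoreRaw(string text)
    {
        return Encoding.UTF8.GetBytes(text);
    }

    // Rough analogue of Compression.Normal: spend CPU time to shrink the stored bytes.
    public static byte[] StoreCompressed(string text)
    {
        byte[] raw = Encoding.UTF8.GetBytes(text);
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            // ToArray is still valid after the stream has been closed.
            return output.ToArray();
        }
    }

    // Restores the cached text without touching the source document.
    public static string LoadCompressed(byte[] stored)
    {
        using (var input = new MemoryStream(stored))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

For repetitive text, StoreCompressed returns far fewer bytes than StoreRaw, at the cost of extra work on every write and read.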

Add DocumentFilter property to IndexingSetting for filtering files

This feature allows filtering files during indexing.
Filtering can be performed by the following parameters:

  • file length;
  • creation date;
  • modification date;
  • extension;
  • file name, using a regular expression.

Public API Changes
Abstract class GroupDocs.Search.DocumentFilter has been added.
Method DocumentFilter CreateCreationTimeLowerBound(System.DateTime) has been added to GroupDocs.Search.DocumentFilter class.
Method DocumentFilter CreateCreationTimeUpperBound(System.DateTime) has been added to GroupDocs.Search.DocumentFilter class.
Method DocumentFilter CreateCreationTimeRange(System.DateTime,System.DateTime) has been added to GroupDocs.Search.DocumentFilter class.
Method DocumentFilter CreateModificationTimeLowerBound(System.DateTime) has been added to GroupDocs.Search.DocumentFilter class.
Method DocumentFilter CreateModificationTimeUpperBound(System.DateTime) has been added to GroupDocs.Search.DocumentFilter class.
Method DocumentFilter CreateModificationTimeRange(System.DateTime,System.DateTime) has been added to GroupDocs.Search.DocumentFilter class.
Method DocumentFilter CreateFileNameRegularExpression(System.String,System.Text.RegularExpressions.RegexOptions) has been added to GroupDocs.Search.DocumentFilter class.
Method DocumentFilter CreateFileLengthLowerBound(System.Int64) has been added to GroupDocs.Search.DocumentFilter class.
Method DocumentFilter CreateFileLengthUpperBound(System.Int64) has been added to GroupDocs.Search.DocumentFilter class.
Method DocumentFilter CreateFileLengthRange(System.Int64,System.Int64) has been added to GroupDocs.Search.DocumentFilter class.
Method DocumentFilter CreateFileExtension(System.String[]) has been added to GroupDocs.Search.DocumentFilter class.
Method DocumentFilter CreateInverted(DocumentFilter) has been added to GroupDocs.Search.DocumentFilter class.
Method DocumentFilter CreateConjunction(DocumentFilter[]) has been added to GroupDocs.Search.DocumentFilter class.
Method DocumentFilter CreateDisjunction(DocumentFilter[]) has been added to GroupDocs.Search.DocumentFilter class.
Property GroupDocs.Search.DocumentFilter DocumentFilter has been added to GroupDocs.Search.IndexingSettings class.

This example shows how to use document filters:

C#

string indexFolder = @"c:\MyIndex";
string documentsFolder = @"c:\MyDocuments";

// Creating indexing settings object
IndexingSettings settings = new IndexingSettings();

// Creating filter that only passes files from 600 KB to 1 MB in length
DocumentFilter byLength = DocumentFilter.CreateFileLengthRange(614400, 1048576);

// Creating filter that only passes text files
DocumentFilter byExtension = DocumentFilter.CreateFileExtension(".txt");

// Creating composite filter that only passes text files from 600 KB to 1 MB in length
DocumentFilter compositeFilter = DocumentFilter.CreateConjunction(byLength, byExtension);

// Setting filter
settings.DocumentFilter = compositeFilter;

// Creating index
Index index = new Index(indexFolder, settings);

// Indexing
index.AddToIndex(documentsFolder);
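
The way inverted, conjunction and disjunction filters compose can be sketched with plain predicates. This is an illustration only, not the GroupDocs.Search implementation: FileFacts and FilterSketch are hypothetical names, and the inclusive length bounds and case-insensitive extension match are assumptions.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for the file properties a filter inspects.
public sealed class FileFacts
{
    public string Name { get; set; }
    public long Length { get; set; }
}

// Hypothetical predicate combinators, not the library's filters.
public static class FilterSketch
{
    // Assumption: both bounds are inclusive.
    public static Func<FileFacts, bool> LengthRange(long min, long max)
    {
        return f => f.Length >= min && f.Length <= max;
    }

    // Assumption: extensions are compared without regard to case.
    public static Func<FileFacts, bool> Extension(params string[] extensions)
    {
        return f => extensions.Any(
            e => f.Name.EndsWith(e, StringComparison.OrdinalIgnoreCase));
    }

    // Passes exactly the files that the inner filter rejects.
    public static Func<FileFacts, bool> Inverted(Func<FileFacts, bool> inner)
    {
        return f => !inner(f);
    }

    // Passes a file only if every part passes it (logical AND).
    public static Func<FileFacts, bool> Conjunction(params Func<FileFacts, bool>[] parts)
    {
        return f => parts.All(p => p(f));
    }

    // Passes a file if at least one part passes it (logical OR).
    public static Func<FileFacts, bool> Disjunction(params Func<FileFacts, bool>[] parts)
    {
        return f => parts.Any(p => p(f));
    }
}
```

With these helpers, Conjunction(LengthRange(614400, 1048576), Extension(".txt")) accepts a 700 KB notes.txt and rejects a report.pdf of the same size.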

Implement automatic encoding detection for text documents

This feature automatically detects the encoding of each text file that is indexed.
By default, the AutoDetectEncoding property is set to false.
The following encodings can be detected:

  • UTF32 LE
  • UTF32 BE
  • UTF16 LE
  • UTF16 BE
  • UTF8
  • UTF7
  • ANSI

The encoding can be detected from the BOM or, if no BOM is present, from the content of the file.
If the encoding cannot be detected, UTF8 is used by default.
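
Detection by BOM comes down to comparing the first bytes of a file with a small table of well-known signatures. The sketch below shows that idea only. It is not the detector used by GroupDocs.Search, BomSketch is a hypothetical name, and detection by content is deliberately left out.

```csharp
using System;

// Hypothetical helper, not the GroupDocs.Search detector.
public static class BomSketch
{
    // Returns the encoding indicated by the BOM, or null when no BOM is found.
    public static string DetectByBom(byte[] data)
    {
        // UTF32 LE is tested before UTF16 LE because its BOM starts with the same two bytes.
        if (StartsWith(data, 0xFF, 0xFE, 0x00, 0x00)) return "UTF32 LE";
        if (StartsWith(data, 0x00, 0x00, 0xFE, 0xFF)) return "UTF32 BE";
        if (StartsWith(data, 0xEF, 0xBB, 0xBF)) return "UTF8";
        // Only the first three bytes of the UTF7 signature are checked here.
        if (StartsWith(data, 0x2B, 0x2F, 0x76)) return "UTF7";
        if (StartsWith(data, 0xFF, 0xFE)) return "UTF16 LE";
        if (StartsWith(data, 0xFE, 0xFF)) return "UTF16 BE";
        return null;
    }

    private static bool StartsWith(byte[] data, params byte[] prefix)
    {
        if (data == null || data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

A caller would fall back to content analysis, or to a default encoding, whenever DetectByBom returns null.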

Public API Changes
Property bool AutoDetectEncoding has been added to GroupDocs.Search.IndexingSettings class.
Method string DetectEncoding(Encoding defaultEncoding, bool detectByContent) has been added to GroupDocs.Search.Events.FileIndexingEventArgs class.

This example shows how to detect the encoding of each text file automatically:

C#

string indexFolder = @"c:\MyIndex";
string documentsFolder = @"c:\MyDocuments";

// Creating indexing settings object
IndexingSettings settings = new IndexingSettings();
// Enabling automatic encoding detection
settings.AutoDetectEncoding = true;

// Creating index
Index index = new Index(indexFolder, settings);

// Indexing
index.AddToIndex(documentsFolder);

This example shows how to detect encoding selectively for some text files:

C#

string indexFolder = @"c:\MyIndex";
string documentsFolder = @"c:\MyDocuments";

// Creating index
Index index = new Index(indexFolder);

// Creating default encoding that is used when encoding was not detected
Encoding defaultEncoding = Encoding.GetEncoding(Encodings.Windows_1252);

// Subscribing to FileIndexing event
index.FileIndexing += (sender, args) =>
{
    // Detecting encoding only for text files located in the 'DifferentEncodings' folder
    string fileName = args.DocumentFullName;
    if (fileName.EndsWith(".txt", true, CultureInfo.InvariantCulture) &&
        fileName.StartsWith(@"c:\MyDocuments\txt\DifferentEncodings\"))
    {
        args.DetectEncoding(defaultEncoding, true);
    }
};

// Indexing
index.AddToIndex(documentsFolder);

Implement support for CHM files

Implemented support for CHM format.

Public API Changes
Enum value Chm has been added to GroupDocs.Search.DocumentType enum.


Implement feature ‘Only best results range’ for fuzzy search

This feature lets fuzzy search return not only the best results, but also results whose number of mistakes exceeds the minimum by no more than a given range.
For example, suppose that the search is performed for a maximum of 10 mistakes with a range of 2. If the closest words found contain 5 mistakes, then words with 6 and 7 mistakes are also included in the final result.
The default value of the OnlyBestResultsRange property is 0. This means that by default the search results contain only words with the minimum number of mistakes.

Public API Changes
Property int OnlyBestResultsRange has been added to GroupDocs.Search.FuzzySearchParameters class.

This example shows how to use the OnlyBestResultsRange property:

C#

string indexFolder = @"c:\MyIndex";
string documentsFolder = @"c:\MyDocuments";

// Creating index
Index index = new Index(indexFolder);

// Indexing
index.AddToIndex(documentsFolder);

SearchParameters searchParameters = new SearchParameters();
// Enabling fuzzy search
searchParameters.FuzzySearch.Enabled = true;
// Setting maximum mistake count to 10
searchParameters.FuzzySearch.FuzzyAlgorithm = new TableDiscreteFunction(10);
// Enabling OnlyBestResults option
searchParameters.FuzzySearch.OnlyBestResults = true;
// Setting best results range to 2
searchParameters.FuzzySearch.OnlyBestResultsRange = 2;

// Searching
SearchResults searchResults = index.Search("aaaaa", searchParameters);
// If the index does not contain the word 'aaaaa', the search finds
// 'aaaax' (1 mistake), 'aaaxx' (2 mistakes) and 'aaxxx' (3 mistakes)
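
The selection rule itself is small enough to sketch independently of the library. The code below assumes that every candidate word already has a known mistake count. BestResultsSketch and Select are hypothetical names, and this is not the GroupDocs.Search implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper illustrating the 'only best results range' rule.
public static class BestResultsSketch
{
    // Keeps the words whose mistake count is within 'range' of the smallest
    // mistake count found, and never above 'maxMistakes'.
    public static List<string> Select(
        IDictionary<string, int> mistakesByWord, int maxMistakes, int range)
    {
        var allowed = mistakesByWord.Where(p => p.Value <= maxMistakes).ToList();
        if (allowed.Count == 0)
        {
            return new List<string>();
        }

        int best = allowed.Min(p => p.Value);
        return allowed
            .Where(p => p.Value <= best + range)
            .OrderBy(p => p.Value)
            .Select(p => p.Key)
            .ToList();
    }
}
```

With candidates at 1, 2, 3 and 4 mistakes, a range of 2 keeps the first three, while a range of 0 keeps only the word with 1 mistake.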

Implement feature ‘Only best results range’ for spelling corrector

This feature lets the spelling corrector return not only the best results, but also results whose number of mistakes exceeds the minimum by no more than a given range.
For example, suppose that the correction is performed for a maximum of 10 mistakes with a range of 2. If the closest words found contain 5 mistakes, then words with 6 and 7 mistakes are also included in the final result.
The default value of the OnlyBestResultsRange property is 0. This means that by default the spelling correction results contain only words with the minimum number of mistakes.

Public API Changes
Property int OnlyBestResultsRange has been added to GroupDocs.Search.SpellingCorrectorParameters class.

This example shows how to use the OnlyBestResultsRange property:

C#

string indexFolder = @"c:\MyIndex";
string documentsFolder = @"c:\MyDocuments";

// Creating index
Index index = new Index(indexFolder);

// Indexing
index.AddToIndex(documentsFolder);

SearchParameters searchParameters = new SearchParameters();
// Enabling spelling correction
searchParameters.SpellingCorrector.Enabled = true;
// Setting maximum mistake count to 10
searchParameters.SpellingCorrector.MaxMistakeCount = 10;
// Enabling OnlyBestResults option
searchParameters.SpellingCorrector.OnlyBestResults = true;
// Setting best results range to 2
searchParameters.SpellingCorrector.OnlyBestResultsRange = 2;

// Searching
SearchResults searchResults = index.Search("aaaaa", searchParameters);
// If the spelling corrector dictionary does not contain the word 'aaaaa', the search finds
// 'aaaax' (1 mistake), 'aaaxx' (2 mistakes) and 'aaxxx' (3 mistakes),
// provided that these three words are present both in the spelling corrector dictionary and in the index

Implement option for fuzzy search to consider transposition as a single mistake or not

This fuzzy search option determines whether a transposition of two adjacent characters counts as a single mistake (option enabled) or as two mistakes (option disabled).
The default value for the ConsiderTranspositions property is true.

Public API Changes
Property bool ConsiderTranspositions has been added to GroupDocs.Search.FuzzySearchParameters class.

This example shows how to use the ConsiderTranspositions option:

C#

string indexFolder = @"c:\MyIndex";
string documentsFolder = @"c:\MyDocuments";

// Creating index
Index index = new Index(indexFolder);

// Indexing
index.AddToIndex(documentsFolder);

SearchParameters searchParameters = new SearchParameters();
// Enabling fuzzy search
searchParameters.FuzzySearch.Enabled = true;
// Setting maximum mistake count to 1
searchParameters.FuzzySearch.FuzzyAlgorithm = new TableDiscreteFunction(1);
// Counting a transposition as two mistakes instead of one
searchParameters.FuzzySearch.ConsiderTranspositions = false;

// Searching for word 'Mail'
SearchResults searchResults = index.Search("Mail", searchParameters);
// The word 'mails' (1 mistake) is found, but the word 'Mali' (2 mistakes) is not
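
Why 'Mali' costs one or two mistakes can be seen with an ordinary edit distance. The sketch below assumes that a mistake is an insertion, deletion, substitution or (optionally) a swap of two adjacent characters, as in the optimal string alignment distance. TranspositionSketch is a hypothetical name, and the library's own algorithm may differ.

```csharp
using System;

// Hypothetical helper: edit distance with optional adjacent transpositions.
public static class TranspositionSketch
{
    public static int Distance(string a, string b, bool considerTranspositions)
    {
        // d[i, j] is the distance between the first i chars of a and the first j chars of b.
        int[,] d = new int[a.Length + 1, b.Length + 1];
        for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
        for (int j = 0; j <= b.Length; j++) d[0, j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                int best = Math.Min(
                    Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1), // deletion, insertion
                    d[i - 1, j - 1] + cost);                    // match or substitution

                // A swap of two adjacent characters counts as one mistake when enabled.
                if (considerTranspositions && i > 1 && j > 1 &&
                    a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1])
                {
                    best = Math.Min(best, d[i - 2, j - 2] + 1);
                }

                d[i, j] = best;
            }
        }

        return d[a.Length, b.Length];
    }
}
```

Distance("mail", "mali", true) is 1 and Distance("mail", "mali", false) is 2, while Distance("mail", "mails", false) is 1 in both modes, which matches the behaviour described in the comments of the example above.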

Implement option for spelling corrector to consider transposition as a single mistake or not

This spelling corrector option determines whether a transposition of two adjacent characters counts as a single mistake (option enabled) or as two mistakes (option disabled).
The default value for the ConsiderTranspositions property is true.

Public API Changes
Property bool ConsiderTranspositions has been added to GroupDocs.Search.SpellingCorrectorParameters class.

This example shows how to use the ConsiderTranspositions option:

C#

string indexFolder = @"c:\MyIndex";
string documentsFolder = @"c:\MyDocuments";

// Creating index
Index index = new Index(indexFolder);

// Indexing
index.AddToIndex(documentsFolder);

SearchParameters searchParameters = new SearchParameters();
// Enabling spelling corrector
searchParameters.SpellingCorrector.Enabled = true;
// Setting maximum mistake count to 1
searchParameters.SpellingCorrector.MaxMistakeCount = 1;
// Counting a transposition as two mistakes instead of one
searchParameters.SpellingCorrector.ConsiderTranspositions = false;

// Searching for word 'Mail'
SearchResults searchResults = index.Search("Mail", searchParameters);
// The word 'mails' (1 mistake) is found, but the word 'Mali' (2 mistakes) is not.
// Note that the word 'mails' must be present both in the spelling corrector dictionary and in the index.