GroupDocs.Search for .NET 17.12 Release Notes

Major Features

There are 4 enhancements in this regular monthly release. The most notable are:

  • Improved representation of results of exact phrase search
  • Improved calculation of relevance of search results
  • Improved search query syntax
  • Implementation of highlighted results for exact phrase search in text

All Changes

KeySummaryCategory
SEARCHNET-1294Improve representation of results of exact phrase searchEnhancement
SEARCHNET-1295Improve calculation of relevance of search resultsEnhancement
SEARCHNET-1296Improve search query syntaxEnhancement
SEARCHNET-1275Implement highlighting of results of exact phrase search in textEnhancement 

Public API and Backward Incompatible Changes

Description

This enhancement is implemented to store the results of exact phrase search in a more structured and convenient form.

Public API changes

Property System.String[][] TermSequences has been added to GroupDocs.Search.DocumentResultInfo class.
Property System.String[][] TermSequences has been added to GroupDocs.Search.DetailedResultInfo class.

Usecases

This example shows how to perform the exact phrase search and get results:

string indexFolder = @"c:\MyIndex";
string documentsFolder = @"c:\MyDocuments";

// Creating index
Index index = new Index(indexFolder);

// Indexing
index.AddToIndex(documentsFolder);

// Creating search parameters object
SearchParameters searchParameters = new SearchParameters();
// Enabling fuzzy search
searchParameters.FuzzySearch.Enabled = true;
// Setting maximum mistake count to 1
searchParameters.FuzzySearch.FuzzyAlgorithm = new TableDiscreteFunction(1);

// Searching for phrase 'cumulative distribution function' or phrase 'cumulative density function'
SearchResults searchResults = index.Search("\"cumulative distribution function\" OR \"cumulative density function\"", searchParameters);

// Displaying results
foreach (DocumentResultInfo document in searchResults)
{
    Console.WriteLine(document.FileName);
    foreach (DetailedResultInfo field in document.DetailedResults)
    {
        Console.WriteLine(field.FieldName);
        foreach (string[] phrase in field.TermSequences)
        {
            Console.Write("\t");
            foreach (string word in phrase)
            {
                Console.Write(word + " ");
            }
            Console.WriteLine();
        }
    }
}

// The results may contain the following phrases:
// cumulative distribution function
// cumulative distribution functions
// cumulative density function
// cumulative density functions

Improved calculation of relevance of search results

Description

In this enhancement, the way of calculating the relevance of search results has been changed. Now the following formula is used to calculate the relevance of each found document:
R = O / T,
where R is relevance; O is occurrence count in the document; T is total word count in document.

Public API changes

None.

Usecases

This example shows how to perform search and sort results by relevance:

 // Implementing comparer
public class Comparer : IComparer<DocumentResultInfo>
{
    public int Compare(DocumentResultInfo x, DocumentResultInfo y)
    {
        // Compare y and x in reverse order
        return y.Relevance.CompareTo(x.Relevance);
    }
}

...

string indexFolder = @"c:\MyIndex";
string documentsFolder = @"c:\MyDocuments";

// Creating index
Index index = new Index(indexFolder);

// Indexing
index.AddToIndex(documentsFolder);

// Creating search parameters object
SearchParameters searchParameters = new SearchParameters();
// Enabling fuzzy search
searchParameters.FuzzySearch.Enabled = true;
// Setting maximum mistake count to 1
searchParameters.FuzzySearch.FuzzyAlgorithm = new TableDiscreteFunction(1);

// Searching for term 'database'
// Using fuzzy search allows to find the plural form of the term 'databases'
SearchResults searchResults = index.Search("database", searchParameters);

// Creating and filling array for sorting
DocumentResultInfo[] array = new DocumentResultInfo[searchResults.Count];
for (int i = 0; i < array.Length; i++)
{
    array[i] = searchResults[i];
}

// Sorting results in array by relevance in descending order
Array.Sort(array, new Comparer());

Implementation of highlighted results for exact phrase search in text

Description

In this enhancement, generating of HTML-formatted text with highlighted found terms is implemented for the results of exact phrase search.

Public API changes

None.

Usecases

This example shows how to generate text with highlighted results of exact phrase search:

 string indexFolder = @"c:\MyIndex";
string documentsFolder = @"c:\MyDocuments";

// Creating index
Index index = new Index(indexFolder);

// Indexing documents
index.AddToIndex(documentsFolder);

// Searching for phrase 'cumulative distribution function'
SearchResults results = index.Search("\"cumulative distribution function\"");

// Generating HTML-formatted text for the first document directly to the file 'HighlightedResults.html'
index.HighlightInText("HighlightedResults.html", results[0]);

Improved search query syntax

In this enhancement, the syntax of the search query language has been redesigned to make it more understandable and conventional.

Search operations (since v.17.12)

OperationSyntaxDescriptionExamples
Parenthesesinner-query )Parentheses are used to specify order of operations.
  • term1 I (term2 & term3) 
  • (“total expenses” | “total costs”) & (82000 ~~ 83000 | 92000 ~~ 93000)

| | Field specifier | field-name : inner-query | Field specifier is used to specify field name. |

  • content : term 
  • creationdate : (2010 ~~ 2013) 
  • filename : report & creationdate: 2009

| | Exact phrase query specifier | " exact-phrase-query " | Exact phrase query specifier is used to specify phrase for phrase search. |

  • “term1 term2 term3” 
  • “computational complexity theory” 
  • “formal language” AND harrison

| | And operation | left-query & right-query 
left-query AND right-query | And operation is used to find documents which contain both left query and right query. |

  • term1 & term2 
  • term1 AND term2 
  • computational & complexity

| | Or operation | left-query | right-query 
left-query || right-query 
left-query OR right-query | Or operation is used to find documents which contain left query, or right query, or both. |

  • term1 | term2 
  • term1 || term2 
  • term1 OR term2
  • “cumulative distribution function” OR “cumulative density function”

| | Not operation | ! inner-query 
NOT inner-query | Not operation is used to find all documents which do not contain inner query. |

  • ! term 
  • NOT term 
  • author : (Cardano AND NOT Gerolamo)

| | Macro name specifier | @macro-name | Macro name specifier is used to specify name of macro within search query that will be replaces with the body of the macro before parsing the query. |

  • @query_macro 
  • @macro1 & @macro2

| | Regular expression specifier | ^regular-expression | Regular expression specifier is used to specify query that is regular expression. |

  • ^^[0-9]{1,5}$

| | Numeric range specifier | start-number ~~ end-number | Numeric range specifier is used to specify range for numeric range search. |

  • 13 ~~ 42 
  • 10000000000 ~~ 100000000000

| | Date range specifier | daterange( start-date ~~ *end-date *) | Date range specifier is used to specify range for date range search. |

  • daterange(09.28.2017~~11.11.2017)

|

Search flow (since v.17.12)

OperationSearch flow
Simple term search (case insensitive)Keyboard layout correction 
Spelling correction 
Homophone search 
Synonym search 
Fuzzy search 
Retrieving results
Simple term search (case sensitive)Retrieving results
Date range searchRetrieving results
Numeric range searchRetrieving results
Exact phrase searchRetrieving results for each term of the phrase 
Joining sets of results
Regex searchRegex search 
Fuzzy search 
Retrieving results
And, OrRetrieving results for each operand
Combining sets of results
NotRetrieving results for operand
Inverting set of results

Query language specification (since v.17.12)

query:

  • regex-query
  • non-regex-query

regex-query:

  • ^pattern

non-regex-query:

  • unary-query
  • binary-query

unary-query:

  • word
  • exact-query
  • field-name-query
  • numeric-range-query
  • date-range-query
  • parenthesized-query
  • not-query

exact-query:

  • word-list "

word-list:

  • word word
  • word-list word

field-name-query:

  • field-nameunary-query

numeric-range-query:

  • number ~~ number

date-range-query:

  • daterange( date ~~ date )

parenthesized-query:

  • non-regex-query )

not-query:

  • unary-query
  • NOT unary-query

binary-query:

  • and-query
  • or-query

and-query:

  • non-regex-query & unary-query
  • non-regex-query AND unary-query

or-query:

  • non-regex-query | unary-query
  • non-regex-query || unary-query
  • non-regex-query OR unary-query