
Search Features

GroupDocs.Search for .NET includes the following features:

Search for Object Types

GroupDocs.Search supports searching for the following object types:

  • Text Occurrences
  • Basic Metadata Fields
  • File Names
  • Document Types
  • Document Created/Modified Dates

Search Queries

Simple Queries

Simple search queries, such as a single search term, are supported.

Boolean Queries

Boolean search is supported with the following operators:

Operator | Example
AND      | term1 AND term2
NOT      | term1 NOT term2
OR       | term1 OR term2
()       | (term1 AND term2) OR term3
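
To illustrate how these operators combine result sets, here is a minimal Python sketch that is independent of GroupDocs.Search. It models an index as a mapping from terms to sets of document names and applies set operations. The `index` data and the `docs` helper are hypothetical, and NOT is treated as the binary "and not" form shown in the examples above.

```python
# Toy inverted index: term -> set of documents containing it (hypothetical data).
index = {
    "term1": {"a.docx", "b.pdf", "c.txt"},
    "term2": {"b.pdf", "c.txt"},
    "term3": {"d.xlsx"},
}

def docs(term):
    """Return the set of documents that contain the term."""
    return index.get(term, set())

# term1 AND term2 -> documents containing both terms
both = docs("term1") & docs("term2")
# term1 OR term2 -> documents containing either term
either = docs("term1") | docs("term2")
# term1 NOT term2 -> documents containing term1 but not term2
without = docs("term1") - docs("term2")
# (term1 AND term2) OR term3 -> the parenthesised group is evaluated first
grouped = (docs("term1") & docs("term2")) | docs("term3")
```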

Regular Expression Queries

GroupDocs.Search also supports regular expression search queries. A regular expression (sometimes called a rational expression) is a sequence of characters that defines a search pattern. A regex term must be marked with the ^ symbol. Regex terms can also be mixed with ordinary terms in the same query.
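
The idea of marking a term as a regular expression can be sketched in Python as follows. This is not the GroupDocs.Search implementation: the `matching_words` helper and the sample vocabulary are hypothetical, and the sketch assumes that the text after the ^ marker is the pattern and that it must match a whole indexed word.

```python
import re

def matching_words(term, vocabulary):
    """Return the indexed words that a single query term matches."""
    if term.startswith("^"):
        # Assumption: everything after the marker is the pattern,
        # and the pattern must match the whole word.
        pattern = re.compile(term[1:])
        return {word for word in vocabulary if pattern.fullmatch(word)}
    # Ordinary terms match only the identical word.
    return {word for word in vocabulary if word == term}

vocabulary = {"rider", "ride", "road", "2016", "2017"}

regex_hits = matching_words("^ri.*", vocabulary)      # words starting with "ri"
year_hits = matching_words("^\\d{4}", vocabulary)     # four-digit numbers
plain_hits = matching_words("road", vocabulary)       # ordinary term
```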

Faceted Search Queries

Faceted Search allows searching in specific fields of a document, such as the document name, document type, or creation date. To use this feature, specify the field name in the search query, as in the examples in the table of supported fields. Field names are not case-sensitive.

List of supported fields

Field Name                | Description               | Pst and Ost    | Doc and Docx | Xls and Xlsx | Ppt and Pptx | Pdf | Txt  | Using example
Content                   | Text content of document  | Yes            | Yes          | Yes          | Yes          | Yes | Yes  | Content:Rider
FileName                  | File Name                 | Yes            | Yes          | Yes          | Yes          | Yes | Yes  | FileName:Document1
DocumentType              | Type of document          | OutlookStorage | Word         | Excel        | Presentation | Pdf | Text | DocumentType:Word
CreationDate              | Date of file creation     | Yes            | Yes          | Yes          | Yes          | Yes | Yes  | CreationDate:2016
ModificationDate          | Date of file modification | Yes            | Yes          | Yes          | Yes          | Yes | Yes  | ModificationDate:(12 10 2016)
Author                    |                           |                | Yes          | Yes          | Yes          | Yes |      |
Category                  |                           |                | Yes          | Yes          | Yes          |     |      |
Comments                  |                           |                | Yes          | Yes          | Yes          |     |      |
Company                   |                           |                | Yes          | Yes          | Yes          |     |      |
ContentStatus             |                           |                | Yes          | Yes          | Yes          |     |      |
ContentType               |                           |                | Yes          | Yes          | Yes          |     |      |
Creator                   |                           |                |              |              |              | Yes |      |
HyperlinkBase             |                           |                | Yes          | Yes          | Yes          |     |      |
Keywords                  |                           |                | Yes          | Yes          | Yes          | Yes |      |
LastSavedBy               |                           |                | Yes          | Yes          | Yes          |     |      |
NameOfApplication         |                           |                | Yes          | Yes          | Yes          |     |      |
Manager                   |                           |                | Yes          | Yes          | Yes          |     |      |
Subject                   |                           |                | Yes          | Yes          | Yes          | Yes |      |
Template                  |                           |                | Yes          | Yes          | Yes          |     |      |
Title                     |                           |                | Yes          | Yes          | Yes          | Yes |      |
Trapped                   |                           |                |              |              |              | Yes |      |
BytesCount                |                           |                | Yes          | Yes          |              |     |      |
CharactersCount           |                           |                | Yes          | Yes          |              |     |      |
CharactersWithSpacesCount |                           |                | Yes          | Yes          |              |     |      |
LastPrinted               |                           |                | Yes          | Yes          | Yes          |     |      |
LinesCount                |                           |                | Yes          | Yes          |              |     |      |
PagesCount                |                           |                | Yes          | Yes          |              |     |      |
ParagraphsCount           |                           |                | Yes          | Yes          |              |     |      |
RevisionNumber            |                           |                | Yes          | Yes          | Yes          |     |      |
Producer                  |                           |                |              |              |              | Yes |      |
TotalEditingTime          |                           |                | Yes          | Yes          | Yes          |     |      |
Version                   |                           |                | Yes          | Yes          | Yes          |     |      |
WordsCount                |                           |                | Yes          | Yes          |              |     |      |
MailMessageBody           |                           | Yes            |              |              |              |     |      |
MailSenderName            |                           | Yes            |              |              |              |     |      |
MailDisplayName           |                           | Yes            |              |              |              |     |      |
MailDisplayToS            |                           | Yes            |              |              |              |     |      |
MailSubject               |                           | Yes            |              |              |              |     |      |
MailDeliveryTime          |                           | Yes            |              |              |              |     |      |
MailArrivalTime           |                           | Yes            |              |              |              |     |      |
MailMessageFlags          |                           | Yes            |              |              |              |     |      |

A PDF document can also contain fields with other names; all such fields are indexed.
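
As an illustration of case-insensitive field names, the following Python sketch splits a faceted term of the form Field:value into a field and a value. It is not the library's parser: `parse_faceted_term` is a hypothetical helper, only a few field names from the table are included, and treating terms without a known field prefix as content terms is an assumption made for the sketch.

```python
# A few of the supported field names listed in the table of supported fields.
SUPPORTED_FIELDS = ["Content", "FileName", "DocumentType",
                    "CreationDate", "ModificationDate"]
_CANONICAL = {name.lower(): name for name in SUPPORTED_FIELDS}

def parse_faceted_term(term):
    """Split 'Field:value' into (canonical field name, value).

    Terms without a recognised field prefix are treated as plain
    content terms (an assumption made for this sketch).
    """
    prefix, separator, value = term.partition(":")
    field = _CANONICAL.get(prefix.lower()) if separator else None
    if field is None:
        return ("Content", term)
    return (field, value)

lower_case = parse_faceted_term("filename:Document1")
upper_case = parse_faceted_term("DOCUMENTTYPE:Word")
plain_term = parse_faceted_term("Rider")
```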

Case Sensitive Search Queries

A case-sensitive search distinguishes between uppercase and lowercase letters and returns only results that match the case of the search query, which narrows the result set. For example, searching for "Foo" does not return "foo", and vice versa.

Advanced Search

Fuzzy Search

Fuzzy search finds meaningful results even when the search query is misspelt. Fuzzy search settings are not tied to the index, so the level of fuzziness can be changed for each search.
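
One common way to implement this kind of matching is edit distance. The Python sketch below is an illustration under that assumption and does not claim to be the algorithm used by GroupDocs.Search; `edit_distance`, `fuzzy_matches`, and the word list are hypothetical. The `max_distance` argument plays the role of a fuzziness level chosen per search.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]

def fuzzy_matches(query, words, max_distance):
    """Return the indexed words within max_distance edits of the query term."""
    return [w for w in words if edit_distance(query, w) <= max_distance]

words = ["document", "documents", "monument", "search"]

# The same word list can be searched with a different tolerance each time.
strict = fuzzy_matches("documnet", words, max_distance=1)
relaxed = fuzzy_matches("documnet", words, max_distance=2)
```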

Synonym Search

Synonym search also finds words that are closely related to the search term. For example, a search for "small" can also return "little", "tiny", and other synonyms.

Date Range Search

Date Range Search finds dates that fall within a specified range. It can be combined with other types of search. For example, "daterange(1.1.2015~~12.31.2018)" in a search query matches all dates between 1 Jan 2015 and 31 Dec 2018.
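
The range syntax in this example can be read as two month.day.year dates separated by ~~. The following Python sketch parses that form and tests whether a date falls in the range. It is an illustration only: `parse_daterange` and `in_range` are hypothetical helpers, and treating both end dates as inclusive is an assumption.

```python
import re
from datetime import date

_DATERANGE = re.compile(
    r"daterange\((\d{1,2})\.(\d{1,2})\.(\d{4})~~(\d{1,2})\.(\d{1,2})\.(\d{4})\)"
)

def parse_daterange(term):
    """Parse 'daterange(M.D.YYYY~~M.D.YYYY)' into (start, end) dates.

    Month-first order is inferred from the example in the text,
    where 12.31.2018 means 31 Dec 2018.
    """
    match = _DATERANGE.fullmatch(term)
    if match is None:
        raise ValueError(f"not a date range term: {term!r}")
    m1, d1, y1, m2, d2, y2 = map(int, match.groups())
    return date(y1, m1, d1), date(y2, m2, d2)

def in_range(value, term):
    """Return True if value lies in the range (both ends assumed inclusive)."""
    start, end = parse_daterange(term)
    return start <= value <= end

start, end = parse_daterange("daterange(1.1.2015~~12.31.2018)")
```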

Numeric Range Search

Numeric Range Search finds numbers in the index that fall within a specified range. It can be combined with other types of search.

Password Protected Documents Search

The API also allows searching in password-protected documents.

Search using Morphological Word Forms

The API allows you to search for different word forms. For example, you can search for the singular and plural forms of a noun at the same time.

Spelling Corrector

The API can correct misspelt words in a query before the search is performed.

Keyboard Layout Corrector

The API can recognize search queries that were typed with a keyboard layout that does not match the intended language.

Exact Phrase Search

The API uses an exact phrase query specifier to mark the phrase for a phrase search.

Specify Number of Search Threads

The API allows specifying the number of threads used for searching.

Cancel Search Operation

The API allows a search operation to be cancelled manually.

Searching by Parts

The API allows a search to be run in parts. In huge indexes covering terabytes of documents, a full search takes a long time. Searching in parts (chunks) makes it possible to get the first part of the results much faster.
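
The benefit of searching in parts can be illustrated with a Python generator that scans one index segment at a time and hands back results as soon as each segment is done. This is a conceptual sketch, not the GroupDocs.Search API: `search_by_parts` and the segment data are hypothetical.

```python
def search_by_parts(segments, term):
    """Yield (segment_number, matching_documents) one index segment at a time."""
    for number, segment in enumerate(segments, start=1):
        # Each segment maps a document name to its extracted text.
        hits = [name for name, text in segment.items()
                if term in text.lower().split()]
        yield number, hits

segments = [
    {"a.txt": "quarterly report draft", "b.txt": "meeting notes"},
    {"c.txt": "annual report", "d.txt": "travel plan"},
]

parts = search_by_parts(segments, "report")
first = next(parts)   # available before the later segments are scanned
rest = list(parts)    # the remaining segments are scanned only when requested
```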

Get Searching Report

The API can generate a report with detailed information about a search.

Highlight Results in Text

The API can generate result text formatted with a minimal set of HTML tags. The tags are used to insert line breaks, highlight found terms in the text, and navigate between found terms in a web browser.
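
A minimal Python sketch of this kind of output is shown below. It is not the markup that GroupDocs.Search produces: the `highlight` helper, the `<b>` tag, and the `hitN` ids are choices made for the illustration. The ids give a browser something to jump to, and line breaks become `<br>`.

```python
import html
import re

def highlight(text, terms):
    """Return HTML with found terms wrapped in numbered <b> tags."""
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    parts, last, hit = [], 0, 0
    for match in pattern.finditer(text):
        hit += 1
        # Escape the text between matches so it cannot be read as markup.
        parts.append(html.escape(text[last:match.start()]))
        parts.append(f'<b id="hit{hit}">{html.escape(match.group(0))}</b>')
        last = match.end()
    parts.append(html.escape(text[last:]))
    # Line breaks are the only other tag inserted.
    return "".join(parts).replace("\n", "<br>")

result = highlight("Fast search.\nSearch <all> files", ["search"])
```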

Other Features

Besides the above-mentioned features, the API also supports the following features related to search functionality:

  • Get total hit counts for a search query 
  • Limit the number of search results
  • Get matched words in the found documents
  • Warn the user about unsupported settings
  • Support different search features in a single search query
  • Define table discrete function as a step function
  • Save encodings automatically

Indexing Features

Creating an index is the most common starting point: an index is created first, and searches are then performed on it. The indexing features are described below.

Create Index

An index collects, parses, and stores data so that searching is fast and accurate. An index folder is created first, and documents are then added to (indexed into) this folder.

Update Index

Whenever a document in the documents folder is edited, added, or deleted, the index must be updated. Because searches run against the index, updating it ensures that results reflect the latest files. The API supports index updating.
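
What an update has to work out can be sketched in Python by comparing the state recorded at indexing time with the current state of the documents folder. This is a conceptual illustration, not how GroupDocs.Search tracks changes: `plan_update` and the fingerprint values are hypothetical.

```python
def plan_update(indexed, current):
    """Compare fingerprints recorded at indexing time with the folder's current state.

    Both arguments map a file name to a fingerprint, for example a
    content hash or a modification time.
    """
    added = sorted(current.keys() - indexed.keys())
    removed = sorted(indexed.keys() - current.keys())
    changed = sorted(name for name in current.keys() & indexed.keys()
                     if current[name] != indexed[name])
    return added, changed, removed

indexed = {"a.docx": "v1", "b.pdf": "v1", "c.txt": "v1"}
current = {"a.docx": "v2", "b.pdf": "v1", "d.xlsx": "v1"}

added, changed, removed = plan_update(indexed, current)
```

Only the added and changed files need to be indexed again, and entries for removed files can be dropped.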

Load Index

An index that was created earlier can be loaded again.

Add Documents to Index

Once an index is created, documents are added to it. This can also be done asynchronously.

Other Features

Apart from the above-mentioned indexing features, GroupDocs.Search for .NET also supports the following indexing features:

  • Index metadata of documents
  • Merge indexes
  • Compact indexing
  • Multithreaded indexing
  • Index separate files
  • Track all changes to files in the index folder
  • View progress percentage of indexing or updating
  • Prevent unnecessary file indexing
  • Subscribe to events
  • Extract the list of indexed documents
  • Extract document text
  • Break IndexRepository manually
  • Break index updating manually
  • Break index merging manually
  • Break indexing manually
  • Store document text in the index
  • Detect encoding automatically
  • Accent-insensitive indexing

Implementing Dictionaries

GroupDocs.Search for .NET implements the following dictionaries.

Alias Dictionary

The alias dictionary allows abbreviations to be used for frequent long queries. In a query, an abbreviation (alias) must start with the @ character.
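
Alias expansion can be pictured as a text substitution performed before the query is run. The Python sketch below is an illustration only: `expand_aliases` and the sample alias are hypothetical, and leaving unknown aliases untouched is an assumption.

```python
import re

# Hypothetical alias dictionary: alias name -> full query text.
aliases = {"rpt": "(report OR summary)"}

def expand_aliases(query, aliases):
    """Replace each @name in the query with its stored expansion."""
    def substitute(match):
        name = match.group(1)
        # Unknown aliases are left untouched in this sketch.
        return aliases.get(name, match.group(0))
    return re.sub(r"@(\w+)", substitute, query)

expanded = expand_aliases("@rpt AND 2016", aliases)
```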

Homophone Dictionary

This feature allows managing the list of heterographic homophones and using it to improve search results.

Letters Dictionary

This feature allows managing the list of searchable letters. Characters that are not in the dictionary are treated as separators.
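
The effect of the letters dictionary on how text is split into words can be shown with a small Python tokenizer. It is a sketch of the idea and not the library's tokenizer: `tokenize` and the sample letter sets are hypothetical.

```python
import string

def tokenize(text, letters):
    """Split text into words; any character not in `letters` acts as a separator."""
    words, current = [], []
    for ch in text:
        if ch in letters:
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words

basic = set(string.ascii_letters + string.digits)
extended = basic | {"-", "_"}

# With the basic set, "-" and "_" split words; with the extended set they do not.
split_words = tokenize("e-mail: user_01@example.com", basic)
joined_words = tokenize("e-mail: user_01@example.com", extended)
```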

Stop Words Dictionary

Stop words are words that are filtered out and not indexed, for example: a, an, the, for, in, is, it, was, were, and so on. Users can change the list of stop words before indexing. Users can also disable stop words, but this can increase indexing time. Every index contains a default stop words dictionary with the most common stop words.
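
As a rough illustration of stop word filtering, the Python sketch below drops stop words before terms are added to an index. It is not the GroupDocs.Search implementation: `index_terms` is a hypothetical helper, and the stop word set contains only the words named in the text above.

```python
# Stop words taken from the examples in the text; a real default list is longer.
STOP_WORDS = {"a", "an", "the", "for", "in", "is", "it", "was", "were"}

def index_terms(text, stop_words=STOP_WORDS):
    """Return the lowercased terms that would be indexed, skipping stop words."""
    return [word for word in text.lower().split() if word not in stop_words]

filtered = index_terms("The report was filed in March")
# Passing an empty set models an index with stop words disabled.
unfiltered = index_terms("The report was filed in March", stop_words=set())
```

With stop words disabled, every word is indexed, which is why indexing can take longer.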

Searching in Several Indices

The API allows the user to have more than one index and to search them simultaneously.

Index Email Messages

The API can index email messages from Microsoft Outlook files (PST, OST, MSG, EML, and EMLX). All search results found in these files are stored in OutlookEmailMessageResultInfo.

Metered Licensing

The GroupDocs.Search.Metered public class provides metered licensing. This feature has been supported since version 17.06.
