This page describes all character types. The type of a character determines how that character is indexed. For more information on managing the Alphabet, see the Alphabet page in the Managing dictionaries section.
Regular characters
Regular characters are separators and letters. During indexing, separators are used to separate words from each other, and words are formed from continuous sequences of letters.
The type of each character can be specified in the Alphabet. By default, the following characters are defined as letters (Unicode code points are given in hexadecimal):
Digits: 0030-0039;
Latin capital letters: 0041-005A;
Low line: 005F;
Latin small letters: 0061-007A;
Extended Latin letters (Latin-1 Supplement, Latin Extended-A, Latin Extended-B, IPA Extensions): 00C0-00D6, 00D8-00F6, 00F8-00FF, 0100-017F, 0180-024F, 0250-02AF.
All other characters are defined as separators in the Alphabet.
The following example demonstrates how to define only digits, the low line character, and English letters as letters in the Alphabet.
C#
string indexFolder = @"c:\MyIndex\";
string documentFolder = @"c:\MyDocuments\";

// Creating an index in the specified folder
Index index = new Index(indexFolder);

// Configuring the alphabet
// Setting the separator type for all characters in the alphabet
index.Dictionaries.Alphabet.Clear();
// Creating a list of letter characters
List<char> list = new List<char>();
for (char i = (char)0x0030; i <= 0x0039; i++)
{
    list.Add(i); // Digits
}
for (char i = (char)0x0041; i <= 0x005A; i++)
{
    list.Add(i); // Latin capital letters
}
list.Add((char)0x005F); // Low line
for (char i = (char)0x0061; i <= 0x007A; i++)
{
    list.Add(i); // Latin small letters
}
// Setting the type of characters in the alphabet
char[] characters = list.ToArray();
index.Dictionaries.Alphabet.SetRange(characters, CharacterType.Letter);

// Indexing documents from the specified folder
index.Add(documentFolder);

// Searching in the index
SearchResult result = index.Search("Einstein");
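To make the effect of the letter and separator types more concrete, here is a minimal, library-independent sketch of how words could be formed from continuous sequences of letters. It does not use the indexing API and is not the library's actual implementation; the class `RegularCharactersSketch` and the methods `IsLetter` and `SplitIntoWords` are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class RegularCharactersSketch
{
    // Hypothetical helper: true for the characters that the example above
    // marks as letters (digits, Latin capital letters, low line, Latin small letters).
    public static bool IsLetter(char c) =>
        (c >= '\u0030' && c <= '\u0039') ||
        (c >= '\u0041' && c <= '\u005A') ||
        c == '\u005F' ||
        (c >= '\u0061' && c <= '\u007A');

    // Splits text into words: each maximal run of letters becomes one word,
    // and every other character acts as a separator.
    public static List<string> SplitIntoWords(string text, Func<char, bool> isLetter)
    {
        var words = new List<string>();
        var current = new StringBuilder();
        foreach (char c in text)
        {
            if (isLetter(c))
            {
                current.Append(c);
            }
            else if (current.Length > 0)
            {
                words.Add(current.ToString());
                current.Clear();
            }
        }
        if (current.Length > 0)
        {
            words.Add(current.ToString());
        }
        return words;
    }

    public static void Main()
    {
        List<string> words = SplitIntoWords("Albert Einstein, file_01.txt", IsLetter);
        Console.WriteLine(string.Join(" | ", words)); // Albert | Einstein | file_01 | txt
    }
}
```

In this sketch the low line keeps `file_01` together as one word, while the comma, space, and full stop act as separators. A character outside the configured letter set, such as `é`, would also split a word.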
Blended characters
Blended characters are special characters that are indexed both as separators and as letters. This type of character can be useful, for example, for indexing hyphens. In this case, a compound word containing a hyphen is indexed both as a single word with the hyphen and as its individual parts without the hyphen. An example of using blended characters is presented below.
C#
string indexFolder = @"c:\MyIndex\";
string documentFolder = @"c:\MyDocuments\";

// Creating an index in the specified folder
Index index = new Index(indexFolder);

// Setting hyphen character type to blended
index.Dictionaries.Alphabet.SetRange(new char[] { '-' }, CharacterType.Blended);

// Indexing documents from the specified folder
index.Add(documentFolder);

// Searching in the index
SearchResult result1 = index.Search("Elliot-Murray-Kynynmound");
SearchResult result2 = index.Search("Elliot");
SearchResult result3 = index.Search("Murray");
SearchResult result4 = index.Search("Kynynmound");
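The four searches above are all expected to match because a hyphenated word is indexed in two ways. The following library-independent sketch models that behavior by splitting the text twice: once treating blended characters as letters, and once treating them as separators. It is an illustration under stated assumptions, not the library's actual algorithm: `BlendedCharactersSketch`, `Runs`, and `IndexTerms` are hypothetical names, and `char.IsLetterOrDigit` stands in for the configured letter set.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class BlendedCharactersSketch
{
    // Yields each maximal run of characters accepted by the predicate.
    private static IEnumerable<string> Runs(string text, Func<char, bool> inWord)
    {
        var current = new StringBuilder();
        foreach (char c in text)
        {
            if (inWord(c))
            {
                current.Append(c);
            }
            else if (current.Length > 0)
            {
                yield return current.ToString();
                current.Clear();
            }
        }
        if (current.Length > 0)
        {
            yield return current.ToString();
        }
    }

    // Returns the distinct terms produced when blended characters are
    // treated first as letters and then as separators.
    // Assumption: char.IsLetterOrDigit approximates the letter alphabet.
    public static List<string> IndexTerms(string text, ISet<char> blended)
    {
        var terms = new List<string>();
        var seen = new HashSet<string>();

        // Pass 1: blended characters count as letters (whole compound word).
        foreach (string word in Runs(text, c => char.IsLetterOrDigit(c) || blended.Contains(c)))
        {
            if (seen.Add(word)) terms.Add(word);
        }

        // Pass 2: blended characters count as separators (individual parts).
        foreach (string word in Runs(text, c => char.IsLetterOrDigit(c)))
        {
            if (seen.Add(word)) terms.Add(word);
        }
        return terms;
    }

    public static void Main()
    {
        var blended = new HashSet<char> { '-' };
        List<string> terms = IndexTerms("Gilbert Elliot-Murray-Kynynmound", blended);
        // Gilbert | Elliot-Murray-Kynynmound | Elliot | Murray | Kynynmound
        Console.WriteLine(string.Join(" | ", terms));
    }
}
```

The sketch ignores edge cases such as a hyphen standing alone between spaces, which a real indexer would need to handle.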
Indexing each character as a whole word
Another special character type is a character that is indexed as a separate word. This type is designed for hieroglyphic languages: it allows each such character in the text to be indexed as a separate word, regardless of whether separators are present.
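As a rough illustration of this behavior, the sketch below splits text so that every whole-word character becomes its own word even when no separator surrounds it. It is library-independent and makes several assumptions: the names `WholeWordCharactersSketch`, `IsWholeWord`, and `SplitIntoWords` are hypothetical, the range U+4E00-U+9FFF (CJK Unified Ideographs) is chosen only as an example of whole-word characters, and `char.IsLetterOrDigit` stands in for the configured letter set.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class WholeWordCharactersSketch
{
    // Assumption for illustration: CJK Unified Ideographs are whole-word characters.
    public static bool IsWholeWord(char c) => c >= '\u4E00' && c <= '\u9FFF';

    // Splits text into words. A whole-word character ends the current word
    // and is emitted as a word of its own; other letters accumulate as usual.
    public static List<string> SplitIntoWords(string text)
    {
        var words = new List<string>();
        var current = new StringBuilder();
        foreach (char c in text)
        {
            if (IsWholeWord(c))
            {
                Flush(current, words);
                words.Add(c.ToString());
            }
            else if (char.IsLetterOrDigit(c))
            {
                current.Append(c);
            }
            else
            {
                Flush(current, words); // Separator
            }
        }
        Flush(current, words);
        return words;
    }

    // Moves the pending word, if any, into the result list.
    private static void Flush(StringBuilder current, List<string> words)
    {
        if (current.Length > 0)
        {
            words.Add(current.ToString());
            current.Clear();
        }
    }

    public static void Main()
    {
        // The resulting list contains five words: Tokyo, 東, 京, 都, 2020.
        Console.WriteLine(string.Join(" | ", SplitIntoWords("Tokyo 東京都 2020")));
    }
}
```

Note that the whole-word check comes before the letter check, so the three ideographs are never merged into one word even though no separator stands between them.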
More resources
GitHub examples
You may easily run the code from documentation articles and see the features in action in our GitHub examples: