Character replacement during indexing

Character replacement during indexing can be used, for example, to convert all text to lowercase or to remove diacritics. When character case or diacritics are not significant, such replacements can reduce the size of the index on disk, because words that differ only in case or diacritics are stored as a single term. See also the Character replacements page in the Managing dictionaries section.
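To illustrate the idea independently of GroupDocs.Search, the following standalone Java sketch builds a 65,536-entry replacement table, one entry per UTF-16 character, similar to the one in the example below. It also strips diacritics using java.text.Normalizer, which the original example does not do. The class `CharReplacementSketch` and its methods are hypothetical and are not the GroupDocs.Search API; this is a sketch of the concept, not of how the library applies replacements internally.

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class CharReplacementSketch {
    // Hypothetical table: one replacement per UTF-16 code unit (65,536 entries)
    static final char[] TABLE = buildTable();

    static char[] buildTable() {
        char[] table = new char[Character.MAX_VALUE + 1];
        for (int i = 0; i < table.length; i++) {
            char c = (char) i;
            if (Character.isSurrogate(c)) {
                table[i] = c; // leave surrogate halves untouched
                continue;
            }
            // Decompose (e.g. U+00E9 -> 'e' + U+0301) and drop combining marks
            String stripped = Normalizer.normalize(String.valueOf(c), Normalizer.Form.NFD)
                    .replaceAll("\\p{M}", "");
            // Keep the original char if decomposition does not yield exactly one base char
            char base = stripped.length() == 1 ? stripped.charAt(0) : c;
            table[i] = Character.toLowerCase(base);
        }
        return table;
    }

    // Applies the table character by character
    static String replace(String text) {
        char[] chars = text.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            chars[i] = TABLE[chars[i]];
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        String[] words = {"R\u00e9sum\u00e9", "resume", "RESUME", "Na\u00efve", "naive"};
        Set<String> raw = new TreeSet<>(Arrays.asList(words));
        Set<String> replaced = new TreeSet<>();
        for (String w : words) {
            replaced.add(replace(w));
        }
        System.out.println(raw.size() + " distinct terms before, " + replaced.size() + " after");
        // prints "5 distinct terms before, 2 after"
        System.out.println(replaced);
        // prints "[naive, resume]"
    }
}
```

Five distinct spellings collapse into two terms; fewer distinct terms is where the on-disk savings come from.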

The example below demonstrates how to configure and use character replacements during indexing.

String indexFolder = "c:\\MyIndex\\";
String documentFolder = "c:\\MyDocuments\\";
// Enabling character replacements in the index settings
IndexSettings settings = new IndexSettings();
settings.setUseCharacterReplacements(true);
// Creating an index in the specified folder
Index index = new Index(indexFolder, settings);
// Configuring character replacements
// Deleting all existing character replacements from the dictionary
index.getDictionaries().getCharacterReplacements().clear();
// Creating new character replacements: every character is mapped to its lowercase form
CharacterReplacementPair[] characterReplacements = new CharacterReplacementPair[Character.MAX_VALUE + 1];
for (int i = 0; i < characterReplacements.length; i++) {
    char character = (char) i;
    char replacement = Character.toLowerCase(character);
    characterReplacements[i] = new CharacterReplacementPair(character, replacement);
}
// Adding character replacements to the dictionary
index.getDictionaries().getCharacterReplacements().addRange(characterReplacements);
// Indexing documents from the specified folder
index.add(documentFolder);
// Searching in the index
// Case-sensitive search is no longer possible for this index, since all characters are lowercase
// By default, case-insensitive search is performed
SearchResult result = index.search("Einstein");
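Why case-sensitive search is no longer possible can be seen with a toy model. The following sketch uses a hypothetical in-memory inverted index (`LowercaseIndexSketch`, not part of GroupDocs.Search) and assumes that the same lowercase replacement is applied both to indexed words and to the query. Once "Einstein", "EINSTEIN" and "einstein" are stored as one term, no query can tell them apart.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class LowercaseIndexSketch {
    // Hypothetical toy inverted index: term -> names of documents containing it
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Same replacement as the example: each character goes through Character.toLowerCase
    static String replace(String term) {
        StringBuilder sb = new StringBuilder(term.length());
        for (char c : term.toCharArray()) {
            sb.append(Character.toLowerCase(c));
        }
        return sb.toString();
    }

    // Splits text on non-word characters and stores the replaced tokens
    void add(String document, String text) {
        for (String token : text.split("\\W+")) {
            if (token.isEmpty()) {
                continue;
            }
            postings.computeIfAbsent(replace(token), k -> new TreeSet<>()).add(document);
        }
    }

    // The query is replaced the same way, so the original case cannot be recovered
    Set<String> search(String query) {
        return postings.getOrDefault(replace(query), Collections.emptySet());
    }

    int termCount() {
        return postings.size();
    }

    public static void main(String[] args) {
        LowercaseIndexSketch index = new LowercaseIndexSketch();
        index.add("a.txt", "Albert Einstein wrote about relativity.");
        index.add("b.txt", "EINSTEIN and einstein are the same term here.");
        System.out.println(index.search("Einstein")); // prints "[a.txt, b.txt]"
        System.out.println(index.search("eInStEiN")); // prints "[a.txt, b.txt]"
        System.out.println(index.termCount());        // prints "11"
    }
}
```

Without the replacement, the three spellings of "Einstein" would occupy three separate terms (13 in total instead of 11).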

More resources

GitHub examples

You may easily run the code from documentation articles and see the features in action in our GitHub examples:

Free online document search App

Along with the full-featured .NET library, we provide simple but powerful free apps.

You are welcome to search your PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX and other files with our Free Online Document Search App.