Stop words are frequently used words, such as pronouns and prepositions, that carry little semantic meaning and can therefore be excluded from an index to reduce its size.
You can enable or disable stop words by calling the setUseStopWords method of the IndexSettings class. The default value is true, which means that stop words are filtered out during indexing and are not added to the index.
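The effect of this setting can be illustrated with a minimal, library-independent sketch. The class StopWordIndexSketch, its useStopWords flag and the sample stop word set are hypothetical and are not part of the GroupDocs.Search API. The sketch assumes only that filtering stop words means dropping them before they reach an inverted index. To mirror the documented default, filtering is on unless it is explicitly disabled.

```java
import java.util.*;

/**
 * Hypothetical toy inverted index, NOT the GroupDocs.Search API.
 * It only illustrates what filtering stop words during indexing means.
 */
public class StopWordIndexSketch {
    // Assumed sample stop words; the real default dictionary differs.
    static final Set<String> STOP_WORDS = Set.of("on", "in", "the", "a", "and");

    // term -> ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final boolean useStopWords;

    StopWordIndexSketch(boolean useStopWords) {
        this.useStopWords = useStopWords;
    }

    // Mirrors the documented default: stop words are filtered out.
    StopWordIndexSketch() {
        this(true);
    }

    void add(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.isEmpty()) continue;
            // With filtering enabled, stop words never reach the index.
            if (useStopWords && STOP_WORDS.contains(token)) continue;
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
    }

    Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    // Number of distinct terms stored: a rough proxy for index size.
    int termCount() {
        return postings.size();
    }

    public static void main(String[] args) {
        String[] docs = {"The cat sat on the mat", "A dog in the fog"};
        StopWordIndexSketch filtered = new StopWordIndexSketch();   // default: filtering on
        StopWordIndexSketch full = new StopWordIndexSketch(false); // filtering off
        for (int i = 0; i < docs.length; i++) {
            filtered.add(i, docs[i]);
            full.add(i, docs[i]);
        }
        System.out.println(filtered.termCount()); // prints 5
        System.out.println(full.termCount());     // prints 9
        System.out.println(filtered.search("on")); // prints []
        System.out.println(full.search("on"));     // prints [0]
    }
}
```

In this toy model, the filtered index is smaller. However, only the unfiltered index can answer a query for the stop word "on". This is the same trade-off that the setUseStopWords setting controls.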
The stop words to exclude during indexing are listed in the stop word dictionary. By default, this dictionary contains the most common English and Russian pronouns and prepositions. You can replace or supplement the list, and your changes are preserved when the index is reloaded. For information on managing the stop word dictionary, see the Stop word dictionary page in the Managing dictionaries section.
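The operations described above can be sketched with a plain set that is written to a file and read back: supplementing the list, replacing it, and keeping it across a reload. StopWordDictionarySketch and its methods are hypothetical and are not the GroupDocs.Search dictionary API. The file-based save and load is only an assumption that imitates the index persisting its dictionary. For the actual API, see the Stop word dictionary page.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.*;

/**
 * Hypothetical stop word dictionary, NOT the GroupDocs.Search API.
 * Shows supplementing, replacing, and keeping the list across a reload.
 */
public class StopWordDictionarySketch {
    private final Set<String> words = new TreeSet<>();

    // Supplement: add words to the existing list.
    void addRange(Collection<String> newWords) {
        for (String w : newWords) words.add(w.toLowerCase(Locale.ROOT));
    }

    // Replace: discard the current list and use a new one.
    void replaceWith(Collection<String> newWords) {
        words.clear();
        addRange(newWords);
    }

    boolean contains(String word) {
        return words.contains(word.toLowerCase(Locale.ROOT));
    }

    int size() {
        return words.size();
    }

    // Assumed persistence format: one word per line in a UTF-8 text file.
    void save(Path file) throws IOException {
        Files.write(file, words, StandardCharsets.UTF_8);
    }

    static StopWordDictionarySketch load(Path file) throws IOException {
        StopWordDictionarySketch d = new StopWordDictionarySketch();
        d.addRange(Files.readAllLines(file, StandardCharsets.UTF_8));
        return d;
    }

    public static void main(String[] args) throws IOException {
        StopWordDictionarySketch dict = new StopWordDictionarySketch();
        dict.addRange(List.of("on", "in", "the")); // assumed sample defaults
        dict.addRange(List.of("via"));             // supplement the list

        Path file = Files.createTempFile("stopwords", ".txt");
        try {
            dict.save(file);
            // Simulated reload: the supplemented list is still there.
            StopWordDictionarySketch reloaded = load(file);
            System.out.println(reloaded.size() + " " + reloaded.contains("via")); // prints "4 true"
        } finally {
            Files.deleteIfExists(file);
        }

        dict.replaceWith(List.of("etc")); // replace the whole list
        System.out.println(dict.size() + " " + dict.contains("on")); // prints "1 false"
    }
}
```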
If you need to keep all text extracted from documents and can accept a significant increase in index size, disable stop words as shown in the following example.
String indexFolder = "c:\\MyIndex\\";
String documentsFolder = "c:\\MyDocuments\\";

// Creating index settings with stop words disabled
IndexSettings settings = new IndexSettings();
settings.setUseStopWords(false);

// Creating an index in the specified folder
Index index = new Index(indexFolder, settings);

// Indexing documents from the specified folder
index.add(documentsFolder);

// Searching in the index
// Because stop words were not filtered out, the stop word 'on' can now be found
SearchResult result = index.search("on");
More resources
GitHub examples
You can run the code from the documentation articles and see the features in action in our GitHub examples.