To detect the encoding of a text file automatically, use the setAutoDetectEncoding method of the IndexSettings class. Passing true to this method enables detection of the following encodings:
UTF-32 LE,
UTF-32 BE,
UTF-16 LE,
UTF-16 BE,
UTF-8,
UTF-7,
ANSI.
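This article does not describe how detection works internally. Purely as an illustration of one common technique, the sketch below recognizes some of the Unicode encodings listed above from a byte order mark (BOM) at the start of a file. The names BomSniffer and detectByBom are hypothetical and are not part of the library's API. Files without a BOM (for example, ANSI text) cannot be identified this way, so the method returns null for them.

```java
final class BomSniffer {

    /**
     * Returns the name of the encoding indicated by a leading BOM,
     * or null when the data does not start with a known BOM.
     */
    static String detectByBom(byte[] data) {
        // UTF-32 LE must be tested before UTF-16 LE, because its BOM
        // (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
        if (startsWith(data, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32 LE";
        if (startsWith(data, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32 BE";
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(data, 0xFF, 0xFE)) return "UTF-16 LE";
        if (startsWith(data, 0xFE, 0xFF)) return "UTF-16 BE";
        return null; // no BOM: the encoding cannot be determined here
    }

    // Compares the first bytes of data with the given unsigned byte values.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

A caller would read the first four bytes of a file and pass them to detectByBom. A real detector typically also needs content-based heuristics for files that carry no BOM.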
By default, automatic encoding detection for text files is disabled. Regardless of this setting, the encoding of a text file can be set during indexing when the FileIndexing event is raised. If the encoding of a text file has been neither detected nor specified in the event arguments, the default encoding, UTF-8, is used. The available encodings are listed in the Encodings class. When the encoding of a text file is detected and used for indexing, it is saved in the index so that methods of the Index class such as highlight and getDocumentText can use it.
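The fallback described above can be sketched as a small selection function. This is a minimal illustration, not the library's implementation: the class EncodingFallback and its resolve method are hypothetical, and the sketch assumes that an encoding specified in the event arguments takes precedence over a detected one, which the text implies but does not state outright.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingFallback {

    /**
     * Picks the encoding to use for a text file.
     * Either argument may be null when that source gave no encoding.
     */
    static Charset resolve(Charset specifiedInEvent, Charset detected) {
        // Assumption: an explicitly specified encoding wins over detection.
        if (specifiedInEvent != null) return specifiedInEvent;
        // Otherwise use the detected encoding, if detection found one.
        if (detected != null) return detected;
        // Neither specified nor detected: fall back to the default, UTF-8.
        return StandardCharsets.UTF_8;
    }
}
```

With this sketch, resolve(null, null) yields UTF-8, matching the default named in the paragraph above.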
The example below shows how to set the encoding of text files during indexing.
String indexFolder = "c:\\MyIndex\\";
String documentsFolder = "c:\\MyDocuments\\";

// Creating an index
Index index = new Index(indexFolder);

// Subscribing to the event
index.getEvents().FileIndexing.add(new EventHandler<FileIndexingEventArgs>() {
    public void invoke(Object sender, FileIndexingEventArgs args) {
        if (args.getDocumentFullPath().endsWith(".txt")) {
            args.setEncoding(Encodings.Windows_1253); // Setting encoding for each text file
        }
    }
});

// Indexing documents from the specified folder
index.add(documentsFolder);
More resources
GitHub examples
You may easily run the code from documentation articles and see the features in action in our GitHub examples: