Automatic Encoding Detection for Text Documents
GroupDocs.Search for .NET can automatically detect the encoding of each text file it indexes. Detection is off by default: the AutoDetectEncoding property is set to false. The API can detect the following encodings:
- UTF32 LE
- UTF32 BE
- UTF16 LE
- UTF16 BE
The encoding is detected from the byte order mark (BOM) or, if no BOM is present, from the content of the file. If the encoding cannot be detected, UTF8 is used by default.
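To illustrate the BOM part of this rule, here is a minimal sketch using only the standard System.Text classes. It is not the library's implementation: the class and method names (BomSniffer, DetectByBom) are hypothetical, and the content-based detection used for files without a BOM is not covered.

```csharp
using System.Text;

public static class BomSniffer
{
    // Returns the encoding indicated by a leading BOM, or UTF-8 when none
    // of the four BOMs (UTF-32 LE/BE, UTF-16 LE/BE) is present.
    public static Encoding DetectByBom(byte[] data)
    {
        // UTF-32 is checked before UTF-16 because the UTF-32 LE BOM
        // (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
        if (data.Length >= 4)
        {
            if (data[0] == 0xFF && data[1] == 0xFE && data[2] == 0x00 && data[3] == 0x00)
                return new UTF32Encoding(bigEndian: false, byteOrderMark: true);
            if (data[0] == 0x00 && data[1] == 0x00 && data[2] == 0xFE && data[3] == 0xFF)
                return new UTF32Encoding(bigEndian: true, byteOrderMark: true);
        }
        if (data.Length >= 2)
        {
            if (data[0] == 0xFF && data[1] == 0xFE)
                return new UnicodeEncoding(bigEndian: false, byteOrderMark: true);
            if (data[0] == 0xFE && data[1] == 0xFF)
                return new UnicodeEncoding(bigEndian: true, byteOrderMark: true);
        }
        // No recognized BOM: fall back to UTF-8, mirroring the default above.
        return Encoding.UTF8;
    }
}
```

The order of the checks matters: testing for UTF-16 LE first would misclassify every UTF-32 LE file, since both BOMs start with the same two bytes.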
To detect the encoding of text files, follow these steps:
- Create an index settings object
- Enable automatic encoding detection by setting settings.AutoDetectEncoding = true
- Create an index, passing the index settings along with the index folder path
- Add documents to the index
The following snippet shows how to detect the encoding of each text file automatically.
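A minimal sketch of these steps. Only the AutoDetectEncoding property is named in this article; the IndexingSettings class, the Index constructor taking a folder path, an overwrite flag, and the settings, and the AddToIndex method are assumptions that should be checked against the installed GroupDocs.Search version. The folder paths are placeholders.

```csharp
using GroupDocs.Search;

public static class AutoDetectEncodingExample
{
    public static void Run()
    {
        // Placeholder paths; replace with real folders.
        string indexFolder = @"c:\MyIndex";
        string documentsFolder = @"c:\MyDocuments";

        // Create an index settings object (class name assumed).
        IndexingSettings settings = new IndexingSettings();

        // Enable automatic encoding detection for text files.
        settings.AutoDetectEncoding = true;

        // Create the index, passing the settings along with the folder path.
        // The second argument (overwrite an existing index) is assumed.
        Index index = new Index(indexFolder, true, settings);

        // Add documents to the index (method name assumed).
        index.AddToIndex(documentsFolder);
    }
}
```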
The following code snippet shows how to detect encoding selectively for some text files.