The code in the examples below uses some methods defined in Common Utilities.

This feature is supported in version 17.9.0 and later.

Automatic Encoding Detection for Text Documents

GroupDocs.Search for .NET can automatically detect the encoding of each text file that is indexed. By default, the AutoDetectEncoding property is set to false, so detection must be enabled explicitly. The API detects the following encodings:

  • UTF32 LE
  • UTF32 BE
  • UTF16 LE
  • UTF16 BE
  • UTF8
  • UTF7
  • ANSI

The encoding can be detected from the byte order mark (BOM) or, if no BOM is present, from the content of the file. If the encoding cannot be detected, UTF8 is used by default.
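
To make the BOM case concrete, the sketch below maps the leading bytes of a file to the encoding names listed above. It is an illustration only and not the GroupDocs.Search implementation: the BomSniffer class and its Detect method are hypothetical names introduced here, and content-based detection (needed for ANSI and for files without a BOM) is not covered.

```csharp
using System;

// Hypothetical helper, not part of GroupDocs.Search: maps a leading
// byte order mark to one of the encoding names listed above.
public static class BomSniffer
{
    // Returns the encoding name, or null when no known BOM is present.
    public static string Detect(byte[] data)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));

        // The UTF-32 LE mark begins with the UTF-16 LE mark (FF FE),
        // so the longer mark has to be tested first.
        if (StartsWith(data, 0xFF, 0xFE, 0x00, 0x00)) return "UTF32 LE";
        if (StartsWith(data, 0x00, 0x00, 0xFE, 0xFF)) return "UTF32 BE";
        if (StartsWith(data, 0xEF, 0xBB, 0xBF)) return "UTF8";
        if (StartsWith(data, 0xFF, 0xFE)) return "UTF16 LE";
        if (StartsWith(data, 0xFE, 0xFF)) return "UTF16 BE";
        if (StartsWith(data, 0x2B, 0x2F, 0x76)) return "UTF7";
        return null;
    }

    // True when data begins with exactly the given prefix bytes.
    private static bool StartsWith(byte[] data, params byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

A caller could apply the documented fallback with BomSniffer.Detect(bytes) ?? "UTF8". Note that a BOM check alone is ambiguous in one case: a UTF-16 LE file whose first character is NUL starts with the same four bytes as the UTF-32 LE mark.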

The Recipe

To detect the encoding of text files, follow these steps:

  • Create an index settings object
  • Enable automatic encoding detection using settings.AutoDetectEncoding = true
  • Create an index by passing the index settings along with the index folder path
  • Add documents to the index

The Code

The following snippet shows how to detect the encoding of each text file automatically.
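
The steps of the recipe can be sketched roughly as follows. Only the AutoDetectEncoding property is named in this article; the IndexingSettings and Index types, the Index(folder, settings) constructor, and the AddToIndex method are assumptions based on the 17.x API and should be checked against the installed version. The folder paths and the AutoDetectEncodingExample class are placeholders.

```csharp
using GroupDocs.Search;

// Hypothetical wrapper class used only for this illustration.
public static class AutoDetectEncodingExample
{
    public static void Run()
    {
        // Placeholder paths (assumptions); replace with real folders.
        string indexFolder = @"c:\MyIndex";
        string documentsFolder = @"c:\MyDocuments";

        // Step 1: create an index settings object
        // (type name assumed from the 17.x API).
        IndexingSettings settings = new IndexingSettings();

        // Step 2: enable automatic encoding detection (off by default).
        settings.AutoDetectEncoding = true;

        // Step 3: create the index, passing the settings along with
        // the index folder path (constructor signature assumed).
        Index index = new Index(indexFolder, settings);

        // Step 4: add documents to the index (method name assumed).
        index.AddToIndex(documentsFolder);
    }
}
```

Because the setting is applied to the index as a whole, every text file added through this index goes through encoding detection.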

The following code snippet shows how to detect encoding selectively for some text files.

