Text file encoding detection

To detect the encoding of a text file automatically, use the AutoDetectEncoding property of the IndexingOptions class. Setting this property to true enables detection of the following encodings:

UTF-32 LE
UTF-32 BE
UTF-16 LE
UTF-16 BE
UTF-8
UTF-7
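
As a minimal sketch, auto detection could be enabled as shown below. The snippet assumes that the Add method of the Index class has an overload that accepts an IndexingOptions instance; the folder paths are placeholders, and namespace imports are omitted.

```csharp
string indexFolder = @"c:\MyIndex\";
string documentsFolder = @"c:\MyDocuments\";

// Creating an index
Index index = new Index(indexFolder);

// Creating indexing options with encoding auto detection enabled
IndexingOptions options = new IndexingOptions();
options.AutoDetectEncoding = true;

// Indexing documents from the specified folder
// (assumption: Index.Add has an overload that takes the options object)
index.Add(documentsFolder, options);
```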

Encoding auto detection for text files is disabled by default. Regardless of this setting, the encoding of a text file can be set explicitly during indexing, in a handler of the FileIndexing event. If the encoding of a text file has been neither detected nor specified in the event arguments, the default encoding, UTF-8, is used. The available encodings are listed in the Encodings class. When the encoding of a text file is detected and used for indexing, it is saved in the index so that Index class methods such as Highlight and GetDocumentText can use it later.

The example below shows how to set the encoding of text files during indexing.

string indexFolder = @"c:\MyIndex\";
string documentsFolder = @"c:\MyDocuments\";
 
// Creating an index
Index index = new Index(indexFolder);
 
// Subscribing to the event
index.Events.FileIndexing += (sender, args) =>
{
    if (args.DocumentFullPath.EndsWith(".txt", StringComparison.OrdinalIgnoreCase))
    {
        args.Encoding = Encodings.Windows_1253; // Setting encoding for each text file
    }
};
 
// Indexing documents from the specified folder
index.Add(documentsFolder);

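External libraries, such as UTF.Unknown, can also be used to determine the encoding of a text file during indexing. Such a library can detect a wide range of encodings, including ANSI encodings. The package can be installed from the NuGet Package Manager Console: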

PM> NuGet\Install-Package UTF.Unknown

Below is an example of using the external library to determine the encoding of a text file.

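// Requires the System and System.IO namespaces and the UTF.Unknown package
index.Events.FileIndexing += (sender, args) =>
{
    // Reading the file and running the detector on its raw bytes
    byte[] data = File.ReadAllBytes(args.DocumentFullPath);
    UtfUnknown.DetectionResult result = UtfUnknown.CharsetDetector.DetectFromBytes(data);
    if (result.Detected != null) // Detected is null when no encoding could be determined
    {
        Console.WriteLine("Encoding detected: " + result.Detected.EncodingName);
        args.Encoding = result.Detected.EncodingName; // Setting the detected encoding for the file
    }
};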
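
For files that start with a byte order mark, the Unicode encoding can also be recognized without an external dependency. The sketch below is only an illustration of that idea and is not how the AutoDetectEncoding option or UTF.Unknown work internally. The BomSniffer class and its DetectFromBom method are hypothetical names introduced here, the sketch does not cover UTF-7, and it returns null for files without a mark.

```csharp
using System.Text;

static class BomSniffer
{
    // Hypothetical helper: returns the encoding indicated by a byte order mark,
    // or null when the data does not start with a recognized mark.
    public static Encoding DetectFromBom(byte[] data)
    {
        // UTF-32 LE is checked before UTF-16 LE because its mark (FF FE 00 00)
        // begins with the UTF-16 LE mark (FF FE).
        if (StartsWith(data, 0xFF, 0xFE, 0x00, 0x00)) return Encoding.UTF32;
        if (StartsWith(data, 0x00, 0x00, 0xFE, 0xFF)) return new UTF32Encoding(bigEndian: true, byteOrderMark: true);
        if (StartsWith(data, 0xEF, 0xBB, 0xBF)) return Encoding.UTF8;
        if (StartsWith(data, 0xFF, 0xFE)) return Encoding.Unicode;          // UTF-16 LE
        if (StartsWith(data, 0xFE, 0xFF)) return Encoding.BigEndianUnicode; // UTF-16 BE
        return null;
    }

    // Checks whether the data begins with the given sequence of bytes.
    private static bool StartsWith(byte[] data, params byte[] mark)
    {
        if (data.Length < mark.Length) return false;
        for (int i = 0; i < mark.Length; i++)
        {
            if (data[i] != mark[i]) return false;
        }
        return true;
    }
}
```

Calling BomSniffer.DetectFromBom(File.ReadAllBytes(path)) returns Encoding.UTF8 for a file that begins with the bytes EF BB BF. How the result maps to the values accepted by args.Encoding is left open here, since that depends on the names defined in the Encodings class.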
