Detect encoding
GroupDocs.Parser can detect the encoding of a plain text file. The following encodings are supported:
- UTF32 LE
- UTF32 BE
- UTF16 LE
- UTF16 BE
- UTF8
- UTF7
- ANSI
The encoding is detected from the byte order mark (BOM) or, if no BOM is present, from the content of the file.
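To illustrate what detection by BOM involves, here is a minimal sketch that maps the leading bytes of a file to an encoding. It is not the GroupDocs.Parser implementation: the class name BomSniffer and the method DetectByBom are hypothetical names chosen for this example, and only the standard System.Text types are used. UTF-7 and the content-based fallback are left out to keep the sketch short.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not part of GroupDocs.Parser.
public static class BomSniffer
{
    // Returns the encoding indicated by a leading BOM,
    // or null when the data does not start with a known BOM.
    public static Encoding DetectByBom(byte[] data)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));

        // UTF-32 is checked before UTF-16 because the UTF-32 LE BOM (FF FE 00 00)
        // begins with the same two bytes as the UTF-16 LE BOM (FF FE).
        if (StartsWith(data, 0xFF, 0xFE, 0x00, 0x00))
            return new UTF32Encoding(bigEndian: false, byteOrderMark: true);
        if (StartsWith(data, 0x00, 0x00, 0xFE, 0xFF))
            return new UTF32Encoding(bigEndian: true, byteOrderMark: true);
        if (StartsWith(data, 0xEF, 0xBB, 0xBF))
            return new UTF8Encoding(encoderShouldEmitUTF8Identifier: true);
        if (StartsWith(data, 0xFF, 0xFE))
            return new UnicodeEncoding(bigEndian: false, byteOrderMark: true);
        if (StartsWith(data, 0xFE, 0xFF))
            return new UnicodeEncoding(bigEndian: true, byteOrderMark: true);

        // No BOM: a real detector would now inspect the content
        // or fall back to a default (for example, ANSI) encoding.
        return null;
    }

    // Checks whether data begins with the given byte sequence.
    private static bool StartsWith(byte[] data, params byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

A caller could pass the first few bytes of a file, for example BomSniffer.DetectByBom(File.ReadAllBytes(path)), and treat a null result as "no BOM, inspect the content or use the default encoding".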
Here are the steps to detect the encoding of the document:
- Instantiate a LoadOptions object with the default ANSI encoding;
- Instantiate a Parser object for the document;
- Call the GetDocumentInfo method and cast the result to TextDocumentInfo;
- Read the Encoding property.
The following example shows how to detect the encoding of the document:
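// Create an instance of the LoadOptions class with the default ANSI encoding
// (code page 1251 in this example). This encoding is returned for ANSI text documents.
// Note: on .NET Core and .NET 5+, code page 1251 is available only after calling
// Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).
LoadOptions loadOptions = new LoadOptions(FileFormat.WordProcessing, null, null, Encoding.GetEncoding(1251));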
// Create an instance of Parser class
using (Parser parser = new Parser(Constants.SampleText, loadOptions))
{
    // Get the document info
    TextDocumentInfo info = parser.GetDocumentInfo() as TextDocumentInfo;
    // Check if it's the document info of a plain text document
    if (info == null)
    {
        Console.WriteLine("Isn't a plain text document");
        return;
    }
    // Print the encoding
    Console.WriteLine("Encoding: " + info.Encoding.WebName);
}
More resources
GitHub examples
You can run the code above and see the feature in action in our GitHub examples.
Free online document parser App
Along with the full-featured .NET library, we provide simple but powerful free apps.
You are welcome to parse documents and extract data from PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX, emails, and more with our Free Online Document Parser App.