Detect encoding

GroupDocs.Parser can detect the encoding of a plain text file. The following encodings are supported:

  • UTF32 LE
  • UTF32 BE
  • UTF16 LE
  • UTF16 BE
  • UTF8
  • UTF7
  • ANSI

The encoding is detected from the BOM (byte order mark) or, if no BOM is present, from the content of the file.
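To illustrate the two strategies in general terms, here is a minimal sketch written with the standard Java library only. It is not the GroupDocs.Parser implementation: the class name EncodingSniffer, its method names, and the rule "non-ASCII bytes that form valid UTF-8 mean UTF-8, otherwise use the caller's default" are assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of GroupDocs.Parser.
final class EncodingSniffer {
    private EncodingSniffer() {}

    /** Returns an encoding name based on the BOM, or null if no known BOM is found. */
    static String detectByBom(byte[] data) {
        // UTF-32 LE must be tested before UTF-16 LE: both start with FF FE.
        if (startsWith(data, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(data, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(data, 0xFF, 0xFE)) return "UTF-16LE";
        if (startsWith(data, 0xFE, 0xFF)) return "UTF-16BE";
        // Simplification: only the 3-byte "+/v" prefix of the UTF-7 signature is checked.
        if (startsWith(data, 0x2B, 0x2F, 0x76)) return "UTF-7";
        return null;
    }

    /** BOM first; otherwise inspect the content; otherwise return the caller's default. */
    static String detect(byte[] data, String defaultEncoding) {
        String byBom = detectByBom(data);
        if (byBom != null) return byBom;
        // Assumed heuristic: pure ASCII is ambiguous, so the default wins.
        return (hasNonAscii(data) && isValidUtf8(data)) ? "UTF-8" : defaultEncoding;
    }

    /** Strict decode: any malformed byte sequence is reported instead of replaced. */
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    private static boolean hasNonAscii(byte[] data) {
        for (byte b : data) {
            if ((b & 0x80) != 0) return true;
        }
        return false;
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

In this sketch the default passed to detect plays the same role as the default encoding supplied through LoadOptions: it is what comes back when neither a BOM nor the content gives a clear answer.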

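Here are the steps to detect the encoding of the document:

  • Create a LoadOptions object with the default ANSI encoding.
  • Create a Parser object for the document, passing the load options.
  • Call the getDocumentInfo method and check that the result is a TextDocumentInfo.
  • Call the getCharset method to obtain the detected encoding.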

The following example shows how to detect the encoding of the document:

// Create an instance of LoadOptions class with the default ANSI encoding.
// This encoding is returned for ANSI text documents.
LoadOptions loadOptions = new LoadOptions(FileFormat.WordProcessing, null, null, Charset.forName("US-ASCII"));
// Create an instance of Parser class
try (Parser parser = new Parser(Constants.SampleText, loadOptions)) {
    // Get the document info
    IDocumentInfo info = parser.getDocumentInfo();
    // Check if it's the document info of a plain text document
    if (!(info instanceof TextDocumentInfo)) {
        System.out.println("Isn't a plain text document");
        return;
    }
    // Print the encoding
    System.out.println("Encoding: " + ((TextDocumentInfo) info).getCharset().displayName());
}
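
Once a Charset has been obtained, a typical next step is to decode the file's bytes with it. The sketch below uses only the standard Java library; the class DetectedCharsetDecoder and its decode method are hypothetical names introduced for illustration, not part of GroupDocs.Parser. It also drops a leading U+FEFF, because some Java decoders (UTF-8 and UTF-16LE, for example) keep the BOM as the first character of the decoded text.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of GroupDocs.Parser.
final class DetectedCharsetDecoder {
    private DetectedCharsetDecoder() {}

    /** Decodes raw bytes with the given charset and removes a leading BOM character. */
    static String decode(byte[] bytes, Charset detected) {
        String text = new String(bytes, detected);
        // A BOM that survives decoding appears as U+FEFF at position 0.
        return text.startsWith("\uFEFF") ? text.substring(1) : text;
    }
}
```

For example, the bytes could come from Files.readAllBytes and the charset from the getCharset call shown above.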
