Detect encoding

GroupDocs.Parser provides the functionality to detect the encoding of a plain text file. The following encodings are supported:

UTF32 LE
UTF32 BE
UTF16 LE
UTF16 BE
UTF8
UTF7
ANSI

The encoding is detected from the byte order mark (BOM) or, if no BOM is present, from the content of the file.
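To illustrate what BOM-based detection involves, here is a minimal sketch in plain Java. It is not GroupDocs.Parser code and does not claim to reflect how the library works internally; the class name BomSniffer and the method detectByBom are hypothetical, and the returned labels are plain strings chosen for this example. It only compares the leading bytes of a buffer with the well-known BOM signatures and returns null when none matches, which is the case where content-based detection would be needed.

```java
final class BomSniffer {

    // Returns true when data begins with the given byte values (0..255).
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical helper: returns an encoding label, or null when no known BOM is found.
    static String detectByBom(byte[] data) {
        // UTF-32 is tested before UTF-16 because the UTF-32 LE BOM (FF FE 00 00)
        // begins with the UTF-16 LE BOM (FF FE).
        if (startsWith(data, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(data, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(data, 0xFF, 0xFE)) return "UTF-16LE";
        if (startsWith(data, 0xFE, 0xFF)) return "UTF-16BE";
        // The UTF-7 signature is 2B 2F 76 followed by one of 38, 39, 2B, 2F.
        if (data.length >= 4 && startsWith(data, 0x2B, 0x2F, 0x76)) {
            int b = data[3] & 0xFF;
            if (b == 0x38 || b == 0x39 || b == 0x2B || b == 0x2F) return "UTF-7";
        }
        return null; // no BOM: fall back to content analysis or a default encoding
    }

    public static void main(String[] args) {
        byte[] sample = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        System.out.println(detectByBom(sample));
    }
}
```

Note that a signature check alone cannot distinguish a UTF-32 LE file from a UTF-16 LE file whose first character is U+0000; this sketch resolves that ambiguity in favor of UTF-32 LE.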

Here are the steps to detect the encoding of the document:

Instantiate a LoadOptions object with the default ANSI encoding.
Instantiate a Parser object for the document, passing the LoadOptions object.
Call the getDocumentInfo method and cast the result to TextDocumentInfo.
Call the getCharset method to read the detected encoding.

The following example shows how to detect the encoding of the document:

// Create an instance of LoadOptions class with the default ANSI encoding.
// This encoding is returned for ANSI text documents.
LoadOptions loadOptions = new LoadOptions(FileFormat.WordProcessing, null, null, Charset.forName("US-ASCII"));
// Create an instance of Parser class
try (Parser parser = new Parser(Constants.SampleText, loadOptions)) {
    // Get the document info
    IDocumentInfo info = parser.getDocumentInfo();
    // Check if it's the document info of a plain text document
    if (!(info instanceof TextDocumentInfo)) {
        System.out.println("Isn't a plain text document");
        return;
    }
    // Print the encoding
    System.out.println("Encoding: " + ((TextDocumentInfo) info).getCharset().displayName());
}
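Once a Charset has been obtained, it can be used to decode the file's bytes with the standard Java API. The following is a minimal sketch that uses only the JDK; the class DecodeWithCharset and the method decode are hypothetical names introduced for illustration, and the caller is assumed to supply the raw bytes and the detected Charset. Because Java decoders such as UTF-8 and UTF-16LE keep a leading BOM as the character U+FEFF, the sketch removes it after decoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class DecodeWithCharset {

    // Hypothetical helper: decodes raw bytes with the detected charset
    // and drops a leading BOM character (U+FEFF) if the decoder kept it.
    static String decode(byte[] bytes, Charset detected) {
        String text = new String(bytes, detected);
        return text.startsWith("\uFEFF") ? text.substring(1) : text;
    }

    public static void main(String[] args) {
        // "hi" encoded as UTF-16 LE, preceded by the UTF-16 LE BOM (FF FE).
        byte[] raw = {(byte) 0xFF, (byte) 0xFE, 0x68, 0x00, 0x69, 0x00};
        System.out.println(decode(raw, StandardCharsets.UTF_16LE));
    }
}
```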

More resources

GitHub examples

You may easily run the code above and see the feature in action in our GitHub examples:

Free online document parser App

Along with the full-featured Java library, we provide simple but powerful free apps.

You are welcome to extract data from PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX, emails, and more with our Free Online Document Parser App.
