Enabling language information

Documents in all WordProcessing formats can contain text in different languages. Unlike plain text documents (TXT), however, WordProcessing documents also store metadata about the specific language (locale) of every piece of text. GroupDocs.Editor can extract and export this language information. For this purpose, the WordProcessingEditOptions class exposes the EnableLanguageInformation boolean property through the following accessor methods:

public boolean getEnableLanguageInformation()
public void setEnableLanguageInformation(boolean)

By default its value is false, which means that language metadata is not extracted. When this option is enabled, GroupDocs.Editor extracts locale information for every piece of textual content and preserves it in the EditableDocument instance that is produced when the document is edited. Later, when the user has obtained the EditableDocument instance and generates HTML markup from it to pass to a WYSIWYG HTML editor, so that the document can be edited in the browser, this language information is represented as “lang” HTML attributes with the appropriate values on SPAN HTML elements.
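
The steps described above can be sketched as follows. This is a minimal illustration rather than a definitive implementation: the input path is hypothetical, and the package names and the Editor, WordProcessingLoadOptions, edit() and getContent() calls are assumed to match the GroupDocs.Editor for Java release in use.

```java
import com.groupdocs.editor.EditableDocument;
import com.groupdocs.editor.Editor;
import com.groupdocs.editor.options.WordProcessingEditOptions;
import com.groupdocs.editor.options.WordProcessingLoadOptions;

public class EnableLanguageInformationExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical path to a multilingual WordProcessing document.
        String inputPath = "multilingual.docx";

        Editor editor = new Editor(inputPath, new WordProcessingLoadOptions());

        // The option is false by default, so it has to be switched on explicitly.
        WordProcessingEditOptions editOptions = new WordProcessingEditOptions();
        editOptions.setEnableLanguageInformation(true);

        // Open the document for editing; locale info is kept in the EditableDocument.
        EditableDocument editable = editor.edit(editOptions);

        // The generated markup is expected to carry lang attributes on SPAN elements.
        String html = editable.getContent();
        System.out.println(html);

        // Release resources held by the document and the editor.
        editable.dispose();
        editor.dispose();
    }
}
```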

Enabling language information is useful when a document contains text parts in different languages. If the whole document is written in a single language, this option makes little sense, which is why it is disabled by default.

However, when a document is multilingual, enabling language information can be helpful in two scenarios:

  • It eases spell checking for client-side JavaScript spell checkers that run in the browser. However, this depends heavily on the specific spell checker, as not all spell checkers can read values from “lang” attributes or use language information at all.
  • It improves the quality of the output WordProcessing document in round-trip scenarios. When a document is converted to an EditableDocument instance with the EnableLanguageInformation option enabled, HTML markup is generated and edited in an HTML editor, and a new EditableDocument instance is then created from the edited markup, the language metadata in the “lang” attributes is preserved. When the edited EditableDocument is converted back to an output document of some WordProcessing format, such as DOCX or RTF, its textual content is associated with the correct locale.
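
Both scenarios depend on the “lang” attributes actually being present in the markup, so it can be handy to check which locales a piece of generated or edited HTML declares. The sketch below uses only the Java standard library and is not part of GroupDocs.Editor; the LangAttributeScanner class and its collectLanguages method are hypothetical helpers, and the sample markup is an assumed shape of the output, not a verbatim one. A regular expression is enough for a quick check, while a real HTML parser would be more robust.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LangAttributeScanner {

    // Matches an opening <span ...> tag that has a quoted lang attribute.
    // Group 1 is the quote character, group 2 is the attribute value.
    private static final Pattern SPAN_LANG = Pattern.compile(
            "<span\\b[^>]*?\\slang\\s*=\\s*([\"'])(.*?)\\1[^>]*>",
            Pattern.CASE_INSENSITIVE);

    // Returns the distinct lang values found on SPAN elements, in order of first appearance.
    public static List<String> collectLanguages(String html) {
        Set<String> seen = new LinkedHashSet<>();
        Matcher matcher = SPAN_LANG.matcher(html);
        while (matcher.find()) {
            seen.add(matcher.group(2));
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // Assumed sample of markup with language information enabled.
        String markup = "<p><span lang=\"en-US\">Hello</span> "
                + "<span lang=\"de-DE\">Welt</span> "
                + "<span lang=\"en-US\">again</span></p>";
        System.out.println(collectLanguages(markup)); // prints [en-US, de-DE]
    }
}
```

Running the same check on the markup before and after it passes through the HTML editor shows whether the editor kept the attributes intact, which the round-trip scenario requires.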