Enabling language information
Documents in any WordProcessing format can contain text in different languages. Unlike plain text documents (TXT), however, WordProcessing documents also store metadata about the specific language (locale) of every piece of text. GroupDocs.Editor can extract and export this language information. For this purpose, the WordProcessingEditOptions class contains the EnableLanguageInformation public boolean property:
public boolean getEnableLanguageInformation()
public void setEnableLanguageInformation(boolean)
By default its value is false, which means that language metadata is not extracted. When this option is enabled, GroupDocs.Editor extracts the locale information for every piece of textual content and preserves it in the EditableDocument instance when the document is edited. Then, when the user has obtained the EditableDocument instance and generates the HTML markup to pass to a WYSIWYG HTML editor so that the document can be edited in the browser, this language information is represented as ‘lang’ HTML attributes with the appropriate values on the SPAN HTML elements.
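To see what the ‘lang’ attributes on SPAN elements look like from the consuming side, the following minimal sketch collects the distinct locale values from a fragment of HTML markup. It uses only the Java standard library and does not call GroupDocs.Editor. The class name LangSpans, the method collectLanguages, and the sample markup are all hypothetical; the real markup produced from an EditableDocument may be structured differently, and a regular expression is only a rough stand-in for a proper HTML parser.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of GroupDocs.Editor.
public class LangSpans {

    // Matches an opening <span ...> tag that carries a lang attribute
    // whose value is wrapped in single or double quotes.
    private static final Pattern SPAN_LANG = Pattern.compile(
            "<span\\b[^>]*?\\slang\\s*=\\s*([\"'])(.*?)\\1[^>]*>",
            Pattern.CASE_INSENSITIVE);

    // Returns the distinct lang values in order of first appearance.
    public static Set<String> collectLanguages(String html) {
        Set<String> languages = new LinkedHashSet<>();
        Matcher matcher = SPAN_LANG.matcher(html);
        while (matcher.find()) {
            languages.add(matcher.group(2));
        }
        return languages;
    }

    public static void main(String[] args) {
        // Assumed sample markup; not actual GroupDocs.Editor output.
        String html = "<p><span lang=\"en-US\">Hello</span> "
                + "<span lang=\"de-DE\">Hallo</span> "
                + "<span lang=\"en-US\">again</span></p>";
        System.out.println(collectLanguages(html)); // prints [en-US, de-DE]
    }
}
```

A check like this can help confirm that a document really is multilingual before deciding whether the option is worth enabling.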
Enabling language information is useful when a document contains text parts in different languages; if a document has text in a single language, this option makes little sense, which is why it is disabled by default.
However, when a document is multilingual, enabling language information is useful in two scenarios:
- It improves the quality of the output WordProcessing document in roundtrip scenarios. When a document is converted to an EditableDocument instance with the EnableLanguageInformation option enabled, HTML markup is generated and edited in an HTML editor, and a new EditableDocument instance is then created from the edited markup, the language metadata in the “lang” attributes is still preserved. When the edited EditableDocument is converted back to an output document of some WordProcessing format, such as DOCX or RTF, the textual content inside it keeps its association with the correct locale.