GroupDocs.Parser can extract formatted text from documents with the getFormattedText(FormattedTextOptions) method:

TextReader getFormattedText(FormattedTextOptions options);

The method returns a TextReader instance containing the extracted text, or null if formatted text extraction isn't supported for the document. FormattedTextOptions has the following constructor:

FormattedTextOptions(FormattedTextMode mode);

FormattedTextMode enumeration has the following members:

Html: HTML format.
Markdown: Markdown format.
PlainText: Plain text format.

TextReader class extends java.io.Reader and adds the following members:

readLine: Reads a line of characters from the text reader and returns the data as a string.
readToEnd: Reads all characters from the current position to the end of the text reader and returns them as one string.
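
To make the difference between the two members concrete, here is a minimal sketch built only on the standard library. SketchTextReader is a hypothetical stand-in written for this illustration; it is not the GroupDocs TextReader, and details such as returning null from readLine at the end of the text are assumptions borrowed from java.io.BufferedReader rather than documented GroupDocs behavior.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical stand-in for illustration only; this is NOT the GroupDocs TextReader.
class SketchTextReader extends Reader {
    private final BufferedReader source;

    SketchTextReader(String text) {
        this.source = new BufferedReader(new StringReader(text));
    }

    // Returns the next line without its line terminator.
    // Assumption: null marks the end of the text, as in BufferedReader.
    public String readLine() throws IOException {
        return source.readLine();
    }

    // Returns everything from the current position to the end as one string.
    public String readToEnd() throws IOException {
        StringBuilder rest = new StringBuilder();
        char[] buffer = new char[1024];
        int count;
        while ((count = source.read(buffer, 0, buffer.length)) != -1) {
            rest.append(buffer, 0, count);
        }
        return rest.toString();
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        return source.read(cbuf, off, len);
    }

    @Override
    public void close() throws IOException {
        source.close();
    }

    public static void main(String[] args) throws IOException {
        String html = "<h1>Title</h1>\n<p>First</p>\n<p>Second</p>";
        try (SketchTextReader reader = new SketchTextReader(html)) {
            System.out.println(reader.readLine());  // consumes only the first line
            System.out.println(reader.readToEnd()); // returns whatever is left
        }
    }
}
```

Because readToEnd starts at the current position, calling it after readLine returns only the text that has not been consumed yet.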

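Here are the steps to extract HTML-formatted text from a document:

1. Create a Parser object for the document.
2. Create a FormattedTextOptions object with FormattedTextMode.Html.
3. Call the getFormattedText method to obtain a TextReader object.
4. Check that the reader isn't null (a null reader means formatted text extraction isn't supported for the document).
5. Read the text from the reader.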

The following example shows how to extract document text as HTML:

// Create an instance of the Parser class
try (Parser parser = new Parser(Constants.SampleDocx)) {
    // Extract the formatted text into the reader
    try (TextReader reader = parser.getFormattedText(new FormattedTextOptions(FormattedTextMode.Html))) {
        // Print the formatted text from the document
        // If formatted text extraction isn't supported, the reader is null
        System.out.println(reader == null ? "Formatted text extraction isn't supported" : reader.readToEnd());
    }
}

