The following example shows how to extract HTML-formatted text:

// Create an instance of the Parser class
try (Parser parser = new Parser(Constants.SampleDocx)) {
    // Extract formatted text into the reader
    try (TextReader reader = parser.getFormattedText(new FormattedTextOptions(FormattedTextMode.Html))) {
        // Print the formatted text from the document
        // If formatted text extraction isn't supported, the reader is null
        System.out.println(reader == null ? "Formatted text extraction isn't supported" : reader.readToEnd());
    }
}

Tag       Description
p         A paragraph is surrounded by the p tag
b         Text in a bold font is surrounded by the b tag
i         Text in an italic font is surrounded by the i tag
h1 - h6   A heading with the ‘Heading X’ style is surrounded by the <hX> tag
ol / ul   Numbered and bulleted lists
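
The tag mapping listed above can be illustrated with a small standalone sketch. It does not call GroupDocs.Parser and is not the library's implementation; the class `TagMappingSketch` and the helpers `tagForStyle` and `wrap` are hypothetical names introduced only for this illustration, and treating every non-heading style as a plain paragraph is an assumption.

```java
class TagMappingSketch {
    // Hypothetical helper: picks the HTML tag for a Word paragraph style name.
    // A 'Heading X' style (X from 1 to 6) maps to hX; any other style is
    // assumed to be a plain paragraph and maps to p.
    static String tagForStyle(String styleName) {
        if (styleName != null && styleName.matches("Heading [1-6]")) {
            return "h" + styleName.charAt(styleName.length() - 1);
        }
        return "p";
    }

    // Hypothetical helper: surrounds text with an opening and closing tag.
    static String wrap(String tag, String text) {
        return "<" + tag + ">" + text + "</" + tag + ">";
    }

    public static void main(String[] args) {
        // A heading paragraph with the 'Heading 2' style
        System.out.println(wrap(tagForStyle("Heading 2"), "Overview"));
        // A normal paragraph containing bold and italic runs
        String body = wrap("b", "Bold") + " and " + wrap("i", "italic") + " text";
        System.out.println(wrap(tagForStyle("Normal"), body));
    }
}
```

Running `main` prints `<h2>Overview</h2>` followed by `<p><b>Bold</b> and <i>italic</i> text</p>`, which is the general shape of markup to expect when the extracted text is consumed downstream.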

The following Microsoft Word document is used as the input document:

The following HTML document is extracted using the example above:

