Extract images from Microsoft Office Word documents

To extract images from Microsoft Office Word documents, use the getImages method. By default, images are extracted in their original format. Using the ImageOptions class, you can instead extract images from Microsoft Office Word documents in BMP, GIF, JPEG, PNG, or WebP format.
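The output format you pass to ImageOptions and the extension of the file you write are chosen separately, so it helps to keep them in sync. The following is a minimal, self-contained sketch under one assumption: TargetFormat is a hypothetical enum that only mirrors the five formats named above. It is not the GroupDocs.Parser ImageFormat type, and the library's constant names may differ.

```java
public class ImageFileNames {
    // Hypothetical stand-in for the five output formats listed above;
    // this is NOT the GroupDocs.Parser ImageFormat type.
    enum TargetFormat {
        BMP("bmp"), GIF("gif"), JPEG("jpg"), PNG("png"), WEBP("webp");

        private final String extension;

        TargetFormat(String extension) {
            this.extension = extension;
        }

        String extension() {
            return extension;
        }
    }

    // Builds a numbered file name whose extension matches the chosen format,
    // for example index 3 with PNG gives "3.png".
    static String fileName(int index, TargetFormat format) {
        return index + "." + format.extension();
    }

    public static void main(String[] args) {
        // Prints one sample file name per format: 0.bmp, 0.gif, 0.jpg, 0.png, 0.webp
        for (TargetFormat format : TargetFormat.values()) {
            System.out.println(fileName(0, format));
        }
    }
}
```

Mapping JPEG to the "jpg" extension is a naming choice in this sketch; "jpeg" works equally well.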

Warning
The getImages method returns null if image extraction isn't supported for the document. For example, image extraction isn't supported for TXT files, so getImages returns null for a TXT file. If a Microsoft Office Word document has no images, getImages returns an empty collection.
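Because null and an empty collection mean different things, calling code should check for both. The sketch below is a self-contained illustration that uses a plain Iterable as a stand-in for the value returned by getImages; the class name and the status strings are hypothetical. It walks the collection only once, since an Iterable is not guaranteed to support repeated iteration.

```java
import java.util.Arrays;
import java.util.Collections;

public class ImageResultCheck {
    // Classifies a getImages-style result following the rules above:
    // null means extraction is unsupported, an empty collection means
    // the document simply has no images.
    static String classify(Iterable<?> images) {
        if (images == null) {
            return "unsupported";
        }
        // Count in a single pass instead of checking hasNext() and then iterating again
        int count = 0;
        for (Object ignored : images) {
            count++;
        }
        return count == 0 ? "no images" : count + " image(s)";
    }

    public static void main(String[] args) {
        System.out.println(classify(null));                      // unsupported
        System.out.println(classify(Collections.emptyList()));   // no images
        System.out.println(classify(Arrays.asList("a", "b")));   // 2 image(s)
    }
}
```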

Here are the steps to extract images from a Microsoft Office Word document to PNG files:

  • Instantiate the Parser object for the source document;
  • Call the getImages method to obtain the collection of image objects;
  • Iterate through the collection and save each image to a file.
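The last step, writing each item to its own numbered file, follows a common Java pattern. The sketch below shows that pattern with the standard java.nio API only; the byte arrays are a hypothetical stand-in for image contents, because with GroupDocs.Parser the library's own save call performs the write.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SaveNumberedFiles {
    // Writes each byte array to outputDir/0.png, outputDir/1.png, ...
    // and returns the paths that were written, in order.
    static List<Path> saveAll(Iterable<byte[]> contents, Path outputDir) {
        try {
            Files.createDirectories(outputDir);
            List<Path> written = new ArrayList<>();
            int imageNumber = 0;
            for (byte[] content : contents) {
                Path target = outputDir.resolve(imageNumber + ".png");
                Files.write(target, content);
                written.add(target);
                imageNumber++;
            }
            return written;
        } catch (IOException e) {
            // Rethrow as unchecked so callers are not forced to declare IOException
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("images");
        List<byte[]> fakeImages = Arrays.asList(new byte[] {1, 2}, new byte[] {3});
        for (Path path : saveAll(fakeImages, dir)) {
            System.out.println(path.getFileName());
        }
    }
}
```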

The following example demonstrates how to extract images from a Microsoft Office Word document:

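import com.groupdocs.parser.Parser;
import com.groupdocs.parser.data.PageImageArea;
import com.groupdocs.parser.options.ImageFormat;
import com.groupdocs.parser.options.ImageOptions;

// Create an instance of Parser class
try (Parser parser = new Parser(Constants.SampleWithImagesDocx)) {
    // Extract images from document
    Iterable<PageImageArea> images = parser.getImages();
    // getImages returns null if image extraction isn't supported for this document
    if (images == null) {
        System.out.println("Image extraction isn't supported");
        return;
    }
    // Create the options to save images in PNG format
    ImageOptions options = new ImageOptions(ImageFormat.Png);
    int imageNumber = 0;
    // Iterate over images
    for (PageImageArea image : images) {
        // Save the image to a numbered PNG file
        image.save(Constants.getOutputFilePath(String.format("%d.png", imageNumber)), options);
        imageNumber++;
    }
}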

More resources

GitHub examples

You may easily run the code above and see the feature in action in our GitHub examples:

Free online document parser App

Along with the full-featured Java library, we provide simple but powerful free apps.

You are welcome to parse documents and extract data from PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX, emails, and more with our Free Online Document Parser App.