Extract images from documents

GroupDocs.Parser can extract images from PDF documents, emails, ebooks, Microsoft Office files such as Word (DOC, DOCX), PowerPoint (PPT, PPTX) and Excel (XLS, XLSX), LibreOffice formats, and many others (see the full list in the supported document formats article).

GroupDocs.Parser lets you implement both simple and complex image extraction cases (see the advanced help section for more).

This article shows how to extract images from any supported format without additional settings.

Extract images from documents

To extract images from a document, call the getImages method:

Iterable<PageImageArea> getImages();

This method returns a collection of PageImageArea objects:

Member	Description
getPage	The page that contains the image.
getRectangle	The rectangular area on the page that contains the image.
getFileType	The format of the image.
getRotation	The rotation angle of the image.
getImageStream	Returns the image stream.
getImageStream(ImageOptions)	Returns the image stream in a different format.
save(String)	Saves the image to a file.
save(String, ImageOptions)	Saves the image to a file in a different format.
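
When saving many images with save(String), each one needs its own file name. The following is a minimal sketch of one way to build such names; the class ImageFileNames, the method fileNameFor and the naming pattern are hypothetical and not part of GroupDocs.Parser. It takes the extension as a plain string, so it makes no assumption about how the value returned by getFileType exposes it.

```java
import java.util.Locale;

public class ImageFileNames {
    // Hypothetical helper (not a GroupDocs.Parser API): builds a name such as
    // "page-003-image-001.png" from a page index, an image counter and an extension.
    public static String fileNameFor(int pageIndex, int imageNumber, String extension) {
        // Accept both ".png" and "png" by dropping a leading dot.
        String ext = extension.startsWith(".") ? extension.substring(1) : extension;
        // Zero-pad the numbers so the files sort in page order.
        return String.format(Locale.ROOT, "page-%03d-image-%03d.%s",
                pageIndex, imageNumber, ext.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(fileNameFor(0, 1, ".PNG")); // prints "page-000-image-001.png"
        System.out.println(fileNameFor(12, 3, "jpg")); // prints "page-012-image-003.jpg"
    }
}
```

The resulting string can then be passed to the save(String) member listed in the table above.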

Here are the steps to extract images from the whole document:

Instantiate a Parser object for the initial document;
Call the getImages method to obtain a collection of image objects;
Check that the collection isn't null (null means image extraction isn't supported for the document);
Iterate over the collection and read each image's page, rectangle, type and content.

The following example shows how to extract all images from the whole document:

// Imports needed: com.groupdocs.parser.Parser, com.groupdocs.parser.data.PageImageArea
// Create an instance of the Parser class
try (Parser parser = new Parser(Constants.SampleImagesPdf)) {
    // Extract images
    Iterable<PageImageArea> images = parser.getImages();
    // Check if images extraction is supported
    if (images == null) {
        System.out.println("Images extraction isn't supported");
        return;
    }
    // Iterate over images
    for (PageImageArea image : images) {
        // Print the page index, rectangle and image type
        System.out.printf("Page: %d, R: %s, Type: %s%n", image.getPage().getIndex(), image.getRectangle(), image.getFileType());
    }
}
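
The null check in the example separates "extraction isn't supported" from "no images found". As a rough sketch of how to keep that distinction once the result is passed around, the hypothetical helper below (ExtractionResults.collect, not part of GroupDocs.Parser) wraps any Iterable in an Optional: empty when the source returned null, otherwise a list that may itself be empty.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class ExtractionResults {
    // Hypothetical helper (not a GroupDocs.Parser API).
    // null input  -> Optional.empty(): extraction is not supported
    // other input -> Optional of a list copy, possibly empty: supported
    public static <T> Optional<List<T>> collect(Iterable<T> items) {
        if (items == null) {
            return Optional.empty();
        }
        List<T> result = new ArrayList<>();
        for (T item : items) {
            result.add(item);
        }
        return Optional.of(result);
    }

    public static void main(String[] args) {
        Iterable<String> unsupported = null;
        System.out.println(collect(unsupported).isPresent());            // prints "false"
        System.out.println(collect(Arrays.asList("a", "b")).get().size()); // prints "2"
    }
}
```

In the example above, the value returned by parser.getImages() would be passed to collect in place of the sample strings used here.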

More resources

Advanced usage topics

To learn more about document data extraction features and how to extract text, images, forms and more, refer to the advanced usage section.

GitHub examples

You can run the code above and see the feature in action in our GitHub examples.

Free online document parser App

Along with the full-featured Java library, we provide simple but powerful free apps.

You are welcome to extract data from PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX, emails and more with our Free Online Document Parser App.
