Extract images from document page

GroupDocs.Parser provides the getImages(int) method to extract images from a document page:

Iterable<PageImageArea> getImages(int pageIndex);

This method returns a collection of PageImageArea objects:

Member	Description
getPage	The page that contains the image.
getRectangle	The rectangular area on the page that contains the image.
getFileType	The format of the image.
getRotation	The rotation angle of the image.
getImageStream	Returns the image stream.
getImageStream(ImageOptions)	Returns the image stream in a different format.
save(String)	Saves the image to the file.
save(String, ImageOptions)	Saves the image to the file in a different format.
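
To work with the image contents in memory rather than saving them to a file, the stream returned by getImageStream can be read with plain Java I/O. The following is a minimal sketch, assuming the stream is a standard java.io.InputStream; the ImageStreams class and its readAll method are hypothetical helpers for illustration and are not part of GroupDocs.Parser.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of GroupDocs.Parser.
final class ImageStreams {
    private ImageStreams() {
    }

    // Reads the whole stream into memory, closes it and returns its bytes.
    static byte[] readAll(InputStream in) {
        try (InputStream source = in) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int read;
            // read() returns -1 once the end of the stream is reached
            while ((read = source.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            return out.toByteArray();
        } catch (IOException e) {
            // Wrap the checked exception so the helper is easy to call in loops
            throw new UncheckedIOException(e);
        }
    }
}
```

Under that assumption, a call such as ImageStreams.readAll(image.getImageStream()) would give the raw image bytes, whose length is the image size in bytes.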

The ImageOptions class defines the image format to which the image is converted. The following image formats are supported:

Bmp
Gif
Jpeg
Png
WebP
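
After converting an image, it can be useful to confirm that the resulting bytes really are in the expected format. The sketch below checks the well-known file signatures of the five formats listed above. It is only an illustration: ImageFormatSniffer and detect are hypothetical names, not part of GroupDocs.Parser, and the returned strings simply mirror the list above.

```java
// Hypothetical helper, not part of GroupDocs.Parser.
final class ImageFormatSniffer {
    private ImageFormatSniffer() {
    }

    // Returns "Png", "Jpeg", "Gif", "WebP", "Bmp" or "Unknown" based on the leading bytes.
    static String detect(byte[] data) {
        if (matches(data, 0, 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A)) {
            return "Png";
        }
        if (matches(data, 0, 0xFF, 0xD8, 0xFF)) {
            return "Jpeg";
        }
        // Covers both the GIF87a and GIF89a signatures
        if (matches(data, 0, 'G', 'I', 'F', '8')) {
            return "Gif";
        }
        // WebP is a RIFF container with "WEBP" at byte offset 8
        if (matches(data, 0, 'R', 'I', 'F', 'F') && matches(data, 8, 'W', 'E', 'B', 'P')) {
            return "WebP";
        }
        if (matches(data, 0, 'B', 'M')) {
            return "Bmp";
        }
        return "Unknown";
    }

    // Compares unsigned byte values at the given offset with the expected signature.
    private static boolean matches(byte[] data, int offset, int... expected) {
        if (data == null || data.length < offset + expected.length) {
            return false;
        }
        for (int i = 0; i < expected.length; i++) {
            if ((data[offset + i] & 0xFF) != expected[i]) {
                return false;
            }
        }
        return true;
    }
}
```

A signature check only looks at the first few bytes, so it is a quick sanity test rather than full validation of the image file.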

Here are the steps to extract images from a document page:

Instantiate a Parser object for the initial document;
Call the getFeatures().isImages() method to check whether image extraction is supported for the document;
Call the getImages(int) method with the page index to obtain a collection of PageImageArea objects;
Iterate through the collection and get sizes, image types and image contents.

The following example shows how to extract images from a document page:

// Create an instance of Parser class
try (Parser parser = new Parser(Constants.SampleImagesPdf)) {
    // Check if the document supports images extraction
    if (!parser.getFeatures().isImages()) {
        System.out.println("Document doesn't support images extraction.");
        return;
    }
    // Get the document info
    IDocumentInfo documentInfo = parser.getDocumentInfo();
    // Check if the document has pages
    if (documentInfo.getPageCount() == 0) {
        System.out.println("Document has no pages.");
        return;
    }
    // Iterate over pages
    for (int pageIndex = 0; pageIndex < documentInfo.getPageCount(); pageIndex++) {
        // Print a page number
        System.out.println(String.format("Page %d/%d", pageIndex + 1, documentInfo.getPageCount()));
        // Iterate over images
        // We ignore null-checking as we have checked images extraction feature support earlier
        for (PageImageArea image : parser.getImages(pageIndex)) {
            // Print a rectangle and image type
            System.out.println(String.format("R: %s, Type: %s", image.getRectangle(), image.getFileType()));
        }
    }
}

More resources

GitHub examples

You can run the code above and see the feature in action in our GitHub examples.

Free online document parser App

Along with the full-featured Java library, we provide simple but powerful free apps.

You are welcome to extract data from PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX, emails and more with our Free Online Document Parser App.
