Extract images from document page area

GroupDocs.Parser provides the functionality to extract images from a document page area with the getImages(PageAreaOptions) and getImages(int, PageAreaOptions) methods:

Iterable<PageImageArea> getImages(PageAreaOptions options);
Iterable<PageImageArea> getImages(int pageIndex, PageAreaOptions options);

These methods return a collection of PageImageArea objects:

Member | Description
getPage | The page that contains the image area.
getRectangle | The rectangular area on the page that contains the image.
getFileType | The format of the image.
getRotation | The rotation angle of the image.
getImageStream | Returns the image stream.
getImageStream(ImageOptions) | Returns the image stream converted to a different format.
save(String) | Saves the image to a file.
save(String, ImageOptions) | Saves the image to a file in a different format.

The ImageOptions class is used to define the image format into which the image is converted. The following image formats are supported:

  • Bmp
  • Gif
  • Jpeg
  • Png
  • WebP

The PageAreaOptions parameter is used to customize the page area extraction process. This class has the following members:

Member | Description
getRectangle | The rectangular area that contains the page areas to extract (here, images).
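To make the rectangle parameter more concrete, here is a minimal plain-Java sketch (Java 16+ for records) that models the area used in the example on this page, Rectangle(new Point(340, 150), new Size(300, 100)). The AreaModel class, the Box record and the contains/intersects helpers are hypothetical illustrations, not GroupDocs types. The sketch assumes that the point is the top-left corner and that y grows downward; this page does not say whether an image must lie fully inside the area or only overlap it, so both checks are shown.

```java
// Hypothetical model for illustration only: these are NOT GroupDocs classes.
public class AreaModel {
    // Axis-aligned rectangle given as a top-left corner plus a size,
    // mirroring Rectangle(Point(x, y), Size(w, h)). Assumption: y grows downward.
    record Box(double left, double top, double width, double height) {
        double right()  { return left + width; }
        double bottom() { return top + height; }
    }

    // True if the two boxes overlap with a positive area.
    static boolean intersects(Box a, Box b) {
        return a.left() < b.right() && b.left() < a.right()
            && a.top() < b.bottom() && b.top() < a.bottom();
    }

    // True if inner lies entirely within outer.
    static boolean contains(Box outer, Box inner) {
        return inner.left() >= outer.left() && inner.right() <= outer.right()
            && inner.top() >= outer.top() && inner.bottom() <= outer.bottom();
    }

    public static void main(String[] args) {
        // Same numbers as the PageAreaOptions example: Point(340, 150), Size(300, 100)
        Box area = new Box(340, 150, 300, 100);
        System.out.println("right=" + area.right() + ", bottom=" + area.bottom());
        // Hypothetical image rectangles on the page
        Box inside = new Box(400, 170, 100, 50);
        Box partial = new Box(600, 200, 100, 100);
        Box outside = new Box(0, 0, 100, 100);
        System.out.println(contains(area, inside));     // true
        System.out.println(intersects(area, partial));  // true
        System.out.println(contains(area, partial));    // false
        System.out.println(intersects(area, outside));  // false
    }
}
```

Running main prints right=640.0, bottom=250.0, which shows that the area spans x from 340 to 640 and y from 150 to 250 under the stated coordinate assumption.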

Here are the steps to extract images from the upper-left corner of a page:

  • Instantiate a Parser object for the initial document;
  • Instantiate PageAreaOptions with the rectangular area;
  • Call the getImages(PageAreaOptions) method and obtain a collection of PageImageArea objects;
  • Check that the collection isn't null (null means that image extraction isn't supported for the document);
  • Iterate through the collection and get the rectangles, image types and image contents.

The following example shows how to extract only images from the upper-left corner:

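import com.groupdocs.parser.Parser;
import com.groupdocs.parser.data.PageImageArea;
import com.groupdocs.parser.data.Point;
import com.groupdocs.parser.data.Rectangle;
import com.groupdocs.parser.data.Size;
import com.groupdocs.parser.options.PageAreaOptions;

// Create an instance of Parser class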
try (Parser parser = new Parser(Constants.SampleImagesPdf)) {
    // Create the options which are used for images extraction
    PageAreaOptions options = new PageAreaOptions(new Rectangle(new Point(340, 150), new Size(300, 100)));
    // Extract images from the upper-left corner of a page:
    Iterable<PageImageArea> images = parser.getImages(options);
    // Check if images extraction is supported
    if (images == null) {
        System.out.println("Page images extraction isn't supported");
        return;
    }
    // Iterate over images
    for (PageImageArea image : images) {
        // Print a page index, rectangle and image type:
        System.out.println(String.format("Page: %d, R: %s, Type: %s", image.getPage().getIndex(), image.getRectangle(), image.getFileType()));
    }
}
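A variation of the example above, sketched under assumptions: it uses the getImages(int, PageAreaOptions) overload to restrict extraction to the first page (page indexes are assumed to be zero-based) and saves each image as PNG with save(String, ImageOptions). The ImageOptions(ImageFormat.Png) constructor and the com.groupdocs.parser.options package for ImageOptions and ImageFormat should be checked against your library version; "sample.pdf" and the outputName helper are hypothetical.

```java
import com.groupdocs.parser.Parser;
import com.groupdocs.parser.data.PageImageArea;
import com.groupdocs.parser.data.Point;
import com.groupdocs.parser.data.Rectangle;
import com.groupdocs.parser.data.Size;
// Assumed package for ImageOptions and ImageFormat
import com.groupdocs.parser.options.ImageFormat;
import com.groupdocs.parser.options.ImageOptions;
import com.groupdocs.parser.options.PageAreaOptions;

public class SavePageAreaImages {
    // Hypothetical naming scheme for the saved files, e.g. "page0-image1.png"
    static String outputName(int pageIndex, int imageNumber) {
        return String.format("page%d-image%d.png", pageIndex, imageNumber);
    }

    public static void main(String[] args) throws Exception {
        // "sample.pdf" is a placeholder path: replace it with your own document
        try (Parser parser = new Parser("sample.pdf")) {
            // Same rectangular area as in the example above
            PageAreaOptions areaOptions = new PageAreaOptions(
                    new Rectangle(new Point(340, 150), new Size(300, 100)));
            int pageIndex = 0; // assumed zero-based: the first page
            // Extract images only from the given page and area
            Iterable<PageImageArea> images = parser.getImages(pageIndex, areaOptions);
            if (images == null) {
                System.out.println("Page images extraction isn't supported");
                return;
            }
            // Convert every image to PNG while saving it
            ImageOptions pngOptions = new ImageOptions(ImageFormat.Png);
            int imageNumber = 0;
            for (PageImageArea image : images) {
                image.save(outputName(pageIndex, imageNumber), pngOptions);
                imageNumber++;
            }
        }
    }
}
```

Converting with ImageOptions keeps the output format uniform regardless of the original image type reported by getFileType.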

More resources

GitHub examples

You can run the code above and see the feature in action in our GitHub examples.

Free online document parser App

Along with the full-featured Java library, we provide simple but powerful free apps.

You are welcome to extract data from PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX, emails and more with our Free Online Document Parser App.