To extract images from PDF documents, use the getImages methods. By default, images are extracted in their original format. With the ImageOptions class, images can instead be saved in BMP, GIF, JPEG, PNG, or WebP format.
Warning
The getImages method returns null if image extraction isn't supported for the document. For example, image extraction isn't supported for TXT files, so getImages returns null for a TXT file. If a PDF document has no images, getImages returns an empty collection.
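The difference between a null result and an empty collection matters when iterating, because a for-each loop over null throws a NullPointerException. Below is a minimal, library-independent sketch of one way to guard against that. The class NullSafeImages and the helpers orEmpty and describe are hypothetical names used only for illustration and are not part of GroupDocs.Parser; plain strings stand in for the image objects that getImages would return.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;

public class NullSafeImages {

    // Hypothetical helper (not part of GroupDocs.Parser): replaces a null
    // result with an empty Iterable so the caller can always iterate safely.
    static <T> Iterable<T> orEmpty(Iterable<T> items) {
        return items != null ? items : Collections.<T>emptyList();
    }

    // Hypothetical helper: reports which of the three cases applies.
    static <T> String describe(Iterable<T> items) {
        if (items == null) {
            return "not supported";
        }
        int count = 0;
        for (T ignored : items) {
            count++;
        }
        return count == 0 ? "no images" : count + " image(s)";
    }

    public static void main(String[] args) {
        // Strings stand in for the result of getImages in each case.
        Iterable<String> unsupported = null;                      // e.g. a TXT file
        Iterable<String> empty = new ArrayList<String>();         // PDF without images
        Iterable<String> two = Arrays.asList("first", "second");  // PDF with two images

        System.out.println(describe(unsupported));
        System.out.println(describe(empty));
        System.out.println(describe(two));

        // Iterating through orEmpty never throws, even for a null result.
        for (String item : orEmpty(unsupported)) {
            System.out.println(item);
        }
    }
}
```

With the real library, the same pattern would wrap the value returned by getImages before the loop. Checking for null explicitly, as describe does, is preferable when the caller needs to report that the format is unsupported rather than silently skip it.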
Here are the steps to extract images from a PDF document to PNG files:
1. Instantiate a Parser object for the initial document;
2. Call the getImages method and obtain the collection of image objects;
3. Iterate through the collection and save the image contents to files.
The following example demonstrates how to extract images from a PDF document:
// Create an instance of Parser class
try (Parser parser = new Parser(Constants.SampleImagesPdf)) {
    // Extract images from document
    Iterable<PageImageArea> images = parser.getImages();
    // Create the options to save images in PNG format
    ImageOptions options = new ImageOptions(ImageFormat.Png);
    int imageNumber = 0;
    // Iterate over images
    for (PageImageArea image : images) {
        // Save the image to the png file
        image.save(Constants.getOutputFilePath(String.format("%d.png", imageNumber)), options);
        imageNumber++;
    }
}
More resources
GitHub examples
You can run the code above and see the feature in action in our GitHub examples.
Along with the full-featured library, we provide simple but powerful free apps.
You are welcome to parse documents and extract data from PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX, emails, and more with our Free Online Document Parser App.