GroupDocs.Parser provides functionality to save extracted images directly to files with support for format conversion.
Prerequisites
GroupDocs.Parser for Python via .NET installed
Sample documents with images
Write permissions to output directory
Save images to files
To extract and save images to files:
    from groupdocs.parser import Parser
    import os

    # Create output directory
    output_dir = "extracted_images"
    os.makedirs(output_dir, exist_ok=True)

    # Create an instance of Parser class
    with Parser("./document.pdf") as parser:
        # Extract images
        images = parser.get_images()

        # Check if image extraction is supported
        if images is None:
            print("Image extraction isn't supported")
        else:
            # Iterate over images and save them
            for idx, image in enumerate(images):
                # Generate filename with appropriate extension
                filename = f"image_{idx + 1}{image.file_type.extension}"
                filepath = os.path.join(output_dir, filename)

                # Save image to file
                image.save(filepath)
                print(f"Saved: {filename}")
The following sample file is used in this example: document.pdf
Expected behavior: Saves each extracted image to a separate file using the image’s original format.
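Because the example names files image_1, image_2, and so on, running it twice against the same output directory reuses the same names. The following is a minimal sketch of one way to pick a non-colliding name before calling save(). The helper unique_name and its taken argument are hypothetical and not part of GroupDocs.Parser; in practice taken could be built from os.listdir(output_dir).

```python
import os


def unique_name(filename, taken):
    """Return filename, or a numbered variant of it, that is not in `taken`.

    Hypothetical helper for illustration; `taken` is any collection of
    names that are already in use in the output directory.
    """
    stem, ext = os.path.splitext(filename)
    candidate = filename
    counter = 1
    # Append _1, _2, ... before the extension until the name is free
    while candidate in taken:
        candidate = f"{stem}_{counter}{ext}"
        counter += 1
    return candidate


# Names assumed to exist already in the output directory
taken = {"image_1.png", "image_1_1.png"}
print(unique_name("image_1.png", taken))  # image_1_2.png
print(unique_name("image_2.png", taken))  # image_2.png
```

The returned name can then be joined with the output directory and passed to image.save() as in the example above.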
Save images with format conversion
To save images in a specific format (e.g., PNG):
    from groupdocs.parser import Parser
    from groupdocs.parser.options import ImageOptions, ImageFormat
    import os

    # Create output directory
    output_dir = "images_png"
    os.makedirs(output_dir, exist_ok=True)

    # Create an instance of Parser class
    with Parser("presentation.pptx") as parser:
        # Extract images
        images = parser.get_images()

        if images is None:
            print("Image extraction not supported")
        else:
            # Create image options for PNG format
            png_options = ImageOptions(ImageFormat.PNG)

            # Save all images as PNG
            for idx, image in enumerate(images):
                filename = f"image_{idx + 1}.png"
                filepath = os.path.join(output_dir, filename)

                # Save with format conversion
                image.save(filepath, png_options)
                print(f"Saved as PNG: {filename}")
The following sample file is used in this example: presentation.pptx
Expected behavior: Converts all extracted images to PNG format before saving.
Save images with custom naming
To save images with descriptive filenames:
    from groupdocs.parser import Parser
    import os
    from datetime import datetime

    def save_images_with_custom_names(file_path, output_dir, prefix="doc"):
        """
        Save images with custom naming pattern.
        """
        os.makedirs(output_dir, exist_ok=True)

        # Generate timestamp
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")

        with Parser(file_path) as parser:
            images = parser.get_images()

            if images is None:
                print("Image extraction not supported")
                return 0

            saved_count = 0
            for idx, image in enumerate(images):
                # Create custom filename
                filename = f"{prefix}_{timestamp}_page{image.page.index + 1}_img{idx + 1}{image.file_type.extension}"
                filepath = os.path.join(output_dir, filename)

                # Save image
                image.save(filepath)
                print(f"Saved: {filename}")
                saved_count += 1

            return saved_count

    # Usage
    count = save_images_with_custom_names("report.pdf", "saved_images", prefix="report")
    print(f"Total images saved: {count}")
Expected behavior: Saves images with descriptive filenames including timestamp, page number, and index.
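The naming pattern itself can be isolated into a small function, which makes it easy to check without opening a document. The sketch below is a variation on the pattern above, not the library's behavior: it zero-pads the page and image numbers so that files sort in order in a directory listing. The function name build_image_filename and its parameters are assumptions made for illustration.

```python
from datetime import datetime


def build_image_filename(prefix, timestamp, page_number, image_number, extension):
    """Compose '<prefix>_<timestamp>_page<NNN>_img<NNN><ext>'.

    Hypothetical helper: numbers are zero-padded to three digits so that
    lexicographic order matches numeric order.
    """
    # Accept extensions given with or without the leading dot
    if extension and not extension.startswith("."):
        extension = "." + extension
    return f"{prefix}_{timestamp}_page{page_number:03d}_img{image_number:03d}{extension}"


# A fixed timestamp is used here so the result is reproducible
stamp = datetime(2024, 1, 31, 9, 5, 0).strftime("%Y%m%d_%H%M%S")
print(build_image_filename("report", stamp, 2, 7, ".png"))
# report_20240131_090500_page002_img007.png
```

Inside the extraction loop, page_number would be image.page.index + 1 and extension would be image.file_type.extension, as in the example above.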
Save images in multiple formats
To save each image in multiple formats:
    from groupdocs.parser import Parser
    from groupdocs.parser.options import ImageOptions, ImageFormat
    import os

    # Create output directory
    output_dir = "images_multi_format"
    os.makedirs(output_dir, exist_ok=True)

    # Create an instance of Parser class
    with Parser("document.docx") as parser:
        images = parser.get_images()

        if images:
            # Define formats to save
            formats = {
                'png': ImageOptions(ImageFormat.PNG),
                'jpg': ImageOptions(ImageFormat.JPEG),
                'bmp': ImageOptions(ImageFormat.BMP)
            }

            for idx, image in enumerate(images):
                print(f"Processing image {idx + 1}:")

                # Save in each format
                for format_name, options in formats.items():
                    filename = f"image_{idx + 1}.{format_name}"
                    filepath = os.path.join(output_dir, filename)

                    image.save(filepath, options)
                    print(f"  Saved as {format_name.upper()}")
The following sample file is used in this example: document.docx
Expected behavior: Saves each image in PNG, JPEG, and BMP formats.
Organize images by page
To save images organized by page number:
    from groupdocs.parser import Parser
    import os

    def save_images_by_page(file_path, output_dir):
        """
        Save images organized in subdirectories by page.
        """
        os.makedirs(output_dir, exist_ok=True)

        with Parser(file_path) as parser:
            images = parser.get_images()

            if images is None:
                print("Image extraction not supported")
                return

            # Group images by page
            images_by_page = {}
            for image in images:
                page_num = image.page.index + 1
                if page_num not in images_by_page:
                    images_by_page[page_num] = []
                images_by_page[page_num].append(image)

            # Save images organized by page
            total_saved = 0
            for page_num, page_images in sorted(images_by_page.items()):
                # Create page directory
                page_dir = os.path.join(output_dir, f"page_{page_num}")
                os.makedirs(page_dir, exist_ok=True)

                print(f"Page {page_num}: {len(page_images)} images")

                for img_idx, image in enumerate(page_images):
                    filename = f"image_{img_idx + 1}{image.file_type.extension}"
                    filepath = os.path.join(page_dir, filename)

                    image.save(filepath)
                    print(f"  Saved: {filename}")
                    total_saved += 1

            print(f"Total images saved: {total_saved}")

    # Usage
    save_images_by_page("multi_page.pdf", "images_by_page")
Expected behavior: Creates subdirectories for each page and saves images within their respective page folders.
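The grouping step can also be written with collections.defaultdict, which removes the explicit membership check. The sketch below shows only that step. The stand-in objects are hypothetical: they mimic just the page.index attribute the grouping relies on, so the snippet runs without a document or the library.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_page(images):
    """Map 1-based page numbers to the list of images found on each page."""
    grouped = defaultdict(list)
    for image in images:
        # page.index is zero-based, so add 1 for a human-readable page number
        grouped[image.page.index + 1].append(image)
    # Return a plain dict ordered by page number
    return dict(sorted(grouped.items()))


# Hypothetical stand-ins for extracted images (only `page.index` is mimicked)
fake_images = [
    SimpleNamespace(name="a", page=SimpleNamespace(index=1)),
    SimpleNamespace(name="b", page=SimpleNamespace(index=0)),
    SimpleNamespace(name="c", page=SimpleNamespace(index=1)),
]

grouped = group_by_page(fake_images)
print({page: [img.name for img in imgs] for page, imgs in grouped.items()})
# {1: ['b'], 2: ['a', 'c']}
```

Within each page the images keep the order in which they were returned, so the per-page numbering stays stable between runs on the same document.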
Save only images of specific type
To save only images of a particular format:
    from groupdocs.parser import Parser
    import os

    def save_images_by_type(file_path, output_dir, target_extensions):
        """
        Save only images of specific types.

        Args:
            file_path: Path to document
            output_dir: Output directory
            target_extensions: List of extensions to save (e.g., ['.jpg', '.png'])
        """
        os.makedirs(output_dir, exist_ok=True)

        with Parser(file_path) as parser:
            images = parser.get_images()

            if images is None:
                print("Image extraction not supported")
                return

            saved_count = 0
            skipped_count = 0

            for idx, image in enumerate(images):
                ext = image.file_type.extension.lower()

                if ext in target_extensions:
                    filename = f"image_{idx + 1}{ext}"
                    filepath = os.path.join(output_dir, filename)

                    image.save(filepath)
                    print(f"Saved: {filename}")
                    saved_count += 1
                else:
                    print(f"Skipped: image_{idx + 1}{ext}")
                    skipped_count += 1

            print(f"Saved: {saved_count}, Skipped: {skipped_count}")

    # Usage - save only JPG and PNG images
    save_images_by_type("document.pdf", "filtered_images", ['.jpg', '.jpeg', '.png'])
Expected behavior: Saves only images matching the specified formats, skipping others.
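The filter compares the lowercased image extension against target_extensions exactly, so a caller who passes 'JPG' or 'png' without the leading dot would match nothing. A minimal sketch of normalizing the list before the comparison follows. The helper normalize_extensions is hypothetical and not part of GroupDocs.Parser.

```python
def normalize_extensions(extensions):
    """Return a set of lowercase extensions, each with a leading dot.

    Hypothetical helper: blank entries are ignored, surrounding whitespace
    is stripped, and duplicates collapse because the result is a set.
    """
    normalized = set()
    for ext in extensions:
        ext = ext.strip().lower()
        if not ext:
            continue
        normalized.add(ext if ext.startswith(".") else "." + ext)
    return normalized


print(sorted(normalize_extensions(["JPG", ".Png", " jpeg "])))
# ['.jpeg', '.jpg', '.png']
```

Calling this once at the top of the function and testing membership against the returned set keeps the loop body unchanged.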
Batch save images from multiple documents
To extract and save images from multiple documents:
    from groupdocs.parser import Parser
    from pathlib import Path
    import os

    def batch_save_images(input_dir, output_dir):
        """
        Extract and save images from all documents in a directory.
        """
        os.makedirs(output_dir, exist_ok=True)

        extensions = ['.pdf', '.docx', '.doc', '.pptx', '.ppt']

        for file_path in Path(input_dir).rglob('*'):
            if file_path.suffix.lower() in extensions:
                print(f"Processing: {file_path.name}")

                # Create subdirectory for this document
                doc_output_dir = os.path.join(output_dir, file_path.stem)
                os.makedirs(doc_output_dir, exist_ok=True)

                try:
                    with Parser(str(file_path)) as parser:
                        images = parser.get_images()

                        if images:
                            image_count = 0
                            for idx, image in enumerate(images):
                                filename = f"image_{idx + 1}{image.file_type.extension}"
                                filepath = os.path.join(doc_output_dir, filename)
                                image.save(filepath)
                                image_count += 1

                            print(f"  Saved {image_count} images")
                        else:
                            print(f"  No images or not supported")
                except Exception as e:
                    print(f"  Error: {e}")

    # Usage
    batch_save_images("input_documents", "all_extracted_images")
Expected behavior: Processes multiple documents and saves their images in organized subdirectories.
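The output subdirectory is named after the file stem only, so report.pdf and report.docx, or two files named report.pdf in different subfolders found by rglob, would share one folder and overwrite each other's images. The sketch below shows one possible way to derive a distinct folder per document by mirroring the relative path and keeping the extension in the folder name. The function document_output_dir is hypothetical and shown only to illustrate the idea.

```python
from pathlib import Path


def document_output_dir(output_root, input_root, file_path):
    """Derive an output folder that mirrors the document's relative location.

    Hypothetical helper: 'in/2023/report.pdf' under input root 'in' maps to
    '<output_root>/2023/report_pdf', so same-named documents stay separate.
    """
    relative = Path(file_path).relative_to(input_root)
    # Keep the extension (without the dot) as part of the folder name
    folder_name = f"{relative.stem}_{relative.suffix.lstrip('.').lower()}"
    return Path(output_root) / relative.parent / folder_name


print(document_output_dir("out", "in", "in/2023/report.pdf").as_posix())
# out/2023/report_pdf
print(document_output_dir("out", "in", "in/2023/report.docx").as_posix())
# out/2023/report_docx
```

The resulting path can be passed to os.makedirs(..., exist_ok=True) in place of the stem-based directory used above.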
Notes
The save() method saves images to the specified file path
Use ImageOptions to convert images to different formats