Extract Data from Attachments and ZIP Archives
Use container extraction to open archive-like formats (ZIP, RAR, TAR), Outlook stores (PST/OST), PDF portfolios, and email attachments, then parse their contents.
Extract and parse attachments
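```python
from groupdocs.parser import Parser
from groupdocs.parser.exceptions import UnsupportedDocumentFormatException

# Open the container file (ZIP, PST/OST, PDF portfolio, email with attachments).
with Parser("./archive.zip") as parser:
    # get_container() returns None if the format doesn't support container extraction.
    attachments = parser.get_container()
    if attachments is None:
        print("Container extraction isn't supported for this format.")
    else:
        for item in attachments:
            print(item.file_path)
            try:
                # Open a Parser over the embedded item and reuse the usual extraction methods.
                with item.open_parser() as attachment_parser:
                    reader = attachment_parser.get_text()
                    # Print the extracted text, or a notice if get_text() returned nothing.
                    print(reader if reader else "No text available.")
            except UnsupportedDocumentFormatException:
                print("Item format is not supported.")
```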
The following sample file is used in this example: archive.zip
Steps
1. Instantiate Parser for the container file.
2. Call get_container() to list the ContainerItem entries. It returns None if container extraction isn't supported for the format.
3. For each item, call open_parser() to create a Parser over the embedded content, then reuse the standard extraction methods (get_text, get_images, get_metadata, and so on).
4. Catch UnsupportedDocumentFormatException for items whose format isn't supported.
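Because each item is opened as an ordinary Parser, the per-item loop can be factored into a helper that accepts any extraction method. The following is a minimal sketch, not part of the GroupDocs.Parser API: extract_from_items is a hypothetical name, and it relies only on the calls used in the example above (get_container, file_path, open_parser, get_text). The ImportError fallback for UnsupportedDocumentFormatException is an assumption added only so the sketch can be tried without the library installed.

```python
# Sketch: apply one extraction callable to every item in a container.
# Only get_container(), file_path, open_parser() and get_text() come from the
# example above; extract_from_items is a hypothetical helper name.
try:
    from groupdocs.parser.exceptions import UnsupportedDocumentFormatException
except ImportError:
    # Fallback so the sketch runs without GroupDocs.Parser installed.
    class UnsupportedDocumentFormatException(Exception):
        pass


def extract_from_items(parser, extract=lambda p: p.get_text()):
    """Return (results, unsupported) for the items of a container.

    results maps each item's file_path to extract(item_parser);
    unsupported lists the paths whose format couldn't be opened.
    Returns None if the container format itself isn't supported.
    """
    items = parser.get_container()
    if items is None:
        return None
    results, unsupported = {}, []
    for item in items:
        try:
            # The item parser is closed again as soon as extract() returns.
            with item.open_parser() as item_parser:
                results[item.file_path] = extract(item_parser)
        except UnsupportedDocumentFormatException:
            unsupported.append(item.file_path)
    return results, unsupported
```

With the parser from the example above, extract_from_items(parser) returns the get_text() result for each supported item plus a list of unsupported items. Since each item parser is closed once extract returns, anything lazy (a reader, or an iterable of images or metadata) should be materialized inside the callable; whether that is necessary depends on the concrete return types, which this sketch doesn't assume.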
For more complex scenarios, such as nested containers or load options, see the .NET advanced container guide; the same workflow applies in Python.
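Nested containers (for example, a ZIP stored inside another ZIP or attached to an email) can be handled by applying the same loop recursively. This is a sketch under one loudly stated assumption: that a parser returned by open_parser() answers get_container() the same way the top-level parser does, returning None when the item isn't itself a container. walk_container and _walk_items are hypothetical helpers, and max_depth is an illustrative guard against unexpectedly deep nesting.

```python
# Sketch: recursively walk nested containers.
# Assumption: a child parser from open_parser() supports get_container(),
# returning None for items that are not containers themselves.
try:
    from groupdocs.parser.exceptions import UnsupportedDocumentFormatException
except ImportError:
    # Fallback so the sketch runs without GroupDocs.Parser installed.
    class UnsupportedDocumentFormatException(Exception):
        pass


def _walk_items(items, prefix, depth, max_depth):
    """Yield (path, text) pairs for items, descending into nested containers."""
    for item in items:
        path = prefix + item.file_path
        try:
            with item.open_parser() as child:
                # Only probe for nested items while under the depth limit.
                nested = child.get_container() if depth < max_depth else None
                if nested is not None:
                    yield from _walk_items(nested, path + "/", depth + 1, max_depth)
                else:
                    # Leaf item (or depth limit reached): extract its text.
                    yield path, child.get_text()
        except UnsupportedDocumentFormatException:
            # The item's format isn't supported; report it with no text.
            yield path, None


def walk_container(parser, max_depth=5):
    """Yield (path, text) for every leaf item; yields nothing if the
    top-level format doesn't support container extraction."""
    items = parser.get_container()
    if items is not None:
        yield from _walk_items(items, "", 0, max_depth)
```

Because walk_container is a generator, each item parser stays open while its (path, text) pair is being handled, so processing results inside the loop body (for example, for path, text in walk_container(parser): print(path, text)) mirrors the print-inside-with pattern of the original example.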
id: extract-data-from-attachments-and-zip-archives
url: parser/python-net/extract-data-from-attachments-and-zip-archives
title: Extract Data from Attachments and ZIP Archives
weight: 9
version: 25.12
description: “Work with containers such as ZIP archives, email stores, and PDF portfolios using GroupDocs.Parser for Python via .NET.”
productName: GroupDocs.Parser for Python via .NET
hideChildren: false
toc: true
tags: python, parser, attachments, zip, container, v25.12