Extract Data from Attachments and ZIP Archives


id: extract-data-from-attachments-and-zip-archives
url: parser/python-net/extract-data-from-attachments-and-zip-archives
title: Extract Data from Attachments and ZIP Archives
weight: 9
version: 25.12
description: "Work with containers such as ZIP archives, email stores, and PDF portfolios using GroupDocs.Parser for Python via .NET."
productName: GroupDocs.Parser for Python via .NET
hideChildren: false
toc: true
tags: python, parser, attachments, zip, container, v25.12

Use container extraction to open archive-like formats (ZIP, RAR, TAR), Outlook stores (PST/OST), PDF portfolios, and email attachments, then parse their contents.

Extract and parse attachments

Python

from groupdocs.parser import Parser
from groupdocs.parser.exceptions import UnsupportedDocumentFormatException

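# Open the container file (ZIP archive, email, PDF portfolio, and so on).
with Parser("./archive.zip") as parser:
    # get_container() yields the embedded items, or None if unsupported.
    attachments = parser.get_container()
    if attachments is None:
        print("Container extraction isn't supported for this format.")
    else:
        for item in attachments:
            print(item.file_path)
            try:
                # Open a Parser over the embedded item and extract its text.
                with item.open_parser() as attachment_parser:
                    reader = attachment_parser.get_text()
                    print(reader if reader else "No text available.")
            except UnsupportedDocumentFormatException:
                # The embedded item is in a format the parser cannot read.
                print("Item format is not supported.")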

The following sample file is used in this example: archive.zip
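
If the sample file is not at hand, a stand-in archive can be built with Python's standard zipfile module. The sketch below is only an illustration: the function name make_sample_archive and the entry names and contents are hypothetical and do not reflect what the real sample archive.zip contains.

```python
import zipfile
from pathlib import Path


def make_sample_archive(path="archive.zip"):
    """Create a small ZIP with two text entries and return its path."""
    # Hypothetical entries; the real sample file may hold other documents.
    entries = {
        "readme.txt": "Hello from the archive.",
        "notes/todo.txt": "Parse every attachment.",
    }
    with zipfile.ZipFile(path, "w") as archive:
        for name, text in entries.items():
            # writestr() stores a string as a UTF-8 encoded archive member.
            archive.writestr(name, text)
    return Path(path)
```

Calling make_sample_archive("./archive.zip") before running the example above gives the parser a container with two plain-text items to iterate over.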

Steps

1. Instantiate Parser for the container file.
2. Call get_container() to list the ContainerItem entries; it returns None if the format does not support container extraction.
3. For each item, call open_parser() to create a Parser over the embedded content, then reuse the standard extraction methods (get_text, get_images, get_metadata, etc.).
4. Catch UnsupportedDocumentFormatException for items whose format is not supported.

For working with attachments in complex scenarios (nested containers, load options), follow the .NET advanced container guide; the same workflow applies in Python.
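
As a rough illustration of the nested-container case, the sketch below walks a container recursively. It uses only the calls shown on this page (get_container, open_parser, get_text, file_path). The helper walk_container and its max_depth parameter are hypothetical. It also assumes that a Parser opened over an embedded item can return that item's own entries from get_container(), which this page does not confirm. The fallback exception class exists only so the sketch can run without the library installed.

```python
try:
    from groupdocs.parser.exceptions import UnsupportedDocumentFormatException
except ImportError:
    # Fallback so the sketch can run without the library installed.
    class UnsupportedDocumentFormatException(Exception):
        pass


def walk_container(parser, depth=0, max_depth=3):
    """Yield (depth, file_path, text) for each item, descending into nested containers.

    Hypothetical helper: assumes a parser opened over an item may itself
    expose items through get_container().
    """
    items = parser.get_container()
    if items is None:  # container extraction isn't supported for this format
        return
    for item in items:
        try:
            with item.open_parser() as child:
                reader = child.get_text()
                # Text is None when the item has no extractable text.
                yield depth, item.file_path, (str(reader) if reader else None)
                if depth < max_depth:
                    yield from walk_container(child, depth + 1, max_depth)
        except UnsupportedDocumentFormatException:
            # Report the item but mark its text as unavailable.
            yield depth, item.file_path, None


# Usage with the real library (not executed here):
#   from groupdocs.parser import Parser
#   with Parser("./archive.zip") as parser:
#       for depth, path, text in walk_container(parser):
#           print("  " * depth + path, "->", text or "No text available.")
```

The max_depth limit is a guard against deeply nested or self-referencing archives; set it to 0 to list only top-level items.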
