Working with Containers

Container-aware parsing lets you open archive formats (ZIP, RAR, 7z, TAR), Outlook stores (PST/OST), PDF portfolios, and emails with attachments, then process each contained item with the same parser API.

Iterate nested items safely

from groupdocs.parser import Parser
from groupdocs.parser.exceptions import UnsupportedDocumentFormatException

def print_text(item_path, parser):
    # Fall back to a placeholder when get_text() returns nothing.
    reader = parser.get_text()
    print(f"{item_path}: {reader if reader else '[no text]'}")

with Parser("portfolio.pdf") as parser:
    container = parser.get_container()
    if container is None:
        print("Container extraction isn't supported for this format.")
    else:
        for item in container:
            try:
                # Each item gets its own parser, closed when the with block exits.
                with item.open_parser() as inner_parser:
                    print_text(item.file_path, inner_parser)
            except UnsupportedDocumentFormatException:
                # Report the unsupported item and continue with the next one.
                print(f"{item.file_path}: format is not supported.")

The following sample file is used in this example: portfolio.pdf

Tips

  • Containers can be nested: an item opened with open_parser() may itself be a container, so call get_container() on the inner parser and repeat the loop when needed.
  • Use item.file_type and item.metadata (when available) to decide how to process each attachment.
  • If get_container() returns None, the format does not expose attachments; continue with other extraction methods.
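The first tip can be sketched as a small recursive helper. This is an illustration rather than part of the library: walk_container and its parameters visit, prefix, max_depth, and unsupported are hypothetical names, and joining paths with "/" is only a display choice. The helper relies solely on the calls shown in the example above (get_container(), open_parser(), file_path), and the depth limit is an added safeguard against very deeply nested archives.

```python
def walk_container(parser, visit, prefix="", max_depth=5, unsupported=()):
    """Call visit(path, inner_parser) for each item, descending into nested containers.

    Hypothetical helper: `unsupported` is a tuple of exception classes that
    should be reported and skipped instead of stopping the traversal.
    """
    if max_depth <= 0:
        return  # stop descending once the depth limit is reached
    container = parser.get_container()
    if container is None:
        return  # this format does not expose nested items
    for item in container:
        # Build a display path that shows where the item sits in the hierarchy.
        path = f"{prefix}/{item.file_path}" if prefix else item.file_path
        try:
            with item.open_parser() as inner_parser:
                visit(path, inner_parser)
                # The item may itself be a container, so repeat one level down.
                walk_container(inner_parser, visit, path, max_depth - 1, unsupported)
        except unsupported:
            print(f"{path}: format is not supported.")
```

With the example above, the helper would presumably be called inside the with block as walk_container(parser, print_text, unsupported=(UnsupportedDocumentFormatException,)), because print_text already accepts a path and a parser.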