Container-aware parsing lets you open archive-like formats (ZIP, RAR, 7z, TAR), Outlook stores (PST/OST), PDF portfolios, and email attachments, then process their contents using the same parser API.
Iterate nested items safely
from groupdocs.parser import Parser
from groupdocs.parser.exceptions import UnsupportedDocumentFormatException

def print_text(item_path, parser):
    reader = parser.get_text()
    print(f"{item_path}: {reader if reader else '[no text]'}")

with Parser("portfolio.pdf") as parser:
    container = parser.get_container()
    if container is None:
        print("Container extraction isn't supported for this format.")
    else:
        for item in container:
            try:
                with item.open_parser() as inner_parser:
                    print_text(item.file_path, inner_parser)
            except UnsupportedDocumentFormatException:
                print(f"{item.file_path}: format is not supported.")
The following sample file is used in this example: portfolio.pdf
Tips
Containers can be nested; call open_parser() recursively when needed.
Use item.file_type and item.metadata (when available) to decide how to process each attachment.
If get_container() returns None, the format does not expose attachments; continue with other extraction methods.
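The recursion mentioned in the first tip could be sketched roughly as follows. This is a minimal illustration, not the library's own helper: walk_container, its visit callback, the unsupported parameter, and the max_depth guard are all hypothetical names introduced here. The only library members it relies on are the ones used in the sample above (get_container(), open_parser(), file_path). FakeParser, FakeItem, and UnsupportedFormat are stand-ins so the sketch runs without the library installed; with the real library you would pass a Parser instance and UnsupportedDocumentFormatException instead.

```python
from contextlib import contextmanager


def walk_container(parser, visit, unsupported, prefix="", max_depth=8):
    """Depth-first walk over nested container items (hypothetical helper).

    parser      -- any object exposing get_container(), as in the sample above
    visit       -- callback receiving (path, inner_parser); inner_parser is
                   None when the item's format could not be opened
    unsupported -- exception class (or tuple) that signals an unsupported item
    max_depth   -- assumed safety limit against very deep nesting
    """
    container = parser.get_container()
    if container is None or max_depth <= 0:
        return  # not a container, or nesting limit reached
    for item in container:
        path = f"{prefix}/{item.file_path}" if prefix else item.file_path
        try:
            with item.open_parser() as inner:
                visit(path, inner)
                # A nested item may itself be a container: descend into it.
                walk_container(inner, visit, unsupported, path, max_depth - 1)
        except unsupported:
            visit(path, None)


# --- Stand-ins so the sketch runs without the library (not real API) ---
class UnsupportedFormat(Exception):
    """Placeholder for UnsupportedDocumentFormatException."""


class FakeParser:
    def __init__(self, items=None):
        self._items = items

    def get_container(self):
        return self._items  # None means "no container support"


class FakeItem:
    def __init__(self, file_path, children=None, supported=True):
        self.file_path = file_path
        self.children = children
        self.supported = supported

    @contextmanager
    def open_parser(self):
        if not self.supported:
            raise UnsupportedFormat(self.file_path)
        yield FakeParser(self.children)


root = FakeParser([
    FakeItem("report.docx"),
    FakeItem("inner.zip", children=[
        FakeItem("notes.txt"),
        FakeItem("blob.bin", supported=False),
    ]),
])

seen = []
walk_container(root, lambda path, p: seen.append((path, p is not None)),
               UnsupportedFormat)
for path, opened in seen:
    print(f"{path}: {'opened' if opened else 'format is not supported'}")
```

Passing the exception class in as an argument keeps the walker independent of any particular import, and the depth limit is a defensive choice for untrusted archives rather than something the library is known to require.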