GroupDocs.Parser provides functionality to iterate through items in container documents such as ZIP archives, OST/PST files, and documents with attachments.
Prerequisites
GroupDocs.Parser for Python via .NET installed
Sample container files (ZIP, OST, PST, etc.)
Understanding of container/attachment concepts
Iterate through container items
To iterate through all items in a container:
    from groupdocs.parser import Parser

    # Create an instance of Parser class
    with Parser("./archive.zip") as parser:
        # Get container items
        attachments = parser.get_container()

        # Check if container extraction is supported
        if attachments is None:
            print("Container extraction isn't supported")
        else:
            # Iterate over items
            for idx, attachment in enumerate(attachments):
                print(f"Item {idx + 1}:")
                print(f"  Name: {attachment.name}")
                print(f"  Size: {attachment.size} bytes")
                print(f"  Path: {attachment.file_path}")
The following sample file is used in this example: archive.zip
Expected behavior: Iterates through all files in the container and displays their properties.
Count container items
To count files in a container:
    from groupdocs.parser import Parser

    def count_container_items(file_path):
        """
        Count the number of items in a container.
        """
        with Parser(file_path) as parser:
            attachments = parser.get_container()
            if attachments is None:
                return 0

            # Convert to list to count
            items = list(attachments)
            return len(items)

    # Usage
    count = count_container_items("archive.zip")
    print(f"Container has {count} items")
The following sample file is used in this example: archive.zip
Expected behavior: Returns the total number of items in the container.
List container contents with details
To display detailed information about container contents:
    from groupdocs.parser import Parser

    def list_container_contents(file_path):
        """
        List all container contents with detailed information.
        """
        with Parser(file_path) as parser:
            attachments = parser.get_container()
            if attachments is None:
                print("Container extraction not supported")
                return

            # Header row; column widths add up to the 90-character rule below
            print(f"{'#':<5}{'Name':<40}{'Size':<15}{'Path':<30}")
            print("-" * 90)

            for idx, attachment in enumerate(attachments, 1):
                # Truncate long values so they fit their columns
                name = attachment.name[:38]
                size = f"{attachment.size:,} bytes"
                path = (attachment.file_path or "")[:28]
                print(f"{idx:<5}{name:<40}{size:<15}{path:<30}")

    # Usage
    list_container_contents("documents.zip")
The following sample file is used in this example: documents.zip
Expected behavior: Displays a formatted table of all container items with their properties.
Filter container items by extension
To filter and process only specific file types:
    from groupdocs.parser import Parser
    import os

    def filter_by_extension(file_path, extensions):
        """
        Filter container items by file extension.

        Args:
            file_path: Path to container
            extensions: List of extensions to include (e.g., ['.pdf', '.docx'])
        """
        with Parser(file_path) as parser:
            attachments = parser.get_container()
            if attachments is None:
                print("Container extraction not supported")
                return []

            filtered_items = []
            for attachment in attachments:
                file_ext = os.path.splitext(attachment.name)[1].lower()
                if file_ext in extensions:
                    filtered_items.append({
                        'name': attachment.name,
                        'size': attachment.size,
                        'extension': file_ext
                    })

            return filtered_items

    # Usage - find all PDF and Word documents
    items = filter_by_extension("archive.zip", ['.pdf', '.docx', '.doc'])
    print(f"Found {len(items)} PDF/Word documents:")
    for item in items:
        print(f"  {item['name']} ({item['size']:,} bytes)")
The following sample file is used in this example: archive.zip
Expected behavior: Returns only items matching the specified file extensions.
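The filter above lowercases each item's extension but compares it against the extensions list exactly as passed in, so a caller who supplies '.PDF' or 'pdf' gets no matches. The following is a minimal sketch of one way to normalize the requested extensions before filtering. The helpers normalize_extensions and has_extension are hypothetical names introduced for illustration and are not part of GroupDocs.Parser.

```python
import os

def normalize_extensions(extensions):
    """Return a set of lowercase extensions, each with a leading dot."""
    normalized = set()
    for ext in extensions:
        ext = ext.strip().lower()
        if not ext:
            continue  # Ignore empty entries
        if not ext.startswith('.'):
            ext = '.' + ext
        normalized.add(ext)
    return normalized

def has_extension(item_name, allowed):
    """Check an item name against a set built by normalize_extensions()."""
    return os.path.splitext(item_name)[1].lower() in allowed

# Usage with hypothetical item names
allowed = normalize_extensions(['PDF', '.Docx', ' .doc '])
print(sorted(allowed))
print(has_extension('reports/Q1.PDF', allowed))
print(has_extension('notes.txt', allowed))
```

Building a set once also keeps each membership check cheap when a container holds many items.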
Calculate total container size
To calculate the total size of all items:
    from groupdocs.parser import Parser

    def calculate_container_size(file_path):
        """
        Calculate total size of all items in container.
        """
        with Parser(file_path) as parser:
            attachments = parser.get_container()
            if attachments is None:
                print("Container extraction not supported")
                # Return the same shape as the success case so callers can index it
                return {'total_size': 0, 'item_count': 0, 'average_size': 0}

            total_size = 0
            item_count = 0
            for attachment in attachments:
                total_size += attachment.size
                item_count += 1

            return {
                'total_size': total_size,
                'item_count': item_count,
                'average_size': total_size / item_count if item_count > 0 else 0
            }

    # Usage
    stats = calculate_container_size("archive.zip")
    print("Container Statistics:")
    print(f"  Items: {stats['item_count']}")
    print(f"  Total Size: {stats['total_size']:,} bytes ({stats['total_size'] / 1024 / 1024:.2f} MB)")
    print(f"  Average Size: {stats['average_size']:,.0f} bytes")
The following sample file is used in this example: archive.zip
Expected behavior: Calculates and displays size statistics for the container.
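The usage code above always converts the total to megabytes, which reads poorly for very small or very large containers. As an illustrative sketch, a small formatter can pick the unit automatically. The function name format_size and the choice of binary units (1 KB = 1024 bytes) are assumptions made here, not something the library provides.

```python
def format_size(num_bytes):
    """Format a byte count using binary units (1 KB = 1024 bytes)."""
    size = float(num_bytes)
    for unit in ('bytes', 'KB', 'MB', 'GB'):
        # Stop at the first unit that fits, or at the largest unit available
        if size < 1024 or unit == 'GB':
            break
        size /= 1024
    if unit == 'bytes':
        return f"{int(size)} bytes"
    return f"{size:.2f} {unit}"

# Usage
for value in (512, 1536, 5 * 1024 * 1024):
    print(format_size(value))
```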
Group items by file type
To categorize items by file type:
    from groupdocs.parser import Parser
    from collections import defaultdict
    import os

    def group_by_file_type(file_path):
        """
        Group container items by file extension.
        """
        with Parser(file_path) as parser:
            attachments = parser.get_container()
            if attachments is None:
                print("Container extraction not supported")
                return {}

            groups = defaultdict(list)
            for attachment in attachments:
                ext = os.path.splitext(attachment.name)[1].lower() or 'no_extension'
                groups[ext].append({
                    'name': attachment.name,
                    'size': attachment.size
                })

            return dict(groups)

    # Usage
    groups = group_by_file_type("mixed_archive.zip")
    print("Files grouped by type:\n")
    for ext, items in sorted(groups.items()):
        total_size = sum(item['size'] for item in items)
        print(f"{ext.upper()} files: {len(items)} items, {total_size:,} bytes")
        for item in items[:3]:  # Show first 3
            print(f"  - {item['name']}")
        if len(items) > 3:
            print(f"  ... and {len(items) - 3} more")
        print()
The following sample file is used in this example: mixed_archive.zip
Expected behavior: Organizes and displays items grouped by file extension.
Search for specific files
To search for files by name pattern:
    from groupdocs.parser import Parser
    import re

    def search_container_items(file_path, pattern):
        """
        Search for items matching a name pattern.

        Args:
            file_path: Path to container
            pattern: Regex pattern to match file names
        """
        with Parser(file_path) as parser:
            attachments = parser.get_container()
            if attachments is None:
                print("Container extraction not supported")
                return []

            regex = re.compile(pattern, re.IGNORECASE)
            matches = []
            for attachment in attachments:
                if regex.search(attachment.name):
                    matches.append({
                        'name': attachment.name,
                        'size': attachment.size,
                        'path': attachment.file_path
                    })

            return matches

    # Usage - find all files containing "invoice"
    matches = search_container_items("documents.zip", r'invoice')
    print(f"Found {len(matches)} matching files:")
    for match in matches:
        print(f"  {match['name']} ({match['size']:,} bytes)")
The following sample file is used in this example: documents.zip
Expected behavior: Returns items whose names match the search pattern.
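Because the search uses re.search, the pattern can match anywhere in an item name, including a folder component when names carry relative paths. The sketch below contrasts an unanchored pattern with one restricted to the final path component. The item names are made up for illustration, and the '/' separator is an assumption; some containers may report a different separator.

```python
import re

# Hypothetical item names, as they might appear in a container with folders
names = [
    'invoice_2024.pdf',
    'archive/Invoice-0042.PDF',
    'invoices/readme.txt',
    'reports/summary.docx',
]

def match_names(names, pattern):
    """Return the names matched by pattern, case-insensitively, via re.search."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [name for name in names if regex.search(name)]

# Unanchored: matches anywhere in the name, including folder components
print(match_names(names, r'invoice'))

# Restricted to the final path component and to the .pdf extension
print(match_names(names, r'(^|/)invoice[^/]*\.pdf$'))
```

The first call also returns 'invoices/readme.txt' because the folder name contains "invoice"; the second call excludes it.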
Process items with progress tracking
To iterate with progress indication:
    from groupdocs.parser import Parser

    def process_with_progress(file_path):
        """
        Process container items with progress tracking.
        """
        with Parser(file_path) as parser:
            attachments = parser.get_container()
            if attachments is None:
                print("Container extraction not supported")
                return

            # Convert to list to get total count
            items = list(attachments)
            total = len(items)
            print(f"Processing {total} items...")

            for idx, attachment in enumerate(items, 1):
                # Show progress
                progress = (idx / total) * 100
                print(f"[{progress:5.1f}%] Processing: {attachment.name}")

                # Process item (example: just count size)
                # In real scenario, you might extract text, etc.

                # Simulate processing
                # time.sleep(0.1)

            print("\nProcessing complete!")

    # Usage
    process_with_progress("archive.zip")
The following sample file is used in this example: archive.zip
Expected behavior: Processes items while showing progress percentage.
Export container inventory to CSV
To create a CSV inventory of container contents:
    from groupdocs.parser import Parser
    import csv

    def export_inventory_to_csv(file_path, output_csv):
        """
        Export container inventory to CSV file.
        """
        with Parser(file_path) as parser:
            attachments = parser.get_container()
            if attachments is None:
                print("Container extraction not supported")
                return False

            # Write to CSV
            with open(output_csv, 'w', newline='', encoding='utf-8') as csvfile:
                fieldnames = ['Index', 'Name', 'Size (bytes)', 'Path']
                writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
                writer.writeheader()
                for idx, attachment in enumerate(attachments, 1):
                    writer.writerow({
                        'Index': idx,
                        'Name': attachment.name,
                        'Size (bytes)': attachment.size,
                        'Path': attachment.file_path or ''
                    })

            print(f"Inventory exported to {output_csv}")
            return True

    # Usage
    export_inventory_to_csv("archive.zip", "inventory.csv")
The following sample file is used in this example: archive.zip
The following output file is produced by this example: inventory.csv
Expected behavior: Creates a CSV file listing all container items and their properties.
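Once the inventory exists, it can be read back with the standard csv module for further checks. This is a minimal sketch that assumes the column names written by the export function above ('Index', 'Name', 'Size (bytes)', 'Path'). The helper load_inventory is a hypothetical name, not a library function.

```python
import csv

def load_inventory(csv_path):
    """Read an inventory CSV back into a list of dicts with typed values."""
    rows = []
    with open(csv_path, newline='', encoding='utf-8') as csvfile:
        for row in csv.DictReader(csvfile):
            rows.append({
                'index': int(row['Index']),
                'name': row['Name'],
                'size': int(row['Size (bytes)']),  # CSV stores numbers as text
                'path': row['Path'],
            })
    return rows

# Usage (assumes inventory.csv was created by the export example):
# rows = load_inventory("inventory.csv")
# largest = sorted(rows, key=lambda r: r['size'], reverse=True)[:5]
# for row in largest:
#     print(f"{row['name']}: {row['size']:,} bytes")
```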
Notes
The get_container() method returns None if container extraction is not supported
Container items have properties: name, size, and file_path
Use open_parser() on items to create a parser for individual files
Container iteration is lightweight and doesn’t extract file contents
Supports ZIP, OST, PST, and documents with attachments
Item names may include relative paths for nested structures
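The notes mention open_parser() without showing it in use. The following sketch walks nested containers by opening each item and asking the resulting parser for its own container. It relies only on get_container() and open_parser() as described above, and makes several assumptions: that open_parser() returns a parser usable in a with-statement (as Parser is in the earlier examples), that an item in an unsupported format raises an exception when opened, and that a depth limit is wanted. The name walk_container and the max_depth parameter are hypothetical.

```python
def walk_container(parser, depth=0, max_depth=3):
    """
    Yield (depth, item) for every item reachable from parser.

    Assumes parser.get_container() returns None or an iterable of items,
    and that item.open_parser() returns a context-manager parser.
    """
    items = parser.get_container()
    if items is None:
        return  # Not a container, nothing to descend into
    for item in items:
        yield depth, item
        if depth + 1 >= max_depth:
            continue  # Do not open items beyond the depth limit
        try:
            nested = item.open_parser()
        except Exception:
            # Assumption: unsupported formats raise on open; skip them
            continue
        if nested is None:
            continue
        with nested:
            yield from walk_container(nested, depth + 1, max_depth)

# Usage (requires GroupDocs.Parser and a real archive):
# from groupdocs.parser import Parser
# with Parser("archive.zip") as parser:
#     for depth, item in walk_container(parser):
#         print("  " * depth + item.name)
```

Opening an item is heavier than listing it, so the depth limit keeps the walk bounded for deeply nested archives.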