Extract data from attachments and ZIP archives
GroupDocs.Parser can extract data, text, and images from documents stored in ZIP archives, and every other GroupDocs.Parser feature works on those documents too. The same mechanism lets you get attachments from PDF documents and emails and extract data from them.
To extract documents from ZIP files and get attachments from containers, call the GetContainer method:
IEnumerable<ContainerItem> GetContainer()
This method returns a collection of ContainerItem objects.
A container represents both container-only files (such as ZIP archives and Outlook storage) and documents with attachments (such as emails and PDF Portfolios).
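As a minimal sketch of the call described above, the following program lists the items of a container without opening them. It uses only GetContainer and the FilePath property that appear elsewhere on this page. The class name ListContainerItems, the default file name sample.zip, and the namespaces in the using directives are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;

class ListContainerItems
{
    static void Main(string[] args)
    {
        // Hypothetical input: the path to a ZIP archive, an email, or a PDF Portfolio.
        string filePath = args.Length > 0 ? args[0] : "sample.zip";

        using (Parser parser = new Parser(filePath))
        {
            // The same call is used for archives and for documents with attachments.
            IEnumerable<ContainerItem> items = parser.GetContainer();

            // null means container extraction isn't supported for this document.
            if (items == null)
            {
                Console.WriteLine("Container extraction isn't supported");
                return;
            }

            // Print the path of every item found in the container.
            foreach (ContainerItem item in items)
            {
                Console.WriteLine(item.FilePath);
            }
        }
    }
}
```

Because the call is the same for every container type, the program can be pointed at an archive or at an email without code changes.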
Here are the steps to extract email text from Outlook storage:
Instantiate a Parser object for the initial document;
Call the GetContainer method and obtain a collection of container item objects;
Check that the collection isn't null (null means container extraction isn't supported for the document);
Iterate through the collection and obtain a Parser object for each item to extract its text.
The following example shows how to extract text from ZIP entries:
// Namespaces used by the example
using System;
using System.Collections.Generic;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;
using GroupDocs.Parser.Exceptions;

// Create an instance of Parser class
using (Parser parser = new Parser(filePath))
{
    // Extract attachments from the container
    IEnumerable<ContainerItem> attachments = parser.GetContainer();
    // Check if container extraction is supported
    if (attachments == null)
    {
        Console.WriteLine("Container extraction isn't supported");
        // Stop here: iterating over a null collection would throw
        return;
    }
    // Iterate over ZIP entries
    foreach (ContainerItem item in attachments)
    {
        // Print the file path
        Console.WriteLine(item.FilePath);
        try
        {
            // Create a Parser object for the ZIP entry content
            using (Parser attachmentParser = item.OpenParser())
            {
                // Extract the ZIP entry text
                using (TextReader reader = attachmentParser.GetText())
                {
                    Console.WriteLine(reader == null ? "No text" : reader.ReadToEnd());
                }
            }
        }
        catch (UnsupportedDocumentFormatException)
        {
            // The entry is in a format the library can't open
            Console.WriteLine("Isn't supported.");
        }
    }
}
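A container item can itself be a container, for example an email with attachments stored inside a ZIP archive. The sketch below is one possible way to handle that case by applying the same GetContainer, OpenParser, and GetText calls recursively. The class name NestedContainerText, the PrintText helper, the MaxDepth limit, and the default file name sample.zip are hypothetical. Whether a nested item is reported as a container depends on the formats the library supports, so treat this as an illustration rather than the library's recommended approach.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;
using GroupDocs.Parser.Exceptions;

class NestedContainerText
{
    // Hypothetical safety limit so deeply nested archives can't recurse forever.
    const int MaxDepth = 5;

    // Prints the document's own text (if any), then descends into its container items.
    static void PrintText(Parser parser, int depth)
    {
        string indent = new string(' ', depth * 2);

        // A null reader is treated as "no text", as in the example above.
        using (TextReader reader = parser.GetText())
        {
            if (reader != null)
            {
                Console.WriteLine(indent + reader.ReadToEnd());
            }
        }

        // A null collection means container extraction isn't supported.
        IEnumerable<ContainerItem> items = parser.GetContainer();
        if (items == null || depth >= MaxDepth)
        {
            return;
        }

        foreach (ContainerItem item in items)
        {
            Console.WriteLine(indent + item.FilePath);
            try
            {
                // Open the item and process it the same way as its parent.
                using (Parser child = item.OpenParser())
                {
                    PrintText(child, depth + 1);
                }
            }
            catch (UnsupportedDocumentFormatException)
            {
                Console.WriteLine(indent + "Isn't supported.");
            }
        }
    }

    static void Main(string[] args)
    {
        // Hypothetical input path; replace with a real archive or email.
        string filePath = args.Length > 0 ? args[0] : "sample.zip";
        using (Parser parser = new Parser(filePath))
        {
            PrintText(parser, 0);
        }
    }
}
```

Printing the document's own text before descending means an email body is shown as well as its attachments, while a plain ZIP archive contributes only its entries.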
More resources
Advanced usage topics
To learn more about working with attachments and ZIP archives, please refer to the advanced help section.
GitHub examples
You can run the code above and see the feature in action in our GitHub examples.
Along with the full-featured .NET library, we provide simple but powerful free apps.
You are welcome to extract text, metadata and images from PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX, emails and more with our Free Online Document Parser App.