Extract data from attachments and ZIP archives

GroupDocs.Parser can extract data, text and images from documents stored in ZIP archives, and every other GroupDocs.Parser feature works on them as well. The same mechanism retrieves attachments from PDF documents and emails so that data can be extracted from them.

To extract documents from ZIP files and get attachments from containers, call the GetContainer method:

IEnumerable<ContainerItem> GetContainer()

This method returns a collection of ContainerItem objects:

Member                                          Description
Name                                            The name of the item.
Directory                                       The directory of the item.
FilePath                                        The full path of the item.
Size                                            The size of the item in bytes.
Metadata                                        The collection of item metadata.
Stream OpenStream()                             Opens the stream of the item content.
Parser OpenParser()                             Creates the Parser object for the item content.
Parser OpenParser(LoadOptions)                  Creates the Parser object for the item content with LoadOptions.
Parser OpenParser(LoadOptions, ParserSettings)  Creates the Parser object for the item content with LoadOptions and ParserSettings.
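
As a rough illustration of the members listed above, the following sketch saves every container item to disk through OpenStream. It is not taken from the product documentation: the ContainerItemExport class, the SaveItems method and the outputDir parameter are hypothetical names, and the GroupDocs.Parser.Data namespace for ContainerItem is an assumption. It also assumes that item names are unique within the container; items with the same name in different directories would overwrite each other.

using System;
using System.Collections.Generic;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Data; // assumed namespace of ContainerItem

static class ContainerItemExport // hypothetical helper class
{
    // Copies the content of every container item into outputDir (hypothetical parameter)
    public static void SaveItems(string filePath, string outputDir)
    {
        using (Parser parser = new Parser(filePath))
        {
            IEnumerable<ContainerItem> items = parser.GetContainer();
            // A null collection means container extraction isn't supported
            if (items == null)
            {
                Console.WriteLine("Container extraction isn't supported");
                return;
            }
            Directory.CreateDirectory(outputDir);
            foreach (ContainerItem item in items)
            {
                // Print the full path and the size of the item
                Console.WriteLine($"{item.FilePath} ({item.Size} bytes)");
                // Keep only the file name in case Name contains directory parts
                string target = Path.Combine(outputDir, Path.GetFileName(item.Name));
                // Copy the raw item content to the target file
                using (Stream source = item.OpenStream())
                using (FileStream destination = File.Create(target))
                {
                    source.CopyTo(destination);
                }
            }
        }
    }
}

OpenStream gives access to the raw bytes of an item, which is useful when the item only needs to be saved or passed to another tool; OpenParser is the better choice when text or other data has to be extracted from the item.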

A container represents both container-only files (such as ZIP archives and Outlook storage files) and documents with attachments (such as emails and PDF Portfolios).

Here are the steps to extract the text of emails from an Outlook storage file:

  • Instantiate a Parser object for the initial document;
  • Call the GetContainer method to obtain a collection of container item objects;
  • Check that the collection isn’t null (a null value means container extraction isn’t supported for the document);
  • Iterate through the collection and obtain a Parser object for each item to extract its text.

The following example shows how to extract text from ZIP entries:

// Create an instance of the Parser class
using (Parser parser = new Parser(filePath))
{
    // Extract the items from the container
    IEnumerable<ContainerItem> attachments = parser.GetContainer();
    // Check if container extraction is supported
    if (attachments == null)
    {
        Console.WriteLine("Container extraction isn't supported");
        return;
    }
    // Iterate over the ZIP entries
    foreach (ContainerItem item in attachments)
    {
        // Print the file path
        Console.WriteLine(item.FilePath);
        try
        {
            // Create a Parser object for the ZIP entry content
            using (Parser attachmentParser = item.OpenParser())
            // Extract the ZIP entry text
            using (TextReader reader = attachmentParser.GetText())
            {
                Console.WriteLine(reader == null ? "No text" : reader.ReadToEnd());
            }
        }
        catch (UnsupportedDocumentFormatException)
        {
            Console.WriteLine("Isn't supported.");
        }
    }
}

