Extract text from ZIP archive files

To extract files from ZIP archives, use the GetContainer method. This method returns a collection of ContainerItem objects.

A ZIP entry can contain the following metadata:

  • date: The time and date at which the file indicated by the ZIP entry was last modified.

This metadata describes the container item itself, not the document stored in it.
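
As a minimal sketch of reading this value, the snippet below looks up the date entry in each item's Metadata collection and prints it as-is. It assumes the metadata name is exactly "date" as listed above; the class ZipEntryDates, the helper Describe and the parameter zipPath are hypothetical names introduced for illustration. The value is not parsed, since its string format isn't specified here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;

static class ZipEntryDates
{
    // Builds the output line; a null value means the entry carried no "date" metadata.
    public static string Describe(string path, string dateValue)
    {
        return path + ": " + (dateValue ?? "(no date)");
    }

    // zipPath is a hypothetical parameter pointing to an existing ZIP archive.
    public static void Print(string zipPath)
    {
        using (Parser parser = new Parser(zipPath))
        {
            IEnumerable<ContainerItem> items = parser.GetContainer();
            if (items == null)
            {
                Console.WriteLine("Container extraction isn't supported.");
                return;
            }

            foreach (ContainerItem item in items)
            {
                // Assumption: the metadata name is "date"; compared case-insensitively to be safe.
                MetadataItem date = item.Metadata.FirstOrDefault(
                    m => string.Equals(m.Name, "date", StringComparison.OrdinalIgnoreCase));
                Console.WriteLine(Describe(item.FilePath, date == null ? null : date.Value));
            }
        }
    }
}
```

Calling ZipEntryDates.Print with the path to an archive prints one line per entry. Entries without a date value are reported as "(no date)" rather than skipped.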

Here are the steps to extract text from ZIP archives:

  • Instantiate a Parser object for the initial archive;
  • Call the GetContainer method to obtain a collection of ContainerItem objects;
  • Check that the collection isn’t null (null means container extraction isn’t supported for the document);
  • Iterate through the collection, get each container item’s name and size, and obtain its content.

The following example shows how to extract text from ZIP archives:

// Create an instance of Parser class
using (Parser parser = new Parser(filePath))
{
    // Extract attachments from the container
    IEnumerable<ContainerItem> attachments = parser.GetContainer();
    // Check if container extraction is supported
    if (attachments == null)
    {
        Console.WriteLine("Container extraction isn't supported.");
        return;
    }

    // Iterate over files
    foreach (ContainerItem item in attachments)
    {
        // Print the file path
        Console.WriteLine(item.FilePath);
        // Print metadata
        foreach (MetadataItem metadata in item.Metadata)
        {
            Console.WriteLine(string.Format("{0}: {1}", metadata.Name, metadata.Value));
        }
        try
        {
            // Create Parser object for the file content
            using (Parser fileParser = item.OpenParser())
            // Extract the file text
            using (TextReader reader = fileParser.GetText())
            {
                Console.WriteLine(reader == null ? "No text" : reader.ReadToEnd());
            }
        }
        catch (UnsupportedDocumentFormatException)
        {
            Console.WriteLine("Isn't supported.");
        }
    }
}
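
The steps also mention item names and sizes. A minimal sketch of listing them follows, assuming ContainerItem exposes Name and Size properties and that Size is reported in bytes. The class ZipListing, the helper FormatEntry and the parameter zipPath are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;

static class ZipListing
{
    // Formats one listing line; kept separate from the parser calls so it is easy to check.
    public static string FormatEntry(string name, long size)
    {
        return string.Format("{0} ({1} bytes)", name, size);
    }

    // zipPath is a hypothetical parameter pointing to an existing ZIP archive.
    public static void PrintListing(string zipPath)
    {
        using (Parser parser = new Parser(zipPath))
        {
            IEnumerable<ContainerItem> items = parser.GetContainer();
            if (items == null)
            {
                Console.WriteLine("Container extraction isn't supported.");
                return;
            }

            foreach (ContainerItem item in items)
            {
                // Assumption: Size is the item's size in bytes.
                Console.WriteLine(FormatEntry(item.Name, item.Size));
            }
        }
    }
}
```

A listing like this can be used to skip very large entries before calling OpenParser on them.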

