Work With ZIP Archives

A Zip entry can contain the following metadata:

Name    Description
date    The time and date at which the file indicated by the Zip entry was last modified.
crc     The 32-bit CRC (cyclic redundancy check) of the contents of the Zip entry.
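
To make these two fields concrete, here is a minimal sketch that reads the same kind of values with the JDK's own java.util.zip classes rather than with the Parser API. The class name ZipEntryMetadataSketch, the helpers writeSampleZip and crc32Of, and the sample entry "hello.txt" are illustrative assumptions, not part of the library described on this page.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Instant;
import java.util.Enumeration;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch class; uses only the JDK, not the Parser API.
public class ZipEntryMetadataSketch {

    /** Writes a one-entry archive to a temporary file (sample data for the sketch). */
    public static Path writeSampleZip(String entryName, byte[] content) throws IOException {
        Path zipPath = Files.createTempFile("sample", ".zip");
        try (OutputStream out = Files.newOutputStream(zipPath);
             ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write(content);
            zip.closeEntry();
        }
        return zipPath;
    }

    /** Computes the CRC-32 checksum of a byte array. */
    public static long crc32Of(byte[] content) {
        CRC32 crc = new CRC32();
        crc.update(content);
        return crc.getValue();
    }

    public static void main(String[] args) throws IOException {
        byte[] content = "Hello, ZIP!".getBytes(StandardCharsets.UTF_8);
        Path zipPath = writeSampleZip("hello.txt", content);
        try (ZipFile zipFile = new ZipFile(zipPath.toFile())) {
            Enumeration<? extends ZipEntry> entries = zipFile.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                System.out.println(entry.getName());
                // "date": last-modified time stored in the archive for this entry
                System.out.println("date: " + Instant.ofEpochMilli(entry.getTime()));
                // "crc": CRC-32 of the uncompressed entry contents
                System.out.println("crc: " + Long.toHexString(entry.getCrc()));
                // The stored checksum should equal one recomputed from the content
                System.out.println("crc matches content: "
                        + (entry.getCrc() == crc32Of(content)));
            }
        } finally {
            Files.deleteIfExists(zipPath);
        }
    }
}
```

The sketch reads the archive through ZipFile rather than ZipInputStream because ZipFile takes its values from the archive's central directory, so the CRC is available without first reading the entry data.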

Here are the steps to extract text from the files in a Zip archive:

  • Instantiate a Parser object for the initial document;
  • Call the getContainer method to obtain a collection of container item objects;
  • Check that the collection isn't null (null means container extraction isn't supported for the document);
  • Iterate through the collection and obtain a Parser object for each item to extract its text.

The following example shows how to extract text from a Zip archive:

// Create an instance of Parser class
try (Parser parser = new Parser(Constants.SampleZip)) {
    // Extract attachments from the container
    Iterable<ContainerItem> attachments = parser.getContainer();
    // Check if container extraction is supported
    if (attachments == null) {
        System.out.println("Container extraction isn't supported");
        // Stop here: iterating over a null collection would throw a NullPointerException
        return;
    }
    // Iterate over zip entries
    for (ContainerItem item : attachments) {
        // Print the file path
        System.out.println(item.getFilePath());
        // Print metadata
        for (MetadataItem metadata : item.getMetadata()) {
            System.out.printf("%s: %s%n", metadata.getName(), metadata.getValue());
        }
        try {
            // Create a Parser object for the zip entry content
            try (Parser attachmentParser = item.openParser()) {
                // Extract the text of the zip entry
                try (TextReader reader = attachmentParser.getText()) {
                    System.out.println(reader == null ? "No text" : reader.readToEnd());
                }
            }
        } catch (UnsupportedDocumentFormatException ex) {
            System.out.println("The format of this zip entry isn't supported.");
        }
    }
}