GroupDocs.Parser for .NET 17.10 Release Notes

Major Features

This release includes the following features and enhancements:

  • Remove obsolete members (v1703)
  • Implement additional properties for container entities
  • Update parameters keys for PersonalStorageContainer
  • Improve the performance of PDF text extractor
  • Implement the ability to extract text from POP3 and IMAP mail servers

All Changes

| Key | Summary | Issue Type |
| --- | --- | --- |
| TEXTNET-578 | Remove obsolete members (v1703) | Enhancement |
| TEXTNET-753 | Implement additional properties for container entities | Enhancement |
| TEXTNET-755 | Update parameters keys for PersonalStorageContainer | Enhancement |
| TEXTNET-775 | Improve the performance of PDF text extractor | Enhancement |
| TEXTNET-550 | Implement the ability to extract text from POP3 and IMAP mail servers | New Feature |

Public API and Backward Incompatible Changes

Remove obsolete members (v1703)

Description

The obsolete ExtractMetadata methods of the ExtractorFactory class were removed.

Public API changes

The obsolete ExtractMetadata methods of the ExtractorFactory class were removed.

Usage

Use the **ExtractMetadata** methods of the **Extractor** class instead of those of the ExtractorFactory class.

C#

// Use:
var extractor = new Extractor();
var metadata = extractor.ExtractMetadata("document.pdf", loadOptions);
// Instead of (removed in this release):
// var factory = new ExtractorFactory();
// var metadata = factory.ExtractMetadata("document.pdf", loadOptions);

Implement additional properties for container entities

Description

Implemented the ability to retrieve the date and size properties of container entities.

Public API changes

Added Date and Size properties to Container.Entity class.

Added constructor with date and size parameters to Container.Entity class.

Usage

Container.Entity class contains the following properties:

| Property | Data type | Description |
| --- | --- | --- |
| Name | String | Name of the entity. Depending on the container, it contains a file name, a unified ID, a sequence number, etc. |
| Path | ContainerPath | Instance of the ContainerPath class that represents the path of the entity in the container |
| MediaType | String | Media type of the entity (or null if a media type isn't set) |
| Date | DateTime? | Date of the entity (or null if a date isn't set). In most cases, it means "last modified" |
| Size | Int64 | Size (in bytes) of the entity (or 0 if a size isn't set) |

The following containers support the extraction of date and size properties of an entity:

  • EmailContainer
  • ZipContainer
  • PersonalStorageContainer 

C#

// Create a container for a ZIP file
using (var c = new ZipContainer("data.zip")) {
    // Iterate over entities
    foreach (var e in c.Entities) {
        Console.WriteLine("Name: " + e.Name); // name of the file (e.g. "document.pdf")
        Console.WriteLine("Path: " + e.Path.ToString()); // path of the file in the container (e.g. "/contracts")
        Console.WriteLine("Date: " + e.Date.ToString()); // date when the file was added to the archive
        Console.WriteLine("Size: " + e.Size.ToString()); // uncompressed size of the file
    }
}

Entities from ZipContainer also have a CRC property, which can be read inside the loop above:

C#

Console.WriteLine("CRC: " + e[MetadataNames.Crc]);
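
Because Date is null and Size is 0 when a container does not provide them, printing code may want to handle those cases explicitly. The following is a minimal sketch that uses only the .NET base class library. The EntityInfoFormatter class and its Describe method are hypothetical helpers and are not part of GroupDocs.Parser; the arguments are assumed to come from the Name, Date and Size properties described above.

```csharp
using System;
using System.Globalization;

// Hypothetical helper (not part of GroupDocs.Parser): formats the values
// exposed by the Name, Date and Size properties of Container.Entity.
public static class EntityInfoFormatter
{
    public static string Describe(string name, DateTime? date, long size)
    {
        // Date is null when the container does not provide it
        string dateText = date.HasValue
            ? date.Value.ToString("yyyy-MM-dd HH:mm:ss", CultureInfo.InvariantCulture)
            : "not set";

        // Size is 0 when it is not set (an empty file also has size 0)
        string sizeText = size > 0
            ? size.ToString(CultureInfo.InvariantCulture) + " bytes"
            : "not set or empty";

        return name + " (date: " + dateText + ", size: " + sizeText + ")";
    }
}
```

Inside the entity loop it could be called as Console.WriteLine(EntityInfoFormatter.Describe(e.Name, e.Date, e.Size));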

Update parameters keys for PersonalStorageContainer

Description

Entities of the PersonalStorageContainer class now use the constants of the MetadataNames class as metadata keys.

Public API changes

The EmailSubject, EmailSender and EmailReceiver constants of the PersonalStorageContainer class were marked as obsolete.

Usage

Use the **MetadataNames** constants instead:

C#

// Use:
Console.WriteLine("Subject: " + container.Entities[0][MetadataNames.Subject]);
Console.WriteLine("From: " + container.Entities[0][MetadataNames.EmailFrom]);
Console.WriteLine("To: " + container.Entities[0][MetadataNames.EmailTo]);
// Instead of (obsolete):
Console.WriteLine("Subject: " + container.Entities[0][PersonalStorageContainer.EmailSubject]);
Console.WriteLine("From: " + container.Entities[0][PersonalStorageContainer.EmailSender]);
Console.WriteLine("To: " + container.Entities[0][PersonalStorageContainer.EmailReceiver]);

Improve the performance of PDF text extractor

Description

The performance of the PDF text extractor was improved.

Public API changes

Public API was not changed.

Usage

C#

// Create an instance of the PDF text extractor
using (var extractor = new PdfTextExtractor(stream)) {
    // Set the extraction mode to fast (simple) text extraction
    extractor.ExtractMode = ExtractMode.Simple;
    // Extract all the text from the document
    Console.WriteLine(extractor.ExtractAll());
}

Implement the ability to extract text from POP3 and IMAP mail servers

Description

This feature allows extracting emails from mail servers using the POP3 and IMAP protocols.

Public API changes

Added Host and Port properties to EmailConnectionInfo class.

Added CreatePopConnectionInfo and CreateImapConnectionInfo static methods to EmailConnectionInfo class.

Added Pop and Imap members to EmailConnectionType enumeration.

Usage

To retrieve a list of all emails, use the **Entities** property:

C#

// Create connection info
var info = EmailConnectionInfo.CreatePopConnectionInfo(@"pop-mail.outlook.com", 995, "username", "password");
// Create an email container
using (var container = new EmailContainer(info)) {
    // Iterate over emails
    foreach(var entity in container.Entities) {
        Console.WriteLine("Folder: " + entity.Path.ToString()); // A folder at server
        Console.WriteLine("Subject: " + entity[MetadataNames.Subject]); // A subject of email
        Console.WriteLine("From: " + entity[MetadataNames.EmailFrom]); // "From" address
        Console.WriteLine("To: " + entity[MetadataNames.EmailTo]); // "To" addresses
    }
}

To retrieve an email, use the **OpenEntityStream** method:

C#

// Create connection info
var info = EmailConnectionInfo.CreatePopConnectionInfo(@"pop-mail.outlook.com", 995, "username", "password");
// Create an email container
using (var container = new EmailContainer(info)) {
    // Iterate over emails
    foreach(var entity in container.Entities) {
        // Open a stream with the content of the email
        // (alternatively: entity.OpenStream())
        using (var stream = container.OpenEntityStream(entity))
        // Create a text extractor for the email
        using (var extractor = new EmailTextExtractor(stream)) {
            // Extract all the text from the email
            Console.WriteLine(extractor.ExtractAll());
        }
    }
}