GroupDocs.Parser for Java 20.5 Release Notes

Major Features

There are the following improvements in this release:

  • Added RawPageCount property to IDocumentInfo interface
  • Implemented the ability to create Parser object with java.sql.Connection and EmailConnection objects

Full List of Issues Covering all Changes in this Release

KeySummaryCategory
PARSERNET-1507Add RawPageCount property to IDocumentInfo interfaceImprovement
PARSERNET-1364Implement the ability to create Parser object with DbConnectionNew feature
PARSERNET-1365Implement the ability to create Parser object with EmailConnectionNew feature

Public API and Backward Incompatible Changes

Add RawPageCount property to IDocumentInfo interface

Description

This feature improves API of raw text extraction from document page.

Public API changes

IDocumentInfo interface was updated with changes as follows:

  • Added RawPageCount property

Usage

The following example shows how to extract a raw text from a document page:

// Create an instance of Parser class
try (Parser parser = new Parser(filePath)) {
    // Check if the document supports text extraction
    if (!parser.getFeatures().isText()) {
        System.out.println("Document isn't supports text extraction.");
        return;
    }
    // Get the document info
    IDocumentInfo documentInfo = parser.getDocumentInfo();
    // Check if the document has pages
    if (documentInfo == null || documentInfo.getRawPageCount() == 0) {
        System.out.println("Document hasn't pages.");
        return;
    }
    // Iterate over pages
    for (int p = 0; p < documentInfo.getRawPageCount(); p++) {
        // Print a page number
        System.out.println(String.format("Page %d/%d", p + 1, documentInfo.getRawPageCount()));
        // Extract a text into the reader
        try (TextReader reader = parser.getText(p, new TextOptions(true))) {
            // Print a text from the document
            // We ignore null-checking as we have checked text extraction feature support earlier
            System.out.println(reader.readToEnd());
        }
    }
}

Implement the ability to create Parser object with DbConnection

Description

This feature allows to extract data from databases via JDBC.

Public API changes

Parser class was updated with changes as follows:

  • Added Parser(Connection) constructor
  • Added Parser(Connection, ParserSettings) constructor

Usage

The following example shows how to extract data from Sqlite database:

// Create DbConnection object
java.sql.Connection connection = java.sql.DriverManager.getConnection(String.format("jdbc:sqlite:%s", Constants.SampleDatabase));
// Create an instance of Parser class to extract tables from the database
try (Parser parser = new Parser(connection)) {
    // Check if text extraction is supported
    if (!parser.getFeatures().isText()) {
        System.out.println("Text extraction isn't supported.");
        return;
    }
    // Check if toc extraction is supported
    if (!parser.getFeatures().isToc()) {
        System.out.println("Toc extraction isn't supported.");
        return;
    }
    // Get a list of tables
    Iterable<TocItem> toc = parser.getToc();
    // Iterate over tables
    for(TocItem i : toc)
    {
        // Print the table name
        System.out.println(i.extractText());
        // Extract a table content as a text
        try(TextReader reader = parser.getText(i.getPageIndex().intValue()))
        {
            System.out.println(reader.readToEnd());
        }
    }
}

Implement the ability to create Parser object with EmailConnection

Description

This feature allows to extract data from email servers.

Public API changes

Parser class was updated with changes as follows:

  • Added Parser(EmailConnection) constructor
  • Added Parser(EmailConnection, ParserSettings) constructor

Usage

The following example shows how to extract emails from Exchange Server:

// Create the connection object for Exchange Web Services protocol
EmailConnection connection = new EmailEwsConnection(
        "https://outlook.office365.com/ews/exchange.asmx",
        "email@server",
        "password");
// Create an instance of Parser class to extract emails from the remote server
try (Parser parser = new Parser(connection)) {
    // Check if container extraction is supported
    if (!parser.getFeatures().isContainer()) {
        System.out.println("Container extraction isn't supported.");
        return;
    }
    // Extract email messages from the server
    Iterable<ContainerItem> emails = parser.getContainer();
    // Iterate over attachments
    for (ContainerItem item : emails) {
        // Create an instance of Parser class for email message
        try (Parser emailParser = item.openParser()) {
            // Extract the email text
            try (TextReader reader = emailParser.getText()) {
                // Print the email text
                System.out.println(reader == null ? "Text extraction isn't supported." : reader.readToEnd());
            }
        }
    }
}