GroupDocs.Parser for Java 18.9 Release Notes

Major Features

There are the following features in this release:

  • Ability to extract a text from databases
  • Ability to extract data from PDF Forms
  • Support for text analysis API for text documents
  • Support for text analysis API for spreadsheets
  • Support for text analysis API for presentation
  • Ability to request a password for protected documents
  • Metered licensing security is improved and now supports Java version 8u101 or above

Full List of Issues Covering all Changes in this Release

KeySummaryIssue Type
PARSERNET-555Implement the ability to extract a text from databasesNew feature
PARSERNET-975Implement the ability to extract data from the form fields of PDFsNew feature
PARSERNET-1024Implement the ability to request a password for protected documentsNew feature
PARSERNET-978Implement the support for text analysis API for text documentsNew feature
PARSERNET-979Implement the support for text analysis API for spreadsheetsNew feature
PARSERNET-980Implement the support for text analysis API for presentationsNew feature
PARSERJAVA-31Metered licensing security has been improved and now supports Java version 8u101 or aboveBreaking change

Public API and Backward Incompatible Changes

Breaking Change - Metered licensing security has been improved and now supports Java version 8u101 or above

Description

This breaking change improves Metered licensing security. Starting from GroupDocs.Parser 18.9 Metered API can be used only with Java 8u101 or above. Please use other types of licensing in Java 7.

Public API changes

No public API changes.

Usage

This code works only with Java 8u101 or above:

Metered metered = new Metered();
metered.setMeteredKey(PUBLIC_KEY, PRIVATE_KEY);

This code works with Java 7 or above:

License lic = new License();
lic.setLicense(LICENSE_PATH);

Ability to extract a text from databases

Description

This feature allows extracting a text from databases.

Public API changes

  • Added DbContainer class
  • Added DbTableReader class

Usage

To extract a text from databases DbContainer class is used. DbContainer class implements IContainer interface. Each data table is represented by the entity. The content of the entity is CSV-presentation of data table. For more detailed text extraction GetTableReader method is used. Also, this method is faster and consumes less memory. getTableReader method returns an instance of DbTableReader class.

DbTableReader class has the following members:

MemberDescription
read()Reads the next data row and returns a collection of row cells
readLine()Reads the next data row and returns a string representation of comma-separated values
columnsFilterGets or sets a collection of columns names which are returned byReadand ReadLine methods; null if all table columns are returned
columnsGets a collection of table columns names

Using DbContainer as a container:

// Create a container
DbContainer container = new DbContainer(java.sql.DriverManager.getConnection(connectionString));
try {
    // Iterate over entities
    for (Container.Entity entity : container.getEntities()) {
        // Print a table name
        System.out.println(entity.getName());
        // Print a media type
        System.out.println(entity.getMediaType());
        // Create a stream reader for CSV document: OpenStream method converts a table to the CSV file and returns it as Stream
        java.io.InputStreamReader reader = new java.io.InputStreamReader(entity.openStream());
        try {
            BufferedReader br = new BufferedReader(reader);
 
            // Read a line
            String line = br.readLine();
            // Loop to the end of the file
            while (line != null) {
                // Print a line from the document
                System.out.println(line);
                // Read the next line
                line = br.readLine();
            }
        } finally {
            reader.close();
        }
    }
} finally {
    container.dispose();
}

Ability to extract data from the form fields of PDFs

Description

This feature allows extracting data from PDF Forms.

Public API changes

  • Added getFormData method to PdfTextExtractor class

Usage

// Create a text extractor
PdfTextExtractor extractor = new PdfTextExtractor(fileName);
try {
    // Extract forms data
    java.util.Dictionary<String, String> fields = extractor.getFormData();
    // Iterate over fields
    java.util.Enumeration<String> e = fields.keys();
    while (e.hasMoreElements()) {
        String key = e.nextElement();
        // Print field name and value
        System.out.println(String.format("%s: %s", key, fields.get(key)));
    }
} finally {
    extractor.dispose();
}

Support for text analysis API for text documents

Description

This feature allows extracting text areas from document pages of text documents.

Public API changes

Added DocumentContent property to WordsTextExtractor class.

Usage

// Create a text extractor
WordsTextExtractor extractor = new WordsTextExtractor("invoice.docx");
 
// Create search options
TextAreaSearchOptions searchOptions = new TextAreaSearchOptions();
// Set a regular expression to search 'Invoice # XXX' text
searchOptions.setExpression("\\s?INVOICE\\s?#\\s?[0-9]+");
// Limit the search with a rectangle
searchOptions.setRectangle(new Rectangle(10, 10, 300, 150));
 
// Get text areas
java.util.List<TextArea> texts = extractor.getDocumentContent().getTextAreas(0, searchOptions);
 
// Iterate over a list
for (TextArea area : texts) {
    // Print a text
    System.out.println(area.getText());
}

Support for text analysis API for spreadsheets

Description

This feature allows extracting text areas from document pages of spreadsheets.

Public API changes

Added DocumentContent property to CellsTextExtractor class.

Usage

// Create a text extractor
CellsTextExtractor extractor = new CellsTextExtractor("invoice.xlsx");
 
// Create search options
TextAreaSearchOptions searchOptions = new TextAreaSearchOptions();
// Set a regular expression to search 'Invoice # XXX' text
searchOptions.setExpression("\\s?INVOICE\\s?#\\s?[0-9]+");
// Limit the search with a rectangle
searchOptions.setRectangle(new Rectangle(10, 10, 300, 150));
 
// Get text areas
java.util.List<TextArea> texts = extractor.getDocumentContent().getTextAreas(0, searchOptions);
 
// Iterate over a list
for (TextArea area : texts) {
    // Print a text
    System.out.println(area.getText());
}

Support for text analysis API for presentations

Description

This feature allows extracting text areas from document pages of presentations.

Public API changes

Added DocumentContent property to SlidesTextExtractor class.

Usage

// Create a text extractor
SlidesTextExtractor extractor = new SlidesTextExtractor("presentation.pptx");
 
// Create search options
TextAreaSearchOptions searchOptions = new TextAreaSearchOptions();
// Set a regular expression to search 'Published: XXXX.XX.XX' text
searchOptions.setExpression("\\s?Published\\:\\s?[0-9]{4}\\.[0-9]{2}\\.[0-9]{2}");
// Limit the search with a rectangle
searchOptions.setRectangle(new Rectangle(10, 10, 300, 150));
 
// Get text areas
java.util.List<TextArea> texts = extractor.getDocumentContent().getTextAreas(0, searchOptions);
 
// Iterate over a list
for (TextArea area : texts) {
    // Print a text
    System.out.println(area.getText());
}

Ability to request a password for protected documents

Description

This feature allows providing a password for protected documents on-demand.

Public API changes

Added IPasswordProvider interface.
Added PasswordRequest class.
Added PasswordProvider property to LoadOptions class.

Usage

IPasswordProvider interface has only one method:

void onPasswordRequest(object sender, PasswordRequest request);

This method is called when the extractor or container meets a password-protected document. sender contains the link to the caller. PasswordRequest class contains the information about the request:

MemberDescription
CancelThe boolean value indicating whether the request is rejected
PasswordA password for the document

A user has two ways to provide a password for the document. When the password is known, Password property of LoadOptions class is used. If it is not known whether it is protected or not before opening the document, PasswordProvider property of LoadOptions class is used.

class Indexer {
    // A name of the current processed file
    private String currentFileName;
 
    // Processes the directory
    public void process(java.io.File dir) {
        // Create load options with Password Provider
        LoadOptions loadOptions = new LoadOptions();
        loadOptions.setPasswordProvider(new PasswordProvider(this));
 
        // Process the sub-directories
        for (java.io.File file : dir.listFiles()) {
            if (file.isDirectory()) {
                // Process files in the directory
                process(file);
            } else {
                // Set the name of the current processed file
                currentFileName = file.getName();
 
                try {
                    // Extract a text from the file
                    String text = Extractor.DEFAULT.extractText(file.getPath(), loadOptions);
                    // Print the length of the file
                    System.out.println(String.format("%s, length: %d", currentFileName, (text == null ? "" : text).length()));
                } catch (GroupDocsParserException ex) {
                    // Print an error message (for example, "Invalid Password")
                    System.out.println(ex.getMessage());
                }
            }
        }
    }
 
    // Provides the ability to request a password from a user
    private class PasswordProvider implements IPasswordProvider {
        private final Indexer owner;
 
        public PasswordProvider(Indexer owner) {
            this.owner = owner;
        }
 
        // Requests a password from a user
        public void onPasswordRequest(Object sender, PasswordRequest request) {
            // Print a password request
            System.out.println("Enter password for" + owner.currentFileName + ":");
            java.util.Scanner scanner = new java.util.Scanner(System.in);
            String password = scanner.next();
 
            // If a user omits a password (entered a blank password)
            if ("" == password) {
                // Mark the request as cancelled
                request.setCancel(true);
            } else {
                // Set the password
                request.setPassword(password);
            }
        }
    }
}