GroupDocs.Parser for .NET 18.9 Release Notes
This page contains release notes for GroupDocs.Parser for .NET 18.9.
Major Features
This release adds the following features:
- Ability to extract text from databases
- Ability to extract data from PDF Forms
Full List of Issues Covering all Changes in this Release
Key | Summary | Issue Type |
---|---|---|
PARSERNET-555 | Implement the ability to extract a text from databases | New feature |
PARSERNET-975 | Implement the ability to extract data from the form fields of PDFs | New feature |
Public API and Backward Incompatible Changes
This section lists the public API changes introduced in GroupDocs.Parser for .NET 18.9. It covers new and obsoleted public methods as well as internal behavior changes in GroupDocs.Parser that may affect existing code. Any change to existing behavior that could be seen as a regression is especially important and is documented here.
Ability to extract text from databases
Description
This feature allows extracting text from databases.
Public API changes
- Added DbContainer class
- Added DbTableReader class
Usage
The DbContainer class is used to extract text from databases. DbContainer implements the IContainer interface, and each data table is represented by an entity whose content is a CSV representation of that table. For finer-grained extraction, use the GetTableReader method, which returns an instance of the DbTableReader class. This approach is also faster and consumes less memory than reading a whole table as CSV.
The DbTableReader class has the following members:
Member | Description |
---|---|
Read() | Reads the next data row and returns a collection of its cells |
ReadLine() | Reads the next data row and returns it as a string of comma-separated values |
ColumnsFilter | Gets or sets the collection of column names returned by the Read and ReadLine methods; null if all table columns are returned |
Columns | Gets the collection of table column names |
Using DbContainer as a container:
C#
// Requires: using System.IO; using System.Data.SQLite;
// connectionString is assumed to hold a valid SQLite connection string
// Create a container
using (var container = new DbContainer(new SQLiteConnection(connectionString)))
{
    // Iterate over entities (one entity per data table)
    foreach (var entity in container.Entities)
    {
        // Print the table name
        System.Console.WriteLine(entity.Name);
        // Print the media type
        System.Console.WriteLine(entity.MediaType);
        // Create a stream reader for the CSV document: OpenStream converts the table to CSV and returns it as a Stream
        using (var reader = new StreamReader(entity.OpenStream()))
        {
            // Read the first line
            string line = reader.ReadLine();
            // Loop to the end of the document
            while (line != null)
            {
                // Print a line from the document
                System.Console.WriteLine(line);
                // Read the next line
                line = reader.ReadLine();
            }
        }
    }
}
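For row-by-row access, the DbTableReader members listed in the table above could be used roughly as follows. This is only a sketch under stated assumptions: PrintTable is a hypothetical helper, not part of GroupDocs.Parser; the reader is expected to come from GetTableReader, whose parameters these notes do not list, so that call is left out; the using directive for the GroupDocs.Parser namespace is omitted because the notes do not name it; and ReadLine is assumed to return null once all rows have been read.

```csharp
using System;

static class DbTableReaderSketch
{
    // Hypothetical helper, not part of GroupDocs.Parser.
    // The reader is expected to be obtained from DbContainer.GetTableReader.
    public static void PrintTable(DbTableReader reader)
    {
        // Columns is documented as a collection of table column names
        foreach (var column in reader.Columns)
        {
            Console.WriteLine(column);
        }

        // Assumption: ReadLine returns null when there are no more rows
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Each line holds the comma-separated values of one data row
            Console.WriteLine(line);
        }
    }
}
```

Setting ColumnsFilter before the loop should limit the output to the named columns; its exact collection type is not stated in these notes, so the assignment is not shown.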
Ability to extract data from the form fields of PDFs
Description
This feature allows extracting data from PDF Forms.
Public API changes
- Added GetFormData method to PdfTextExtractor class
Usage
C#
// Create a text extractor (fileName is the path to a PDF document)
using (var extractor = new PdfTextExtractor(fileName))
{
    // Extract form data
    var fields = extractor.GetFormData();
    // Iterate over fields
    foreach (var f in fields)
    {
        // Print the field name and value
        System.Console.WriteLine(string.Format("{0}: {1}", f.Key, f.Value));
    }
}