A document template is represented by the Template class. It contains template items: fields and tables. Each item has a name that is unique within the template and an optional page index, which is the index of the page where the item is located; the page index is null if the item can be located on any page.
Template fields
A template field is represented by the TemplateField class. Its constructor takes the field position (a TemplatePosition), the field name, and an optional page index.
The page index is an integer value that represents the index of the page where the template item is located; it is null if the template item can be located on any page.
TemplatePosition is an abstract base class. The following classes are used to set template positions:
TemplateFixedPosition. Provides a template field position that is defined by a rectangular area.
TemplateRegexPosition. Provides a template field position that is found by a regular expression.
TemplateLinkedPosition. Provides a template field position that is defined relative to another (linked) field.
TemplateFixedPosition
This is the simplest way to define the field position. It requires a rectangular area on the page that bounds the field value. All text that lies inside the rectangular area, even partially, is extracted as the value:
// Create a fixed template field named "Address", bounded by a rectangle at position (35, 160) with size (110, 20)
TemplateField templateField = new TemplateField(
        new TemplateFixedPosition(new Rectangle(new Point(35, 160), new Size(110, 20))),
        "Address");
To avoid extracting extra text, keep the bottom edge of the rectangle above the center of the line that follows the selected area, and keep the top edge below the center of the line that precedes it. For example:
Template definition and result:
Extracts only one line:
67890
Extracts two lines:
4321 First Street
Anytown, State ZIP
Extracts four lines:
Company Name
4321 First Street
Anytown, State ZIP
Date: 06/02/2019
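The effect of the rectangle height on the number of extracted lines can be illustrated with a small stand-alone sketch. It does not use the library: the Box and Line classes, the coordinates, and the plain intersection test are assumptions made for illustration, taking the "even partially" wording literally. The real library may apply a different threshold, such as the line center mentioned in the recommendation above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FixedPositionSketch {
    /** Hypothetical axis-aligned box: top-left corner plus size. */
    static final class Box {
        final double x, y, width, height;

        Box(double x, double y, double width, double height) {
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }

        /** True when the two boxes share any area, however small. */
        boolean intersects(Box other) {
            return x < other.x + other.width && other.x < x + width
                && y < other.y + other.height && other.y < y + height;
        }
    }

    /** Hypothetical text line together with the box it occupies on the page. */
    static final class Line {
        final String text;
        final Box box;

        Line(String text, Box box) {
            this.text = text;
            this.box = box;
        }
    }

    /** Joins the text of every line whose box overlaps the area. */
    static String extract(List<Line> page, Box area) {
        List<String> hits = new ArrayList<>();
        for (Line line : page) {
            if (line.box.intersects(area)) {
                hits.add(line.text);
            }
        }
        return String.join("\n", hits);
    }

    /** Made-up page content; the coordinates are illustrative only. */
    static List<Line> samplePage() {
        return Arrays.asList(
            new Line("Company Name", new Box(35, 146, 100, 12)),
            new Line("4321 First Street", new Box(35, 160, 100, 12)),
            new Line("Anytown, State ZIP", new Box(35, 174, 100, 12)));
    }

    public static void main(String[] args) {
        // Height 20 reaches into the third line, so two lines come back.
        System.out.println(extract(samplePage(), new Box(35, 160, 110, 20)));
        // Height 12 stops before the third line, so only one line comes back.
        System.out.println(extract(samplePage(), new Box(35, 160, 110, 12)));
    }
}
```

Under these assumptions, shrinking the rectangle so that it ends before the next line is what keeps the neighboring text out of the value.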
TemplateRegexPosition
This position type finds a field value with a regular expression. For example, if the document contains “Invoice Number INV-3337”, the template field can be defined in the following way:
// Create a regex template field named "InvoiceNumber"
TemplateField templateField = new TemplateField(
        new TemplateRegexPosition("Invoice Number\\s+[A-Z0-9\\-]+"),
        "InvoiceNumber");
In this case, the entire matched string is extracted as the value. To extract only a part of the string, use the regular expression group named “value”:
// Create a regex template field named "InvoiceNumber" that uses the "value" group
TemplateField templateField = new TemplateField(
        new TemplateRegexPosition("Invoice Number\\s+(?<value>[A-Z0-9\\-]+)"),
        "InvoiceNumber");
In this case, the string “INV-3337” is extracted as the value.
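The difference between the whole match and the “value” group can be checked with plain java.util.regex, independently of the library. This is only a sketch: the RegexValueSketch class and its extract helper are hypothetical, and it assumes the library treats the pattern the same way the standard Java regex engine does.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexValueSketch {
    /**
     * Returns the "value" group when the pattern defines one,
     * the whole match otherwise, and null when nothing matches.
     */
    static String extract(String text, String regex) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        if (!matcher.find()) {
            return null;
        }
        try {
            return matcher.group("value");
        } catch (IllegalArgumentException noSuchGroup) {
            // The pattern has no group named "value": fall back to the whole match.
            return matcher.group();
        }
    }

    public static void main(String[] args) {
        String text = "Invoice Number INV-3337";
        // Whole match, label included.
        System.out.println(extract(text, "Invoice Number\\s+[A-Z0-9\\-]+"));
        // Only the part captured by the "value" group.
        System.out.println(extract(text, "Invoice Number\\s+(?<value>[A-Z0-9\\-]+)"));
    }
}
```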
Regular expression fields can be used as linked fields.
TemplateLinkedPosition
This position type finds a field value by extracting a rectangular area next to a linked field. For example, if the invoice number is known to be placed to the right of the “Invoice Number” string, the following code is used:
// Create a regex template field to find "Invoice Number" text
TemplateField invoice = new TemplateField(new TemplateRegexPosition("Invoice Number"), "Invoice");
// Create a related template field associated with "Invoice" field and extract the value on the right of it
TemplateField invoiceNumber = new TemplateField(
        new TemplateLinkedPosition(
                "invoice",
                new Size(100, 15),
                new TemplateLinkedPositionEdges(false, false, true, false)),
        "InvoiceNumber");
Result: the text to the right of the “Invoice Number” field is extracted (INV-3337).
To simplify setting the size of a template field, the isAutoScale property is used. If isAutoScale is set to true, the size of the template field is scaled according to the related field. This is useful when the font size is not known in advance, but the proportions of the value (the ratio of height to width) are approximately known:
// Create a regex template field to find "Invoice Number" text
TemplateField invoice = new TemplateField(new TemplateRegexPosition("Invoice Number"), "Invoice");
// Create a related template field associated with "Invoice" field and extract the value on the right of it
TemplateField invoiceNumber = new TemplateField(
        new TemplateLinkedPosition(
                "invoice",
                new Size(100, 15),
                new TemplateLinkedPositionEdges(false, false, true, false),
                true),
        "InvoiceNumber");
Result: the text to the right of the “Invoice Number” field is extracted (INV-3337).
The field value can be extracted from any side of the related field. The side of extraction is set by the getEdges property. The size of the rectangular area is set by the getSearchArea property. The position of the rectangular area depends on the side of extraction.
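One way to picture how the side of extraction could determine the position of the search area is the geometric sketch below. It is not the library's implementation: the Side enum, the searchArea helper, and the alignment rule (the area touches the anchor and shares its top or left edge) are all assumptions made for illustration.

```java
class LinkedAreaSketch {
    /** Hypothetical stand-in for the side selected through the edges setting. */
    enum Side { LEFT, TOP, RIGHT, BOTTOM }

    /**
     * Returns {x, y, width, height} of a search area of size (w, h) placed
     * next to an anchor box (ax, ay, aw, ah) on the requested side.
     * Assumption: the area touches the anchor and is aligned to its top or left edge.
     */
    static double[] searchArea(double ax, double ay, double aw, double ah,
                               double w, double h, Side side) {
        switch (side) {
            case LEFT:
                return new double[] { ax - w, ay, w, h };
            case TOP:
                return new double[] { ax, ay - h, w, h };
            case RIGHT:
                return new double[] { ax + aw, ay, w, h };
            default: // BOTTOM
                return new double[] { ax, ay + ah, w, h };
        }
    }

    public static void main(String[] args) {
        // Anchor text box at (35, 100) with size (80, 12); search area size (100, 15).
        double[] right = searchArea(35, 100, 80, 12, 100, 15, Side.RIGHT);
        double[] below = searchArea(35, 100, 80, 12, 100, 15, Side.BOTTOM);
        System.out.println(java.util.Arrays.toString(right));
        System.out.println(java.util.Arrays.toString(below));
    }
}
```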
The related field can be any field that was defined earlier in the template:
// Create a regex template field
TemplateField fromField = new TemplateField(new TemplateRegexPosition("From"), "From", 0);
// Create a related template field linked to "From" regex field and placed under it
TemplateField companyField = new TemplateField(
        new TemplateLinkedPosition(
                "From",
                new Size(100, 10),
                new TemplateLinkedPositionEdges(false, false, false, true)),
        "FromCompany",
        0);
// Create a related template field linked to "FromCompany" related field and placed under it
TemplateField addressField = new TemplateField(
        new TemplateLinkedPosition(
                "FromCompany",
                new Size(100, 30),
                new TemplateLinkedPositionEdges(false, false, false, true)),
        "FromAddress",
        0);
The extraction is processed in the following way:
Extracts data of the “From” regex field (green)
Extracts data of the “FromCompany” related field (yellow)
Extracts data of the “FromAddress” related field (red)
A value of the field depends on the related field. The field is always empty if the related field doesn’t have a value. If the field has a value then it has a link to the related field.
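The dependency rule can be sketched without the library as a pass over the fields in definition order. The LinkedChainSketch class, its map-based inputs, and the sample values are hypothetical; the sketch only models the rule that a field stays empty when its related field has no value.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class LinkedChainSketch {
    /**
     * links: field name -> name of the related field (null for an unlinked field),
     *        iterated in definition order.
     * found: text that would be read for each field if its related field were located.
     * Returns the resolved values; a field is null (empty) when its related field is empty.
     */
    static Map<String, String> resolve(Map<String, String> links, Map<String, String> found) {
        Map<String, String> values = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : links.entrySet()) {
            String related = entry.getValue();
            boolean relatedIsEmpty = related != null && values.get(related) == null;
            values.put(entry.getKey(), relatedIsEmpty ? null : found.get(entry.getKey()));
        }
        return values;
    }

    /** The From -> FromCompany -> FromAddress chain from the example above. */
    static Map<String, String> sampleLinks() {
        Map<String, String> links = new LinkedHashMap<>();
        links.put("From", null);
        links.put("FromCompany", "From");
        links.put("FromAddress", "FromCompany");
        return links;
    }

    public static void main(String[] args) {
        Map<String, String> found = new HashMap<>();
        found.put("FromCompany", "Company Name");
        found.put("FromAddress", "4321 First Street");
        // "From" was not found, so both dependent fields stay empty.
        System.out.println(resolve(sampleLinks(), found));

        found.put("From", "From");
        // With "From" present, the whole chain resolves.
        System.out.println(resolve(sampleLinks(), found));
    }
}
```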
Document template with fields
An instance of the Template class is created by the constructor:
Template(IEnumerable<TemplateItem> items)
This constructor accepts a collection of template items:
// Create an array of template fields
TemplateItem[] fields = new TemplateItem[] {
        new TemplateField(new TemplateRegexPosition("From"), "From", 0),
        new TemplateField(
                new TemplateLinkedPosition(
                        "From",
                        new Size(100, 10),
                        new TemplateLinkedPositionEdges(false, false, false, true)),
                "FromCompany",
                0),
        new TemplateField(
                new TemplateLinkedPosition(
                        "FromCompany",
                        new Size(100, 30),
                        new TemplateLinkedPositionEdges(false, false, false, true)),
                "FromAddress",
                0)
};
// Create a document template
Template template = new Template(java.util.Arrays.asList(fields));
Field names are case-insensitive (Field and FIELD are the same name) and must be unique in the template. A related field must be linked to a previously defined field. If these conditions are not met, an exception is thrown.
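The two conditions can be expressed as a single pass over the items, sketched here without the library. The TemplateValidationSketch class and its Item type are hypothetical, and IllegalArgumentException is a placeholder: the source does not say which exception type the library throws.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class TemplateValidationSketch {
    /** Hypothetical item: a name plus the name of the field it is linked to (or null). */
    static final class Item {
        final String name;
        final String linkedTo;

        Item(String name, String linkedTo) {
            this.name = name;
            this.linkedTo = linkedTo;
        }
    }

    /** Throws when a name repeats (ignoring case) or a link points to a field not yet defined. */
    static void validate(List<Item> items) {
        Set<String> seen = new HashSet<>();
        for (Item item : items) {
            if (item.linkedTo != null && !seen.contains(item.linkedTo.toLowerCase(Locale.ROOT))) {
                throw new IllegalArgumentException(
                    item.name + " is linked to a field that is not defined earlier: " + item.linkedTo);
            }
            if (!seen.add(item.name.toLowerCase(Locale.ROOT))) {
                throw new IllegalArgumentException("Duplicate field name: " + item.name);
            }
        }
    }

    /** Convenience wrapper that reports the outcome as a boolean. */
    static boolean isValid(List<Item> items) {
        try {
            validate(items);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Valid: each link points to an earlier field; the lookup ignores case.
        System.out.println(isValid(Arrays.asList(
            new Item("From", null), new Item("FromCompany", "from"))));
        // Invalid: "Field" and "FIELD" are the same name.
        System.out.println(isValid(Arrays.asList(
            new Item("Field", null), new Item("FIELD", null))));
        // Invalid: the link refers to a field defined later.
        System.out.println(isValid(Arrays.asList(
            new Item("FromCompany", "From"), new Item("From", null))));
    }
}
```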
Template tables
A template table is represented by the TemplateTable class.
A template table can be defined either by detector parameters or by a table layout. If the page index is omitted, tables are extracted from every page of the document. This is useful when the document contains pages with the same layout (pages that differ only in data).
For example, a document has a table on each page (or there is a set of documents with a table on the page). These tables differ in position and content but have the same columns and rows. In this case, the user can define a TemplateTableLayout object at (0, 0) once and then move it to the location of each particular table.
If the table position depends on another object on the page, the user can define a TemplateTableLayout object based on a template document and then move it according to an anchor object. For example, suppose a summary table follows a details table that can contain a varying number of rows. The user can define the TemplateTableLayout object on the template document (where the details table rectangle is known) and then move it by the difference between the details table rectangle in the template and in the real document.
The moveTo(Point) method returns a copy of the current object. Any coordinates can be passed, including negative ones, in which case the layout is moved to the left or top.
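The copy-on-move behavior and the anchor-based shift can be sketched with a small immutable class. MovableLayoutSketch is hypothetical and is not TemplateTableLayout: it assumes a layout is a set of column and row edges and that moveTo places the first edges at the given point; the numbers are made up.

```java
import java.util.Arrays;

class MovableLayoutSketch {
    final double[] columnEdges; // x coordinates of the column boundaries
    final double[] rowEdges;    // y coordinates of the row boundaries

    MovableLayoutSketch(double[] columnEdges, double[] rowEdges) {
        this.columnEdges = columnEdges.clone();
        this.rowEdges = rowEdges.clone();
    }

    /** Returns a shifted copy whose first edges sit at (x, y); this object is left unchanged. */
    MovableLayoutSketch moveTo(double x, double y) {
        double dx = x - columnEdges[0];
        double dy = y - rowEdges[0];
        double[] columns = new double[columnEdges.length];
        double[] rows = new double[rowEdges.length];
        for (int i = 0; i < columns.length; i++) {
            columns[i] = columnEdges[i] + dx;
        }
        for (int i = 0; i < rows.length; i++) {
            rows[i] = rowEdges[i] + dy;
        }
        return new MovableLayoutSketch(columns, rows);
    }

    public static void main(String[] args) {
        // Define the layout once at (0, 0).
        MovableLayoutSketch base = new MovableLayoutSketch(
            new double[] { 0, 100, 220 }, new double[] { 0, 20, 65 });

        // Place a copy where the summary table sits in the template document.
        MovableLayoutSketch inTemplate = base.moveTo(330, 385);

        // The details table ends 40 units lower in the real document than in the
        // template (assumed values), so the summary layout is shifted by the same amount.
        double templateDetailsBottom = 375;
        double realDetailsBottom = 415;
        MovableLayoutSketch inDocument =
            base.moveTo(330, 385 + (realDetailsBottom - templateDetailsBottom));

        System.out.println(Arrays.toString(base.rowEdges));
        System.out.println(Arrays.toString(inTemplate.rowEdges));
        System.out.println(Arrays.toString(inDocument.rowEdges));
    }
}
```

Returning a copy keeps the layout defined at (0, 0) reusable for every page or document.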
Template barcodes
Template barcodes work in the same way as template fields with a fixed position. The following example shows how to define a template barcode field:
// Define a barcode field
TemplateBarcode barcode = new TemplateBarcode(
        new Rectangle(new Point(590, 80), new Size(150, 150)),
        "QR");
// Create a template
Template template = new Template(Arrays.asList(new TemplateItem[] { barcode }));
// Create an instance of Parser class
try (Parser parser = new Parser(Constants.SamplePdfWithBarcodes)) {
    // Parse the document by the template
    DocumentData data = parser.parseByTemplate(template);
    // Print all extracted data
    for (int i = 0; i < data.getCount(); i++) {
        // Print field name
        System.out.print(data.get(i).getName() + ": ");
        // As we have defined only barcode fields in the template,
        // we cast PageArea property value to PageBarcodeArea
        PageBarcodeArea area = data.get(i).getPageArea() instanceof PageBarcodeArea
                ? (PageBarcodeArea) data.get(i).getPageArea()
                : null;
        System.out.println(area == null ? "Not a template barcode field" : area.getValue());
    }
}
Complex template example
This example shows a template that is used to parse an invoice:
// Create detector parameters for "Details" table
TemplateTableParameters detailsTableParameters = new TemplateTableParameters(
        new Rectangle(new Point(35, 320), new Size(530, 55)), null);
// Create detector parameters for "Summary" table
TemplateTableParameters summaryTableParameters = new TemplateTableParameters(
        new Rectangle(new Point(330, 385), new Size(220, 65)), null);
// Create a collection of template items
TemplateItem[] templateItems = new TemplateItem[] {
        new TemplateField(new TemplateFixedPosition(new Rectangle(new Point(35, 135), new Size(100, 10))), "FromCompany"),
        new TemplateField(new TemplateFixedPosition(new Rectangle(new Point(35, 150), new Size(100, 35))), "FromAddress"),
        new TemplateField(new TemplateFixedPosition(new Rectangle(new Point(35, 190), new Size(150, 2))), "FromEmail"),
        new TemplateField(new TemplateFixedPosition(new Rectangle(new Point(35, 250), new Size(100, 2))), "ToCompany"),
        new TemplateField(new TemplateFixedPosition(new Rectangle(new Point(35, 260), new Size(100, 15))), "ToAddress"),
        new TemplateField(new TemplateFixedPosition(new Rectangle(new Point(35, 290), new Size(150, 2))), "ToEmail"),
        new TemplateField(new TemplateRegexPosition("Invoice Number"), "InvoiceNumber"),
        new TemplateField(new TemplateLinkedPosition("InvoiceNumber", new Size(200, 15), new TemplateLinkedPositionEdges(false, false, true, false)), "InvoiceNumberValue"),
        new TemplateField(new TemplateRegexPosition("Order Number"), "InvoiceOrder"),
        new TemplateField(new TemplateLinkedPosition("InvoiceOrder", new Size(200, 15), new TemplateLinkedPositionEdges(false, false, true, false)), "InvoiceOrderValue"),
        new TemplateField(new TemplateRegexPosition("Invoice Date"), "InvoiceDate"),
        new TemplateField(new TemplateLinkedPosition("InvoiceDate", new Size(200, 15), new TemplateLinkedPositionEdges(false, false, true, false)), "InvoiceDateValue"),
        new TemplateField(new TemplateRegexPosition("Due Date"), "DueDate"),
        new TemplateField(new TemplateLinkedPosition("DueDate", new Size(200, 15), new TemplateLinkedPositionEdges(false, false, true, false)), "DueDateValue"),
        new TemplateField(new TemplateRegexPosition("Total Due"), "TotalDue"),
        new TemplateField(new TemplateLinkedPosition("TotalDue", new Size(200, 15), new TemplateLinkedPositionEdges(false, false, true, false)), "TotalDueValue"),
        new TemplateTable(detailsTableParameters, "details", null),
        new TemplateTable(summaryTableParameters, "summary", null)
};
// Create a document template
Template template = new Template(java.util.Arrays.asList(templateItems));