A document template is defined by the Template class. It contains template items: fields and tables. Each item has a name that is unique within the template and an optional page index, which is the index of the page where the item is located, or null if the item can be located on any page.
Template fields
A template field is defined by the TemplateField class. Its constructor takes the field position (a TemplatePosition object), the field name, and an optional page index.
The page index is an integer value that represents the index of the page where the template item is located; null if the template item is located on any page.
TemplatePosition is an abstract base class. The following classes are used to set template positions:
TemplateFixedPosition. Provides a template field position which is defined by a rectangular area.
TemplateRegexPosition. Provides a template field position which is found by a regular expression.
TemplateLinkedPosition. Provides a template field position which is defined by a rectangular area around a linked field.
TemplateFixedPosition
This is the simplest way to define the field position. It requires a rectangular area on the page that bounds the field value. All text that lies inside the rectangular area, even partially, is extracted as the value:
    // Create a fixed template field with "Address" name which is bounded by a rectangle
    // at the position (35, 160) and with the size (110, 20)
    TemplateField templateField = new TemplateField(
        new TemplateFixedPosition(new Rectangle(new Point(35, 160), new Size(110, 20))),
        "Address");
To avoid extracting extra text, keep the edges of the rectangle clear of the neighboring lines: the top edge should lie below the vertical center of the line above the selected area, and the bottom edge should lie above the vertical center of the line below it. For example:
Template definition | Result
[image] | Extracts only one line: 67890
[image] | Extracts two lines: 4321 First Street / Anytown, State ZIP
[image] | Extracts four lines: Company Name / 4321 First Street / Anytown, State ZIP / Date: 06/02/2019
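The "even partially" rule can be pictured as a rectangle overlap test. The following is a minimal sketch only: Box and FixedPositionSketch are hypothetical types invented for this illustration, not part of the library, and the library's real matching logic may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rectangle type for illustration; not a library class.
public readonly struct Box
{
    public double Left { get; }
    public double Top { get; }
    public double Width { get; }
    public double Height { get; }

    public Box(double left, double top, double width, double height)
    {
        Left = left;
        Top = top;
        Width = width;
        Height = height;
    }

    public double Right => Left + Width;
    public double Bottom => Top + Height;

    // True when the two boxes share any area; a partial overlap counts.
    public bool Intersects(Box other) =>
        Left < other.Right && other.Left < Right &&
        Top < other.Bottom && other.Top < Bottom;
}

public static class FixedPositionSketch
{
    // Returns the text of every line whose bounding box overlaps the field area.
    // Assumption: each text line is described by its text and its bounding box.
    public static List<string> Extract(Box area, IEnumerable<(string Text, Box Bounds)> lines) =>
        lines.Where(line => line.Bounds.Intersects(area))
             .Select(line => line.Text)
             .ToList();
}
```

With made-up line boxes such as "4321 First Street" at y = 162 and "Anytown, State ZIP" at y = 175 (each 10 units high), an area at (35, 160) with the size (110, 20) overlaps both lines, while reducing the height to 10 overlaps only the first. This is why the bottom edge of the rectangle should stop short of the next line.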
TemplateRegexPosition
This position finds a field value by a regular expression. For example, if the document contains “Invoice Number INV-3337”, the template field can be defined in the following way:
    // Create a regex template field with "InvoiceNumber" name
    TemplateField templateField = new TemplateField(
        new TemplateRegexPosition("Invoice Number\\s+[A-Z0-9\\-]+"),
        "InvoiceNumber");
In this case the entire matched string is extracted as the value. To extract only a part of the string, use the regular expression group named “value”:
    // Create a regex template field with "InvoiceNumber" name with "value" group
    TemplateField templateField = new TemplateField(
        new TemplateRegexPosition("Invoice Number\\s+(?<value>[A-Z0-9\\-]+)"),
        "InvoiceNumber");
In this case the string “INV-3337” is extracted as the value.
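The difference between the two patterns can be checked with the standard .NET Regex class alone. The sketch below does not use the parser library; RegexValueSketch is a hypothetical helper that mimics the described behavior (the “value” group when present, otherwise the whole match) and is not the library's implementation.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexValueSketch
{
    // Returns the "value" group when the pattern defines one and it matched,
    // otherwise the whole match. Returns null when the pattern does not match.
    public static string Extract(string text, string pattern)
    {
        Match match = Regex.Match(text, pattern);
        if (!match.Success)
        {
            return null;
        }

        Group value = match.Groups["value"];
        return value.Success ? value.Value : match.Value;
    }
}
```

For the text “Invoice Number INV-3337”, the pattern without the group yields the whole string, and the pattern with the (?<value>...) group yields only “INV-3337”.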
Regular expression fields can be used as linked fields.
TemplateLinkedPosition
This position finds a field value by extracting a rectangular area next to the linked field. For example, if the invoice number is known to be placed to the right of the “Invoice Number” string, the following code is used:
    // Create a regex template field to find "Invoice Number" text
    TemplateField invoice = new TemplateField(
        new TemplateRegexPosition("Invoice Number"),
        "Invoice");
    // Create a related template field associated with "Invoice" field and extract the value on the right of it
    TemplateField invoiceNumber = new TemplateField(
        new TemplateLinkedPosition(
            "invoice",
            new Size(100, 15),
            new TemplateLinkedPositionEdges(false, false, true, false)),
        "InvoiceNumber");
Template definition | Result
[image] | Extracts the text to the right of the “Invoice Number” field: INV-3337
The AutoScale property simplifies setting the size of the template field. If AutoScale is set to true, the size of the template field is scaled according to the related field. This is useful when the font size is not known in advance but the proportions of the value (the ratio of height to width) are approximately known:
    // Create a regex template field to find "Invoice Number" text
    TemplateField invoice = new TemplateField(
        new TemplateRegexPosition("Invoice Number"),
        "Invoice");
    // Create a related template field associated with "Invoice" field and extract the value on the right of it
    TemplateField invoiceNumber = new TemplateField(
        new TemplateLinkedPosition(
            "invoice",
            new Size(100, 15),
            new TemplateLinkedPositionEdges(false, false, true, false),
            true),
        "InvoiceNumber");
Template definition | Result
[image] | Extracts the text to the right of the “Invoice Number” field: INV-3337
The field value can be extracted from any side of the related field. The side is set by the Edges property, and the size of the rectangular area is set by the SearchArea property. The position of the rectangular area depends on the side from which the value is extracted.
The related field can be any field which was previously defined in the template:
    // Create a regex template field
    TemplateField fromField = new TemplateField(
        new TemplateRegexPosition("From"),
        "From",
        0);
    // Create a related template field linked to "From" regex field and placed under it
    TemplateField companyField = new TemplateField(
        new TemplateLinkedPosition(
            "From",
            new Size(100, 10),
            new TemplateLinkedPositionEdges(false, false, false, true)),
        "FromCompany",
        0);
    // Create a related template field linked to "FromCompany" related field and placed under it
    TemplateField addressField = new TemplateField(
        new TemplateLinkedPosition(
            "FromCompany",
            new Size(100, 30),
            new TemplateLinkedPositionEdges(false, false, false, true)),
        "FromAddress",
        0);
Template definition | Result
[image] | Extraction proceeds in the following order: (1) the “From” regex field (green); (2) the “FromCompany” related field (yellow); (3) the “FromAddress” related field (red).
The value of a field depends on its related field. The field is always empty if the related field doesn’t have a value. If the field has a value, it holds a link to the related field.
Document template with fields
An instance of Template class is created by the constructor:
    Template(IEnumerable<TemplateItem> items)
This constructor accepts a collection of template items:
    // Create an array of template fields
    TemplateItem[] fields = new TemplateItem[]
    {
        new TemplateField(new TemplateRegexPosition("From"), "From", 0),
        new TemplateField(
            new TemplateLinkedPosition(
                "From",
                new Size(100, 10),
                new TemplateLinkedPositionEdges(false, false, false, true)),
            "FromCompany",
            0),
        new TemplateField(
            new TemplateLinkedPosition(
                "FromCompany",
                new Size(100, 30),
                new TemplateLinkedPositionEdges(false, false, false, true)),
            "FromAddress",
            0)
    };
    // Create a document template
    Template template = new Template(fields);
Field names are case-insensitive (Field and FIELD are the same name) and must be unique within the template. A related field must be linked to a field that is defined earlier in the template. If these conditions are not met, an exception is thrown.
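The two rules can be expressed as a single pass over the items in definition order. This is a sketch under stated assumptions, not the library's validation code: TemplateNameCheckSketch is hypothetical, items are reduced to a name plus an optional linked name, and ArgumentException is an assumed exception type because the text above does not name one.

```csharp
using System;
using System.Collections.Generic;

public static class TemplateNameCheckSketch
{
    // Each item is (Name, LinkedName); LinkedName is null for fields that are not linked.
    // Throws ArgumentException (an assumed type) when a rule is violated.
    public static void Validate(IEnumerable<(string Name, string LinkedName)> items)
    {
        // Case-insensitive set of the names defined so far.
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);

        foreach (var (name, linkedName) in items)
        {
            // A linked field may only refer to a field defined earlier.
            if (linkedName != null && !seen.Contains(linkedName))
            {
                throw new ArgumentException(
                    $"Field '{name}' is linked to '{linkedName}', which is not defined earlier.");
            }

            // Add returns false when an equal name (ignoring case) already exists.
            if (!seen.Add(name))
            {
                throw new ArgumentException($"Field name '{name}' is already used.");
            }
        }
    }
}
```

Under these assumptions, the sequence From, FromCompany (linked to "from"), FromAddress (linked to "FromCompany") passes, while a pair such as Field and FIELD, or a field linked to a name that appears only later, is rejected.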
Template tables
A template table is defined by the TemplateTable class. Its constructors take either detector parameters or a table layout, together with the table name and a page index.
If the page index is omitted, tables are extracted from every page of the document. This is useful when the document contains pages with the same layout that differ only in their data.
If a template table is set by detector parameters, the table is detected automatically:
    TemplateTableParameters parameters = new TemplateTableParameters(
        new Rectangle(new Point(175, 350), new Size(400, 200)),
        new double[] { 185, 370, 425, 485, 545 });
    TemplateTable table = new TemplateTable(parameters, "Details", 0);
    // Create a document template
    Template template = new Template(new TemplateItem[] { table });
A template table is defined by a table layout if the table can’t be detected automatically.
For example, a document has a table on each page (or a set of documents has a table on the page). These tables differ in position and content but have the same columns and rows. In this case, define a TemplateTableLayout object at (0, 0) once and then move it to the location of each particular table.
If the table position depends on another object on the page, define the TemplateTableLayout object based on a template document and then move it according to an anchor object. For example, suppose a summary table follows a details table that can contain a varying number of rows. Define the TemplateTableLayout object on the template document (where the details table rectangle is known), then move it by the difference between the details table rectangle in the template and in the real document.
The MoveTo method returns a copy of the current object. Any coordinates can be passed, including negative ones, which move the layout to the left or up.
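The copy-on-move behavior can be sketched with a small stand-in type. LayoutSketch below is hypothetical and is not the library's TemplateTableLayout; it assumes a layout is a set of vertical and horizontal separator coordinates sorted in ascending order, and that moving it shifts every separator by the same offset.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a table layout; not a library class.
public sealed class LayoutSketch
{
    public IReadOnlyList<double> VerticalSeparators { get; }
    public IReadOnlyList<double> HorizontalSeparators { get; }

    // Assumption: both sequences are non-empty and sorted in ascending order.
    public LayoutSketch(IEnumerable<double> vertical, IEnumerable<double> horizontal)
    {
        VerticalSeparators = vertical.ToArray();
        HorizontalSeparators = horizontal.ToArray();
    }

    public double Left => VerticalSeparators[0];
    public double Top => HorizontalSeparators[0];

    // Returns a shifted copy whose top-left corner is at (x, y).
    // The current object is left unchanged.
    public LayoutSketch MoveTo(double x, double y)
    {
        double dx = x - Left;
        double dy = y - Top;
        return new LayoutSketch(
            VerticalSeparators.Select(v => v + dx),
            HorizontalSeparators.Select(h => h + dy));
    }
}
```

A layout defined once at (0, 0) can then be reused for each table by calling MoveTo with that table's top-left corner. For the anchor case, the target point would be the layout's position in the template document plus the measured difference between the details table rectangles.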
Template barcodes
A template barcode works in the same way as a template field with a fixed position. The following example shows how to define a template barcode field:
    // Define a barcode field
    TemplateBarcode barcode = new TemplateBarcode(
        new Rectangle(new Point(590, 80), new Size(150, 150)),
        "QR");
    // Create a template
    Template template = new Template(new TemplateItem[] { barcode });
    // Create an instance of Parser class
    using (Parser parser = new Parser(Constants.SamplePdfWithBarcodes))
    {
        // Parse the document by the template
        DocumentData data = parser.ParseByTemplate(template);
        // Print all extracted data
        for (int i = 0; i < data.Count; i++)
        {
            Console.Write(data[i].Name + ": ");
            PageBarcodeArea area = data[i].PageArea as PageBarcodeArea;
            Console.WriteLine(area == null ? "Not a template barcode field" : area.Value);
        }
    }
Complex template example
This example shows the template which is used to parse the following invoice:
    // Create detector parameters for "Details" table
    TemplateTableParameters detailsTableParameters = new TemplateTableParameters(
        new Rectangle(new Point(35, 320), new Size(530, 55)), null);
    // Create detector parameters for "Summary" table
    TemplateTableParameters summaryTableParameters = new TemplateTableParameters(
        new Rectangle(new Point(330, 385), new Size(220, 65)), null);
    // Create a collection of template items
    TemplateItem[] templateItems = new TemplateItem[]
    {
        new TemplateField(new TemplateFixedPosition(new Rectangle(new Point(35, 135), new Size(100, 10))), "FromCompany"),
        new TemplateField(new TemplateFixedPosition(new Rectangle(new Point(35, 150), new Size(100, 35))), "FromAddress"),
        new TemplateField(new TemplateFixedPosition(new Rectangle(new Point(35, 190), new Size(150, 2))), "FromEmail"),
        new TemplateField(new TemplateFixedPosition(new Rectangle(new Point(35, 250), new Size(100, 2))), "ToCompany"),
        new TemplateField(new TemplateFixedPosition(new Rectangle(new Point(35, 260), new Size(100, 15))), "ToAddress"),
        new TemplateField(new TemplateFixedPosition(new Rectangle(new Point(35, 290), new Size(150, 2))), "ToEmail"),
        new TemplateField(new TemplateRegexPosition("Invoice Number"), "InvoiceNumber"),
        new TemplateField(new TemplateLinkedPosition(
            "InvoiceNumber",
            new Size(200, 15),
            new TemplateLinkedPositionEdges(false, false, true, false)), "InvoiceNumberValue"),
        new TemplateField(new TemplateRegexPosition("Order Number"), "InvoiceOrder"),
        new TemplateField(new TemplateLinkedPosition(
            "InvoiceOrder",
            new Size(200, 15),
            new TemplateLinkedPositionEdges(false, false, true, false)), "InvoiceOrderValue"),
        new TemplateField(new TemplateRegexPosition("Invoice Date"), "InvoiceDate"),
        new TemplateField(new TemplateLinkedPosition(
            "InvoiceDate",
            new Size(200, 15),
            new TemplateLinkedPositionEdges(false, false, true, false)), "InvoiceDateValue"),
        new TemplateField(new TemplateRegexPosition("Due Date"), "DueDate"),
        new TemplateField(new TemplateLinkedPosition(
            "DueDate",
            new Size(200, 15),
            new TemplateLinkedPositionEdges(false, false, true, false)), "DueDateValue"),
        new TemplateField(new TemplateRegexPosition("Total Due"), "TotalDue"),
        new TemplateField(new TemplateLinkedPosition(
            "TotalDue",
            new Size(200, 15),
            new TemplateLinkedPositionEdges(false, false, true, false)), "TotalDueValue"),
        new TemplateTable(detailsTableParameters, "details", null),
        new TemplateTable(summaryTableParameters, "summary", null)
    };
    // Create a document template
    Template template = new Template(templateItems);