GroupDocs.Parser provides a Document Parser feature that lets you extract data from documents in various formats, including PDF, Microsoft Word, Excel, and LibreOffice formats (see the full list of supported formats).
With the Document Parsing feature, you can automate business tasks using the data extracted from your documents.
Using this feature is straightforward: define a template programmatically and apply it to the document.
Parse data from documents
GroupDocs.Parser provides the functionality to parse data from documents with the parseByTemplate(Template) method:
DocumentData parseByTemplate(Template template);
This method parses data from the document using a user-generated template.
Here are the steps to parse data from a document using a user-generated template:
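Instantiate a Parser object for the initial document;
Instantiate a Template object with the user-generated template;
Call the parseByTemplate(Template) method and obtain a DocumentData object;
Check that the data isn’t null (null means that parsing by template isn’t supported for the document);
Iterate over the field data to obtain the extracted values.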
The following example shows how to parse data from a document using a user-generated template:
// Create an instance of Parser class
try (Parser parser = new Parser(Constants.SampleInvoicePdf)) {
    // Parse the document by the template
    DocumentData data = parser.parseByTemplate(GetTemplate());
    // Check if parsing by template is supported
    if (data == null) {
        System.out.println("Parse Document by Template isn't supported.");
        return;
    }
    // Print extracted fields
    for (int i = 0; i < data.getCount(); i++) {
        System.out.print(data.get(i).getName() + ": ");
        PageTextArea area = data.get(i).getPageArea() instanceof PageTextArea
                ? (PageTextArea) data.get(i).getPageArea()
                : null;
        System.out.println(area == null ? "Not a template field" : area.getText());
    }
}

private static Template GetTemplate() {
    // Create detector parameters for "Details" table
    TemplateTableParameters detailsTableParameters = new TemplateTableParameters(
            new Rectangle(new Point(35, 320), new Size(530, 55)), null);
    // Create detector parameters for "Summary" table
    TemplateTableParameters summaryTableParameters = new TemplateTableParameters(
            new Rectangle(new Point(330, 385), new Size(220, 65)), null);
    // Create a collection of template items
    TemplateItem[] templateItems = new TemplateItem[] {
        new TemplateField(new TemplateFixedPosition(new Rectangle(new Point(35, 135), new Size(100, 10))), "FromCompany"),
        new TemplateField(new TemplateFixedPosition(new Rectangle(new Point(35, 150), new Size(100, 35))), "FromAddress"),
        new TemplateField(new TemplateFixedPosition(new Rectangle(new Point(35, 190), new Size(150, 2))), "FromEmail"),
        new TemplateField(new TemplateFixedPosition(new Rectangle(new Point(35, 250), new Size(100, 2))), "ToCompany"),
        new TemplateField(new TemplateFixedPosition(new Rectangle(new Point(35, 260), new Size(100, 15))), "ToAddress"),
        new TemplateField(new TemplateFixedPosition(new Rectangle(new Point(35, 290), new Size(150, 2))), "ToEmail"),
        new TemplateField(new TemplateRegexPosition("Invoice Number"), "InvoiceNumber"),
        new TemplateField(new TemplateLinkedPosition("InvoiceNumber", new Size(200, 15), new TemplateLinkedPositionEdges(false, false, true, false)), "InvoiceNumberValue"),
        new TemplateField(new TemplateRegexPosition("Order Number"), "InvoiceOrder"),
        new TemplateField(new TemplateLinkedPosition("InvoiceOrder", new Size(200, 15), new TemplateLinkedPositionEdges(false, false, true, false)), "InvoiceOrderValue"),
        new TemplateField(new TemplateRegexPosition("Invoice Date"), "InvoiceDate"),
        new TemplateField(new TemplateLinkedPosition("InvoiceDate", new Size(200, 15), new TemplateLinkedPositionEdges(false, false, true, false)), "InvoiceDateValue"),
        new TemplateField(new TemplateRegexPosition("Due Date"), "DueDate"),
        new TemplateField(new TemplateLinkedPosition("DueDate", new Size(200, 15), new TemplateLinkedPositionEdges(false, false, true, false)), "DueDateValue"),
        new TemplateField(new TemplateRegexPosition("Total Due"), "TotalDue"),
        new TemplateField(new TemplateLinkedPosition("TotalDue", new Size(200, 15), new TemplateLinkedPositionEdges(false, false, true, false)), "TotalDueValue"),
        new TemplateTable(detailsTableParameters, "details", null),
        new TemplateTable(summaryTableParameters, "summary", null)
    };
    // Create a document template
    Template template = new Template(java.util.Arrays.asList(templateItems));
    return template;
}
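The example template names each linked field after its label field plus a Value suffix (InvoiceNumber and InvoiceNumberValue, for instance). As a minimal sketch of one way to post-process the result, the code below assumes the extracted field names and texts have already been copied into a plain map, and pairs each label with its value. The FieldPairs class, its pair method, and the sample values are hypothetical and are not part of GroupDocs.Parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, NOT part of GroupDocs.Parser: pairs each label field
// with its "<name>Value" counterpart, following the naming used in the template.
class FieldPairs {

    // Suffix the example template appends to the names of linked value fields.
    static final String VALUE_SUFFIX = "Value";

    // Takes field name -> extracted text and returns label text -> value text
    // for every field that has a "<name>Value" partner. Other fields are skipped.
    static Map<String, String> pair(Map<String, String> extracted) {
        Map<String, String> pairs = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : extracted.entrySet()) {
            String label = entry.getValue();
            String value = extracted.get(entry.getKey() + VALUE_SUFFIX);
            if (label != null && value != null) {
                pairs.put(label.trim(), value.trim());
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Made-up sample values standing in for the name/text pairs
        // that the printing loop in the example would produce.
        Map<String, String> extracted = new LinkedHashMap<>();
        extracted.put("FromCompany", "ACME Inc.");
        extracted.put("InvoiceNumber", "Invoice Number");
        extracted.put("InvoiceNumberValue", " INV-0001 ");
        extracted.put("TotalDue", "Total Due");
        extracted.put("TotalDueValue", "$100.00");

        pair(extracted).forEach((label, value) -> System.out.println(label + ": " + value));
    }
}
```

Fields without a Value partner, such as FromCompany, are left out of the result, so the map holds only label and value pairs. Table items would need separate handling, since they do not carry a single text value.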
More resources
Advanced usage topics
To learn more about building templates and working with extracted data, please refer to the following guides: