Check that data isn't null (form parsing is supported for the document);
Iterate over field data to obtain form data.
The following example covers the case where a user fills in a PDF form and sends it, for example, by email. The software opens the PDF and extracts a preliminary record:
// Create an instance of Parser class
try (Parser parser = new Parser(Constants.SampleCarWashPdf)) {
    // Extract data from PDF document
    DocumentData data = parser.parseForm();
    // Check if form extraction is supported
    if (data == null) {
        System.out.println("Form extraction isn't supported.");
        return;
    }
    // Create the preliminary record object
    PreliminaryRecord rec = new PreliminaryRecord();
    rec.Name = getFieldText(data, "Name");
    rec.Model = getFieldText(data, "Model");
    rec.Time = getFieldText(data, "Time");
    rec.Description = getFieldText(data, "Description");
    // We can save the preliminary record object to the database,
    // send it as the web response or just print it to the console
    System.out.println("Preliminary record");
    System.out.println(String.format("Name: %s", rec.Name));
    System.out.println(String.format("Model: %s", rec.Model));
    System.out.println(String.format("Time: %s", rec.Time));
    System.out.println(String.format("Description: %s", rec.Description));
}

private static String getFieldText(DocumentData data, String fieldName) {
    // Get the first field with this name from the data collection;
    // guard against an empty result so that get(0) cannot throw
    java.util.List<FieldData> fields = data.getFieldsByName(fieldName);
    FieldData fieldData = fields.isEmpty() ? null : fields.get(0);
    // Check if the field data is not null (a field with the fieldName is contained in data collection)
    // and check if the field data contains the text
    return fieldData != null && fieldData.getPageArea() instanceof PageTextArea
            ? ((PageTextArea) fieldData.getPageArea()).getText()
            : null;
}

/**
 * Simple POJO to store the extracted data.
 */
static class PreliminaryRecord {
    public String Name;
    public String Model;
    public String Time;
    public String Description;
}
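A helper like getFieldText has to cope with a name that matches no field, because calling get(0) on an empty list throws IndexOutOfBoundsException. The following is a minimal sketch of an empty-safe "first match or null" lookup. It uses hypothetical stand-in types (Field, fieldsByName, firstFieldText) written only for this illustration; none of them are GroupDocs.Parser classes, and the sample values are placeholders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: stand-in types, NOT GroupDocs.Parser classes.
class FieldLookupSketch {

    /** Hypothetical stand-in for an extracted field: a name plus optional text. */
    static final class Field {
        final String name;
        final String text; // null when the field carries no text

        Field(String name, String text) {
            this.name = name;
            this.text = text;
        }
    }

    /** Returns every field with the given name; an empty list when none match. */
    static List<Field> fieldsByName(List<Field> fields, String name) {
        List<Field> matches = new ArrayList<>();
        for (Field field : fields) {
            if (field.name.equals(name)) {
                matches.add(field);
            }
        }
        return matches;
    }

    /** Returns the text of the first matching field, or null if there is no match or no text. */
    static String firstFieldText(List<Field> fields, String name) {
        List<Field> matches = fieldsByName(fields, name);
        // Check for an empty result before touching index 0
        return matches.isEmpty() ? null : matches.get(0).text;
    }

    public static void main(String[] args) {
        // Placeholder values, not data from a real document
        List<Field> fields = Arrays.asList(
                new Field("Name", "placeholder name"),
                new Field("Model", null));

        System.out.println(firstFieldText(fields, "Name"));  // field present with text
        System.out.println(firstFieldText(fields, "Model")); // field present, no text
        System.out.println(firstFieldText(fields, "Time"));  // field absent
    }
}
```

Returning null for both "absent" and "no text" keeps the caller simple; if the two cases need to be told apart, the lookup could return an Optional or a small result type instead.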
Iterate over field data to obtain the document data.
The following example shows how to parse data from a PDF document using a user-defined template:
// Create an instance of Parser class
try (Parser parser = new Parser(Constants.SampleInvoicePdf)) {
    // Parse the document by the template
    DocumentData data = parser.parseByTemplate(GetTemplate());
    // Check if parsing by template is supported
    if (data == null) {
        System.out.println("Parse Document by Template isn't supported.");
        return;
    }
    // Print extracted fields
    for (int i = 0; i < data.getCount(); i++) {
        System.out.print(data.get(i).getName() + ": ");
        PageTextArea area = data.get(i).getPageArea() instanceof PageTextArea
                ? (PageTextArea) data.get(i).getPageArea()
                : null;
        System.out.println(area == null ? "Not a template field" : area.getText());
    }
}

private static Template GetTemplate() {
    // Create detector parameters for "Details" table
    TemplateTableParameters detailsTableParameters = new TemplateTableParameters(
            new Rectangle(new Point(35, 320), new Size(530, 55)), null);
    // Create detector parameters for "Summary" table
    TemplateTableParameters summaryTableParameters = new TemplateTableParameters(
            new Rectangle(new Point(330, 385), new Size(220, 65)), null);
    // Create a collection of template items
    TemplateItem[] templateItems = new TemplateItem[] {
        // Fields at fixed positions on the page
        new TemplateField(new TemplateFixedPosition(new Rectangle(new Point(35, 135), new Size(100, 10))), "FromCompany"),
        new TemplateField(new TemplateFixedPosition(new Rectangle(new Point(35, 150), new Size(100, 35))), "FromAddress"),
        new TemplateField(new TemplateFixedPosition(new Rectangle(new Point(35, 190), new Size(150, 2))), "FromEmail"),
        new TemplateField(new TemplateFixedPosition(new Rectangle(new Point(35, 250), new Size(100, 2))), "ToCompany"),
        new TemplateField(new TemplateFixedPosition(new Rectangle(new Point(35, 260), new Size(100, 15))), "ToAddress"),
        new TemplateField(new TemplateFixedPosition(new Rectangle(new Point(35, 290), new Size(150, 2))), "ToEmail"),
        // Labels found by regular expression, each followed by a value field linked to the label
        new TemplateField(new TemplateRegexPosition("Invoice Number"), "InvoiceNumber"),
        new TemplateField(new TemplateLinkedPosition("InvoiceNumber", new Size(200, 15),
                new TemplateLinkedPositionEdges(false, false, true, false)), "InvoiceNumberValue"),
        new TemplateField(new TemplateRegexPosition("Order Number"), "InvoiceOrder"),
        new TemplateField(new TemplateLinkedPosition("InvoiceOrder", new Size(200, 15),
                new TemplateLinkedPositionEdges(false, false, true, false)), "InvoiceOrderValue"),
        new TemplateField(new TemplateRegexPosition("Invoice Date"), "InvoiceDate"),
        new TemplateField(new TemplateLinkedPosition("InvoiceDate", new Size(200, 15),
                new TemplateLinkedPositionEdges(false, false, true, false)), "InvoiceDateValue"),
        new TemplateField(new TemplateRegexPosition("Due Date"), "DueDate"),
        new TemplateField(new TemplateLinkedPosition("DueDate", new Size(200, 15),
                new TemplateLinkedPositionEdges(false, false, true, false)), "DueDateValue"),
        new TemplateField(new TemplateRegexPosition("Total Due"), "TotalDue"),
        new TemplateField(new TemplateLinkedPosition("TotalDue", new Size(200, 15),
                new TemplateLinkedPositionEdges(false, false, true, false)), "TotalDueValue"),
        // Tables
        new TemplateTable(detailsTableParameters, "details", null),
        new TemplateTable(summaryTableParameters, "summary", null)
    };
    // Create a document template
    Template template = new Template(java.util.Arrays.asList(templateItems));
    return template;
}
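The template above names each linked field after its label field plus the suffix "Value" (for example "InvoiceNumber" and "InvoiceNumberValue"). Once the extracted field names and texts have been copied into a plain map, that convention can be used to pair each label with its value. The sketch below assumes such a map already exists; the class, the method pairLabelsWithValues and the sample values are hypothetical placeholders, not part of GroupDocs.Parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: works on a plain name-to-text map, not on GroupDocs.Parser types.
class LabelValuePairingSketch {

    static final String VALUE_SUFFIX = "Value";

    /**
     * For every entry named "<Label>Value" whose "<Label>" entry also exists,
     * maps the label field name to the value field text. Other entries are ignored.
     */
    static Map<String, String> pairLabelsWithValues(Map<String, String> extracted) {
        Map<String, String> pairs = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : extracted.entrySet()) {
            String name = entry.getKey();
            // Skip names without the suffix, and the bare name "Value" itself
            if (!name.endsWith(VALUE_SUFFIX) || name.length() == VALUE_SUFFIX.length()) {
                continue;
            }
            String labelName = name.substring(0, name.length() - VALUE_SUFFIX.length());
            if (extracted.containsKey(labelName)) {
                pairs.put(labelName, entry.getValue());
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Placeholder texts, not values from a real invoice
        Map<String, String> extracted = new LinkedHashMap<>();
        extracted.put("FromCompany", "PLACEHOLDER-COMPANY");
        extracted.put("InvoiceNumber", "Invoice Number");
        extracted.put("InvoiceNumberValue", "PLACEHOLDER-NUMBER");
        extracted.put("DueDate", "Due Date");
        extracted.put("DueDateValue", "PLACEHOLDER-DATE");

        for (Map.Entry<String, String> pair : pairLabelsWithValues(extracted).entrySet()) {
            System.out.println(pair.getKey() + " -> " + pair.getValue());
        }
    }
}
```

Keying the result by the label field name rather than the label text keeps lookups stable even if the label wording in the document changes.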
More resources
GitHub examples
You may easily run the code above and see the feature in action in our GitHub examples:
Along with the full-featured library, we provide simple but powerful free apps.
You are welcome to parse documents and extract data from PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX, emails and more with our Free Online Document Parser App.