GroupDocs.Parser provides a Document Parser feature that lets you extract data from documents in various formats, including PDF, Microsoft Word, Excel and LibreOffice formats (see the full list of supported formats).
With the Document Parser feature, you can automate business tasks using the data extracted from your documents.
Using this feature is straightforward: define a template programmatically and apply it to a document.
Parse data from documents
GroupDocs.Parser provides the Template class to define a template and the ParseByTemplate method to apply the template to an existing document:
DocumentData ParseByTemplate(Template template)
Here are the steps to parse data from a document using a user-defined template:
Instantiate the Parser class for the existing document;
Instantiate the Template class with the user-defined template;
Call the ParseByTemplate method to obtain a DocumentData object;
Check that data isn’t null (a non-null value indicates that parsing by template is supported for the document);
Iterate over the field data to obtain the extracted document data.
The following example shows how to parse data from a document using a user-defined template:
```csharp
using System;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;
using GroupDocs.Parser.Templates;

class Program
{
    static void Main()
    {
        // Create an instance of Parser class
        // (Constants.SampleInvoicePdf is the path to the sample invoice document)
        using (Parser parser = new Parser(Constants.SampleInvoicePdf))
        {
            // Parse the document by the template
            DocumentData data = parser.ParseByTemplate(GetTemplate());

            // Check if parsing document by template is supported
            if (data == null)
            {
                Console.WriteLine("Parsing Document by Template isn't supported.");
                return;
            }

            // Print extracted fields
            for (int i = 0; i < data.Count; i++)
            {
                Console.Write(data[i].Name + ": ");

                // Text fields hold a PageTextArea; the "details" and "summary" table fields don't
                PageTextArea area = data[i].PageArea as PageTextArea;
                Console.WriteLine(area == null ? "Not a text field" : area.Text);
            }
        }
    }

    private static Template GetTemplate()
    {
        // Create detector parameters for "Details" table
        TemplateTableParameters detailsTableParameters = new TemplateTableParameters(
            new Rectangle(new Point(35, 320), new Size(530, 55)), null);

        // Create detector parameters for "Summary" table
        TemplateTableParameters summaryTableParameters = new TemplateTableParameters(
            new Rectangle(new Point(330, 385), new Size(220, 65)), null);

        // Create a collection of template items
        TemplateItem[] templateItems = new TemplateItem[]
        {
            // Fields defined by a fixed rectangle on the page
            new TemplateField(new TemplateFixedPosition(new Rectangle(new Point(35, 135), new Size(100, 10))), "FromCompany"),
            new TemplateField(new TemplateFixedPosition(new Rectangle(new Point(35, 150), new Size(100, 35))), "FromAddress"),
            new TemplateField(new TemplateFixedPosition(new Rectangle(new Point(35, 190), new Size(150, 2))), "FromEmail"),
            new TemplateField(new TemplateFixedPosition(new Rectangle(new Point(35, 250), new Size(100, 2))), "ToCompany"),
            new TemplateField(new TemplateFixedPosition(new Rectangle(new Point(35, 260), new Size(100, 15))), "ToAddress"),
            new TemplateField(new TemplateFixedPosition(new Rectangle(new Point(35, 290), new Size(150, 2))), "ToEmail"),

            // Labels found by regular expression, each followed by a value field
            // positioned relative to that label
            new TemplateField(new TemplateRegexPosition("Invoice Number"), "InvoiceNumber"),
            new TemplateField(new TemplateLinkedPosition("InvoiceNumber", new Size(200, 15), new TemplateLinkedPositionEdges(false, false, true, false)), "InvoiceNumberValue"),
            new TemplateField(new TemplateRegexPosition("Order Number"), "InvoiceOrder"),
            new TemplateField(new TemplateLinkedPosition("InvoiceOrder", new Size(200, 15), new TemplateLinkedPositionEdges(false, false, true, false)), "InvoiceOrderValue"),
            new TemplateField(new TemplateRegexPosition("Invoice Date"), "InvoiceDate"),
            new TemplateField(new TemplateLinkedPosition("InvoiceDate", new Size(200, 15), new TemplateLinkedPositionEdges(false, false, true, false)), "InvoiceDateValue"),
            new TemplateField(new TemplateRegexPosition("Due Date"), "DueDate"),
            new TemplateField(new TemplateLinkedPosition("DueDate", new Size(200, 15), new TemplateLinkedPositionEdges(false, false, true, false)), "DueDateValue"),
            new TemplateField(new TemplateRegexPosition("Total Due"), "TotalDue"),
            new TemplateField(new TemplateLinkedPosition("TotalDue", new Size(200, 15), new TemplateLinkedPositionEdges(false, false, true, false)), "TotalDueValue"),

            // Tables detected inside the rectangles defined above
            new TemplateTable(detailsTableParameters, "details", null),
            new TemplateTable(summaryTableParameters, "summary", null)
        };

        // Create a document template
        Template template = new Template(templateItems);
        return template;
    }
}
```
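In the example above, the `details` and `summary` entries are not printed as text: as far as we understand the GroupDocs.Parser API, table fields carry a `PageTableArea` in `PageArea` rather than a `PageTextArea`. The sketch below shows one way to render such a table as tab-separated rows. `TableFormatter` and `Format` are hypothetical names, not part of GroupDocs.Parser; the helper takes the grid size and a cell-text accessor so it can be tested without a document.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of GroupDocs.Parser: renders a table's cells
// as tab-separated columns, one row per line.
public static class TableFormatter
{
    // rowCount and columnCount describe the grid; cellText returns the text at
    // (row, column), or null when there is no cell at that position.
    public static string Format(int rowCount, int columnCount, Func<int, int, string> cellText)
    {
        var sb = new StringBuilder();
        for (int row = 0; row < rowCount; row++)
        {
            for (int column = 0; column < columnCount; column++)
            {
                // Separate columns with a tab, but don't lead the row with one
                if (column > 0) sb.Append('\t');

                // Treat a missing cell as empty text
                sb.Append(cellText(row, column) ?? string.Empty);
            }
            sb.Append('\n');
        }
        return sb.ToString();
    }
}
```

To wire it to parsed data, cast the field's area with `data[i].PageArea as PageTableArea` and call `TableFormatter.Format(table.RowCount, table.ColumnCount, (r, c) => table[r, c]?.Text)`. This assumes `PageTableArea` exposes `RowCount`, `ColumnCount` and a `[row, column]` indexer returning a `PageTableAreaCell` with a `Text` property; check these members against the API reference for your version. Passing the cells through a delegate keeps the formatting logic independent of the library.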
More resources
Advanced usage topics
To learn more about building templates and working with extracted data, please refer to the following guides:
Along with the full-featured .NET library, we provide simple but powerful free apps.
You are welcome to parse documents and extract data from PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX, emails and more with our Free Online Document Parser App.