Use template-based parsing to pull structured values (like invoice numbers, dates, tables) from documents.
Parse data with a template
```python
from groupdocs.parser import Parser
from groupdocs.parser.templates import Template, TemplateField, TemplateFixedPosition
from groupdocs.parser.data import Rectangle, Point, Size

# Each field is read from a fixed rectangle: top-left Point(x, y) plus Size(width, height).
template = Template([
    TemplateField(
        TemplateFixedPosition(Rectangle(Point(35.0, 135.0), Size(100.0, 10.0))),
        "CompanyName",
    ),
    TemplateField(
        TemplateFixedPosition(Rectangle(Point(35.0, 150.0), Size(100.0, 10.0))),
        "InvoiceNumber",
    ),
])

with Parser("./invoice.pdf") as parser:
    data = parser.parse_by_template(template)
    if data is None:
        print("Parsing by template isn't supported for this format.")
    else:
        # DocumentData exposes extracted fields as an indexed collection.
        for i in range(data.count):
            field = data[i]
            area = field.page_area
            # Text areas expose `text`; other area types (e.g. tables) may not.
            text = getattr(area, "text", None)
            print(f"{field.name}: {text if text is not None else '[non-text value]'}")
```
The following sample file is used in this example: invoice.pdf
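If you need the extracted values as a lookup table rather than printed lines, a small helper can walk the same indexed collection. This is a minimal sketch that relies only on the attributes the example above already uses (`data.count`, `data[i]`, `field.name`, `field.page_area`, and an optional `text` on the page area). The helper name `fields_to_dict` is hypothetical and not part of the GroupDocs.Parser API; non-text areas (for example, tables) are stored as `None`.

```python
from typing import Any, Dict, Optional


def fields_to_dict(data: Any) -> Dict[str, Optional[str]]:
    """Map each extracted field name to its text, or None for non-text areas.

    `data` is expected to behave like the DocumentData returned by
    parser.parse_by_template(): it has a `count` attribute and supports
    integer indexing. Hypothetical helper, not a GroupDocs.Parser API.
    """
    result: Dict[str, Optional[str]] = {}
    if data is None:
        # parse_by_template() returns None when the format isn't supported.
        return result
    for i in range(data.count):
        field = data[i]
        # Text areas expose `text`; other area types may not, so fall back to None.
        result[field.name] = getattr(field.page_area, "text", None)
    return result
```

For example, inside the `with Parser(...)` block you could write `values = fields_to_dict(parser.parse_by_template(template))` and then read `values.get("InvoiceNumber")`. If a template defines two fields with the same name, the later one overwrites the earlier one in the dictionary.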
Steps
Build a Template from fields or tables, locating each one by a fixed position, a regular expression, a position linked to another field, and so on.
Call parser.parse_by_template(template) to get a DocumentData object. The call returns None if the document format doesn't support parsing by template.
Iterate over the extracted fields by index (data.count and data[i]) and handle each field's page_area according to its type (text, table, etc.).
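To reason about which coordinates to put in a TemplateFixedPosition, it can help to model the idea in plain Python: a field's rectangle is a top-left point plus a size, and the field's value is the text whose bounding box falls inside that rectangle. The sketch below is a simplified model under those assumptions, not GroupDocs.Parser's implementation. The `Box` and `Word` types, the full-containment rule, the top-left origin with y increasing downward, and the sample coordinates are all hypothetical; the real library may use a different matching rule.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    # Hypothetical rectangle: top-left corner (x, y) plus width and height,
    # mirroring Rectangle(Point(x, y), Size(width, height)) in a template.
    x: float
    y: float
    width: float
    height: float

    def contains(self, other: "Box") -> bool:
        # True when `other` lies entirely inside this box (assumed rule).
        return (
            other.x >= self.x
            and other.y >= self.y
            and other.x + other.width <= self.x + self.width
            and other.y + other.height <= self.y + self.height
        )


@dataclass(frozen=True)
class Word:
    text: str
    box: Box


def extract_fixed_position(words: List[Word], area: Box) -> str:
    """Join the words that fall completely inside `area`, in reading order."""
    inside = [w for w in words if area.contains(w.box)]
    # Sort top-to-bottom, then left-to-right, to approximate reading order.
    inside.sort(key=lambda w: (w.box.y, w.box.x))
    return " ".join(w.text for w in inside)


if __name__ == "__main__":
    # Made-up word positions; the two areas reuse the example template's rectangles.
    words = [
        Word("Acme", Box(36, 136, 30, 8)),
        Word("Corp", Box(70, 136, 28, 8)),
        Word("INV-0042", Box(36, 151, 50, 8)),
    ]
    print(extract_fixed_position(words, Box(35, 135, 100, 10)))  # prints "Acme Corp"
    print(extract_fixed_position(words, Box(35, 150, 100, 10)))  # prints "INV-0042"
```

The model also shows why fixed positions suit documents with a stable layout: if a value shifts a few units outside its rectangle, it is no longer selected, which is when a regular-expression or linked position may be a better fit.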