Parse Data from Documents

Parse data with a template

Python

from groupdocs.parser import Parser
from groupdocs.parser.templates import Template, TemplateField, TemplateFixedPosition
from groupdocs.parser.data import Rectangle, Point, Size

template = Template([
    TemplateField(
        TemplateFixedPosition(Rectangle(Point(35.0, 135.0), Size(100.0, 10.0))),
        "CompanyName",
    ),
    TemplateField(
        TemplateFixedPosition(Rectangle(Point(35.0, 150.0), Size(100.0, 10.0))),
        "InvoiceNumber",
    ),
])

with Parser("./invoice.pdf") as parser:
    data = parser.parse_by_template(template)
    if data is None:
        print("Parsing by template isn't supported for this format.")
    else:
        # DocumentData exposes extracted fields as an indexed collection.
        for i in range(data.count):
            field = data[i]
            area = field.page_area
            text = getattr(area, "text", None)
            print(f"{field.name}: {text if text is not None else '[non-text value]'}")

The following sample file is used in this example: invoice.pdf

Steps

Build a Template with fields or tables (fixed positions, regex, linked positions, etc.).
Call parser.parse_by_template(template) to receive DocumentData.
Iterate through extracted fields by index (data.count and data[i]) and process each extracted value based on its type (text, table, etc.).
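The iteration step can be wrapped in a small helper that gathers text values into a dictionary. The following is a minimal sketch, not part of the library: `collect_text_fields` and `FakeDocumentData` are hypothetical names introduced here for illustration. It assumes only what the example above uses, namely an object with `count`, index access, and fields that expose `name` and `page_area`. The stand-in class lets the helper be tried without a document or the library installed.

```python
from types import SimpleNamespace


def collect_text_fields(data):
    """Map each field name to its text, or None for non-text values.

    Hypothetical helper: works with any object that exposes `count` and
    index access, as DocumentData does in the example above.
    If two fields share a name, the later one overwrites the earlier one.
    """
    if data is None:
        # parse_by_template returned None: the format isn't supported.
        return {}
    results = {}
    for i in range(data.count):
        field = data[i]
        area = getattr(field, "page_area", None)
        # Areas without a `text` attribute (tables, for example) map to None.
        results[field.name] = getattr(area, "text", None)
    return results


class FakeDocumentData:
    """Hypothetical stand-in that mimics the indexed-collection shape."""

    def __init__(self, fields):
        self._fields = list(fields)

    @property
    def count(self):
        return len(self._fields)

    def __getitem__(self, index):
        return self._fields[index]


# Made-up sample values, used only to exercise the helper.
sample = FakeDocumentData([
    SimpleNamespace(name="CompanyName", page_area=SimpleNamespace(text="ACME Inc.")),
    SimpleNamespace(name="InvoiceNumber", page_area=SimpleNamespace(text="INV-001")),
    SimpleNamespace(name="Items", page_area=SimpleNamespace()),  # no text attribute
])

extracted = collect_text_fields(sample)
print(extracted)
```

With a real document, the same helper would be called on the value returned by `parser.parse_by_template(template)` inside the `with` block, so that the parser is still open while the fields are read.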

For complex templates (tables, linked fields, regex fields), see the advanced templates guide.
