GroupDocs.Parser is a cross-platform library that lets your applications extract text, images, metadata, and structured data from a wide range of document formats. Its parsing capabilities include template-based extraction for structured documents, raw and formatted text extraction, and container processing.
The API provides the following features:
Support for 50+ file formats without any additional software
Wide range of options to customize the extraction process
Information extraction – file type, page count, etc.
Text extraction in raw and formatted modes
Template-based structured data extraction
Metadata, images, tables, and attachments extraction
Container processing (ZIP archives, email attachments, PDF portfolios)
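To make the idea of container processing concrete, here is a minimal conceptual sketch that walks a ZIP archive and pulls text out of each entry. It uses only Python's standard `zipfile` module and does not use or represent the GroupDocs.Parser API; the helper names `iter_container_text` and `build_sample_archive` are hypothetical and exist only for this illustration.

```python
import io
import zipfile


def iter_container_text(data, encoding="utf-8"):
    """Yield (entry_name, text) for each .txt entry in a ZIP archive given as bytes.

    Hypothetical helper for illustration; not part of GroupDocs.Parser.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            name = info.filename
            # Skip directories and anything that is not a plain-text entry.
            # A full parser would dispatch on the detected file type instead.
            if name.endswith("/") or not name.lower().endswith(".txt"):
                continue
            yield name, archive.read(info).decode(encoding)


def build_sample_archive():
    """Create a small in-memory ZIP so the sketch runs without external files."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("notes/readme.txt", "hello from the archive")
        archive.writestr("image.bin", b"\x00\x01")
    return buffer.getvalue()


if __name__ == "__main__":
    for entry_name, text in iter_container_text(build_sample_archive()):
        print(entry_name, "->", text)
```

The pattern is the same for any container: enumerate the entries, decide per entry whether it can be parsed, and extract content from those that can.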
You can use GroupDocs.Parser across multiple platforms and operating systems:
Windows, Linux, and macOS
Python versions 3.5–3.13 supported
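Before installing, it can help to confirm that your interpreter falls within the supported range listed above. The following is a small sketch using only the standard library; the constant and function names are hypothetical, and the bounds are taken directly from the version range stated on this page.

```python
import sys

# Bounds copied from the supported range stated above (3.5 through 3.13).
MIN_SUPPORTED = (3, 5)
MAX_SUPPORTED = (3, 13)


def is_supported(version=None):
    """Return True if the (major, minor) version lies in the supported range.

    Hypothetical helper for illustration; defaults to the running interpreter.
    """
    if version is None:
        version = sys.version_info
    major_minor = tuple(version[:2])
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED


if __name__ == "__main__":
    current = "{}.{}".format(*sys.version_info[:2])
    status = "supported" if is_supported() else "outside the supported range"
    print("Python " + current + " is " + status)
```

Tuple comparison handles the major and minor components together, so `(3, 4)` and `(3, 14)` are both rejected without separate checks.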
Get started with GroupDocs.Parser for Python via .NET
If you are new to GroupDocs.Parser, see the following topics first:
If you encounter an issue while using GroupDocs.Parser or have a technical question, feel free to create a post in our Free Support Forum. If free support is not sufficient, you can submit a ticket to our Paid Support Helpdesk.