GroupDocs.Parser Overview

GroupDocs.Parser is a cross-platform library that lets your applications extract text, images, metadata, and structured data from a wide range of document formats. Its parsing capabilities include raw and formatted text extraction, template-based extraction from structured documents, and container processing.

The API provides the following features:

  • Support for 50+ file formats without any additional software
  • A wide range of options for customizing the extraction process
  • Document information extraction – file type, page count, etc.
  • Text extraction in raw and formatted modes
  • Template-based structured data extraction
  • Extraction of metadata, images, tables, and attachments
  • Container processing (ZIP archives, email attachments, PDF portfolios)
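
To make the idea of container processing concrete, here is a minimal sketch of what enumerating the items inside a ZIP archive involves. It deliberately uses only Python's standard `zipfile` module and is not the GroupDocs.Parser API; the helper names `list_container_items` and `build_sample_archive` are hypothetical and exist only for this illustration.

```python
import io
import zipfile


def list_container_items(zip_bytes):
    """Return (name, size_in_bytes) for each file stored in a ZIP archive.

    Illustration only: this uses the standard library, not GroupDocs.Parser.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return [
            (info.filename, info.file_size)
            for info in archive.infolist()
            if not info.is_dir()  # skip directory entries, keep files only
        ]


def build_sample_archive():
    """Build a small in-memory ZIP archive so the example needs no files on disk."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("readme.txt", "hello")
        archive.writestr("docs/report.txt", "quarterly figures")
    return buffer.getvalue()


if __name__ == "__main__":
    for name, size in list_container_items(build_sample_archive()):
        print(name, size)
```

A container-aware parser would typically go one step further and open each listed item as a document in its own right, which is the part a library such as GroupDocs.Parser is meant to handle.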

You can use GroupDocs.Parser on the following operating systems and Python versions:

  • Windows, Linux, and macOS
  • Python versions 3.5–3.13 supported
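
If you want to confirm that an environment falls within the Python range listed above before installing anything, a small check along these lines may help. This is a sketch using only the standard library; the function name `is_supported_python` and the two constants are hypothetical, and the bounds are simply copied from the list above.

```python
import sys

# Supported range as stated in this overview: Python 3.5 through 3.13 (inclusive).
MIN_SUPPORTED = (3, 5)
MAX_SUPPORTED = (3, 13)


def is_supported_python(version=None):
    """Return True if the (major, minor) part of `version` is within the range.

    `version` may be any sequence starting with major and minor numbers;
    it defaults to the running interpreter's version.
    """
    if version is None:
        version = sys.version_info
    major_minor = (version[0], version[1])
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED


if __name__ == "__main__":
    current = "{}.{}".format(sys.version_info[0], sys.version_info[1])
    if is_supported_python():
        print("Python " + current + " is within the listed range.")
    else:
        print("Python " + current + " is outside the listed range (3.5-3.13).")
```

The comparison relies on Python's tuple ordering, so patch versions are ignored and only the major and minor numbers decide the result.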

Get started with GroupDocs.Parser for Python via .NET

If you are new to GroupDocs.Parser, see the following topics first:

Technical support

If you run into an issue while using GroupDocs.Parser or have a technical question, you can create a post in our Free Support Forum. If free support is not sufficient, you can submit a ticket to our Paid Support Helpdesk.