This topic lists the file formats supported by GroupDocs.Parser for Python via .NET.
Can’t find your file format? We’re here to help! Please post a request on our Free Support Forum, and our team will assist you.
Format Description Extract Text Extract Metadata Extract Images Extract Tables
DOC Microsoft Word Document ✅ ✅ ✅ ✅
DOCX Office Open XML Document ✅ ✅ ✅ ✅
DOCM Office Open XML Macro-Enabled Document ✅ ✅ ✅ ✅
DOT Microsoft Word Document Template ✅ ✅ ✅ ✅
DOTX Office Open XML Document Template ✅ ✅ ✅ ✅
DOTM Office Open XML Document Macro-Enabled Template ✅ ✅ ✅ ✅
TXT Plain text ✅
ODT Open Document Text ✅ ✅ ✅ ✅
OTT Open Document Text Template ✅ ✅ ✅ ✅
RTF Rich Text Format ✅ ✅ ✅ ✅
Format Description Extract Text Extract Metadata Extract Images Extract Tables
PDF Portable Document Format File ✅ ✅ ✅ ✅
Format Description Extract Text Extract Metadata Extract Images Extract Tables
HTML Hypertext Markup Language Format ✅ ✅
XHTML Extensible Hypertext Markup Language File ✅ ✅
MHTML MIME HTML File ✅ ✅
MD Markdown ✅
XML XML File ✅
Format Description Extract Text Extract Metadata Extract Images Extract Tables
CHM Compiled HTML Help File ✅ ✅
EPUB Digital E-Book File Format ✅ ✅
FB2 FictionBook 2.0 File ✅ ✅
MOBI Mobipocket ✅
Format Description Extract Text Extract Metadata Extract Images Extract Tables
XLS Microsoft Excel Spreadsheet ✅ ✅ ✅ ✅
XLSX Office Open XML Spreadsheet ✅ ✅ ✅ ✅
XLSM Office Open XML Macro-Enabled Spreadsheet ✅ ✅ ✅ ✅
XLSB Office Open XML Binary Spreadsheet ✅ ✅ ✅ ✅
XLT Microsoft Excel Template ✅ ✅ ✅ ✅
XLTX Office Open XML Spreadsheet Template ✅ ✅ ✅ ✅
XLTM Office Open XML Macro-Enabled Spreadsheet Template ✅ ✅ ✅ ✅
ODS Open Document Spreadsheet ✅ ✅ ✅ ✅
CSV Comma Separated Values ✅
Format Description Extract Text Extract Metadata Extract Images Extract Tables
PPT PowerPoint Presentation ✅ ✅ ✅ ✅
PPTX Office Open XML Presentation ✅ ✅ ✅ ✅
PPTM Office Open XML Macro-Enabled Presentation ✅ ✅ ✅ ✅
PPS PowerPoint Slideshow ✅ ✅ ✅ ✅
PPSX Office Open XML Presentation Slideshow ✅ ✅ ✅ ✅
PPSM Office Open XML Macro-Enabled Presentation Slideshow ✅ ✅ ✅ ✅
POT PowerPoint Template ✅ ✅ ✅ ✅
POTX Office Open XML Presentation Template ✅ ✅ ✅ ✅
POTM Office Open XML Macro-Enabled Presentation Template ✅ ✅ ✅ ✅
ODP Open Document Presentation ✅ ✅ ✅ ✅
OTP Open Document Presentation Template ✅ ✅ ✅ ✅
Format Description Extract Text Extract Metadata Extract Images Extract Tables
PST Outlook Personal Information Store File
OST Outlook Offline Data File
EML E-Mail Message ✅ ✅ ✅
EMLX Apple Mail Message ✅ ✅ ✅
MSG Outlook Mail Message ✅ ✅ ✅
Format Description Extract Text Extract Metadata Extract Images Extract Tables
ZIP Zipped File ✅* ✅*
RAR Rar File ✅* ✅*
TAR Tar File ✅* ✅*
GZ GZip file ✅* ✅*
BZ2 BZip2 File ✅* ✅*
* Extraction from files within the archive
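The footnote above means an archive is treated as a container: extraction applies to the files stored inside it. A minimal sketch of that idea, using Python's standard `zipfile` module rather than GroupDocs.Parser itself, lists the entries of a ZIP archive and keeps those whose extension appears in the tables on this page. The helper `listed_entries` and the abbreviated `LISTED_EXTENSIONS` set are assumptions made for this example, not part of the library's API.

```python
import io
import zipfile
from pathlib import PurePosixPath

# A small subset of the extensions listed on this page (illustrative only).
LISTED_EXTENSIONS = {"docx", "pdf", "xlsx", "pptx", "txt", "eml"}


def listed_entries(archive):
    """Return names of archive entries whose extension is in LISTED_EXTENSIONS."""
    with zipfile.ZipFile(archive) as zf:
        return [
            info.filename
            for info in zf.infolist()
            if not info.is_dir()
            and PurePosixPath(info.filename).suffix.lower().lstrip(".") in LISTED_EXTENSIONS
        ]


# Build a tiny in-memory archive so the sketch runs without any files on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("docs/report.docx", b"placeholder")
    zf.writestr("notes.txt", b"placeholder")
    zf.writestr("logo.svg", b"placeholder")
buffer.seek(0)

print(listed_entries(buffer))  # ['docs/report.docx', 'notes.txt']
```

Each entry that passes this check would then be handed to the parser individually; how that is done depends on the library's container API, which this sketch does not attempt to reproduce.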
Format Description Extract Text (OCR) Extract Metadata
BMP Bitmap Image file ✅
JPG, JPEG JPEG Image file ✅
PNG Portable Network Graphics ✅
TIF, TIFF Tagged Image File Format ✅
GIF Graphical Interchange Format
DJVU DjVu File Format ✅
Note: Text extraction from images requires OCR functionality. Basic OCR support is available, but for advanced scenarios, you may need to configure additional OCR providers.
Format Description Extract Text Extract Metadata Extract Images Extract Tables
ONE OneNote Document ✅
GroupDocs.Parser for Python via .NET supports 50+ document formats across categories including office documents, PDFs, emails, archives, and images. Depending on the format, the library can extract text, metadata, images, and tables.
For specific format support and feature availability, please refer to the detailed tables above or consult the API Reference.
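As a rough illustration of consulting the tables programmatically, the sketch below hard-codes the extensions listed on this page and checks a file name against them. The names `LISTED_FORMATS`, `category_of`, and `is_listed` are hypothetical helpers, not part of the GroupDocs.Parser API, and the group labels are chosen for this example. A match means only that the format appears in the tables, not that every extraction feature is available for it.

```python
from pathlib import PurePath

# Extensions copied from the tables on this page, grouped by table.
# The group names are labels chosen for this sketch, not names used by the library.
LISTED_FORMATS = {
    "word processing": {"doc", "docx", "docm", "dot", "dotx", "dotm", "txt", "odt", "ott", "rtf"},
    "pdf": {"pdf"},
    "markup": {"html", "xhtml", "mhtml", "md", "xml"},
    "ebook": {"chm", "epub", "fb2", "mobi"},
    "spreadsheet": {"xls", "xlsx", "xlsm", "xlsb", "xlt", "xltx", "xltm", "ods", "csv"},
    "presentation": {"ppt", "pptx", "pptm", "pps", "ppsx", "ppsm", "pot", "potx", "potm", "odp", "otp"},
    "email": {"pst", "ost", "eml", "emlx", "msg"},
    "archive": {"zip", "rar", "tar", "gz", "bz2"},
    "image": {"bmp", "jpg", "jpeg", "png", "tif", "tiff", "gif", "djvu"},
    "note": {"one"},
}


def category_of(file_name):
    """Return the table group that lists this file's extension, or None."""
    ext = PurePath(file_name).suffix.lower().lstrip(".")
    for category, extensions in LISTED_FORMATS.items():
        if ext in extensions:
            return category
    return None


def is_listed(file_name):
    """True if the file's extension appears in any table on this page."""
    return category_of(file_name) is not None


print(category_of("report.DOCX"))  # word processing
print(is_listed("photo.webp"))     # False
```

The lookup is case-insensitive and uses only the final suffix, so a name such as `backup.tar.gz` is matched on `gz`.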