Supported File Formats

This topic lists the file formats supported by GroupDocs.Parser for Python via .NET.

Word Processing

Format | Description | Extract Text | Extract Metadata | Extract Images | Extract Tables
DOC | Microsoft Word Document
DOCX | Office Open XML Document
DOCM | Office Open XML Macro-Enabled Document
DOT | Microsoft Word Document Template
DOTX | Office Open XML Document Template
DOTM | Office Open XML Document Macro-Enabled Template
TXT | Plain Text
ODT | Open Document Text
OTT | Open Document Text Template
RTF | Rich Text Format

PDF

Format | Description | Extract Text | Extract Metadata | Extract Images | Extract Tables
PDF | Portable Document Format File

Markup

Format | Description | Extract Text | Extract Metadata | Extract Images | Extract Tables
HTML | Hypertext Markup Language Format
XHTML | Extensible Hypertext Markup Language File
MHTML | MIME HTML File
MD | Markdown
XML | XML File

Ebook

Format | Description | Extract Text | Extract Metadata | Extract Images | Extract Tables
CHM | Compiled HTML Help File
EPUB | Digital E-Book File Format
FB2 | FictionBook 2.0 File
MOBI | Mobipocket
MOBIMobipocket

Spreadsheet

Format | Description | Extract Text | Extract Metadata | Extract Images | Extract Tables
XLS | Microsoft Excel Spreadsheet
XLSX | Office Open XML Spreadsheet
XLSM | Office Open XML Macro-Enabled Spreadsheet
XLSB | Office Open XML Binary Spreadsheet
XLT | Microsoft Excel Template
XLTX | Office Open XML Spreadsheet Template
XLTM | Office Open XML Macro-Enabled Spreadsheet Template
ODS | Open Document Spreadsheet
CSV | Comma Separated Values

Presentation

Format | Description | Extract Text | Extract Metadata | Extract Images | Extract Tables
PPT | PowerPoint Presentation
PPTX | Office Open XML Presentation
PPTM | Office Open XML Macro-Enabled Presentation
PPS | PowerPoint Slideshow
PPSX | Office Open XML Presentation Slideshow
PPSM | Office Open XML Macro-Enabled Presentation Slideshow
POT | PowerPoint Template
POTX | Office Open XML Presentation Template
POTM | Office Open XML Macro-Enabled Presentation Template
ODP | Open Document Presentation
OTP | Open Document Presentation Template

Email

Format | Description | Extract Text | Extract Metadata | Extract Images | Extract Tables
PST | Outlook Personal Information Store File
OST | Outlook Offline Data File
EML | E-Mail Message
EMLX | Apple Mail Message
MSG | Outlook Mail Message

Archive

Format | Description | Extract Text | Extract Metadata | Extract Images | Extract Tables
ZIP | Zipped File | ✅* | ✅*
RAR | Rar File | ✅* | ✅*
TAR | Tar File | ✅* | ✅*
GZ | GZip File | ✅* | ✅*
BZ2 | BZip2 File | ✅* | ✅*

* Extraction from files within the archive
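The footnote means that the useful data sits in the files an archive contains, not in the archive itself. As a rough illustration of that idea only, the sketch below uses Python's standard `zipfile` module, not the GroupDocs.Parser API, to list the entries of a ZIP file that have a parseable extension. The names `PARSEABLE_EXTENSIONS`, `parseable_members`, and `build_demo_archive` are hypothetical, and the extension set is an arbitrary subset of the formats listed in this topic.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Hypothetical subset of the extensions listed in this topic; extend as needed.
PARSEABLE_EXTENSIONS = {"docx", "pdf", "xlsx", "pptx", "txt", "eml"}


def parseable_members(archive):
    """Return the names of ZIP entries whose extension is in PARSEABLE_EXTENSIONS.

    `archive` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        return [
            info.filename
            for info in zf.infolist()
            if not info.is_dir()
            and PurePosixPath(info.filename).suffix.lower().lstrip(".")
            in PARSEABLE_EXTENSIONS
        ]


def build_demo_archive():
    """Build a small in-memory ZIP so the sketch runs without external files."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("docs/report.docx", b"placeholder")
        zf.writestr("notes.txt", b"hello")
        zf.writestr("video.mp4", b"placeholder")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for name in parseable_members(build_demo_archive()):
        print(name)
```

In a real workflow, each listed entry would then be handed to the parser for extraction; how that is done with GroupDocs.Parser is described in the API Reference and is not shown here.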

Image

Format | Description | Extract Text (OCR) | Extract Metadata
BMP | Bitmap Image File
JPG, JPEG | JPEG Image File
PNG | Portable Network Graphics
TIF, TIFF | Tagged Image File Format
GIF | Graphics Interchange Format
DJVU | DjVu File Format

Note: Text extraction from images requires OCR functionality. Basic OCR support is available, but for advanced scenarios, you may need to configure additional OCR providers.
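One practical consequence of this note is that an application may want to know in advance which inputs will go through OCR. The sketch below is a minimal, assumption-laden helper: `IMAGE_EXTENSIONS` and `needs_ocr` are hypothetical names, the extension set is copied from the Image table above, and the check looks only at the file extension. It does not call GroupDocs.Parser and does not detect scanned pages embedded in other formats.

```python
from pathlib import Path

# Extensions from the Image table in this topic (assumed to be the OCR-only inputs).
IMAGE_EXTENSIONS = {"bmp", "jpg", "jpeg", "png", "tif", "tiff", "gif", "djvu"}


def needs_ocr(path):
    """Return True if the file extension belongs to the Image category."""
    return Path(path).suffix.lower().lstrip(".") in IMAGE_EXTENSIONS


if __name__ == "__main__":
    for name in ("scan.PNG", "report.pdf", "photo.jpeg"):
        print(name, needs_ocr(name))
```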

Note-taking

Format | Description | Extract Text | Extract Metadata | Extract Images | Extract Tables
ONE | OneNote Document

Summary

GroupDocs.Parser for Python via .NET supports more than 50 file formats across the categories above: word processing documents, PDFs, markup, e-books, spreadsheets, presentations, emails, archives, images, and OneNote documents. Depending on the format, the library can extract text, metadata, images, and tables.

For specific format support and feature availability, please refer to the detailed tables above or consult the API Reference.
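For a quick programmatic check, the categories and extensions in this topic can be captured in a small lookup table. The sketch below is plain Python and does not use the GroupDocs.Parser API; `SUPPORTED_FORMATS`, `EXTENSION_TO_CATEGORY`, and `category_for` are hypothetical names, and the data is transcribed from the tables above, so it would need updating if the supported list changes.

```python
from pathlib import Path

# Extensions transcribed from the tables in this topic, grouped by category.
SUPPORTED_FORMATS = {
    "Word Processing": ["doc", "docx", "docm", "dot", "dotx", "dotm",
                        "txt", "odt", "ott", "rtf"],
    "PDF": ["pdf"],
    "Markup": ["html", "xhtml", "mhtml", "md", "xml"],
    "Ebook": ["chm", "epub", "fb2", "mobi"],
    "Spreadsheet": ["xls", "xlsx", "xlsm", "xlsb", "xlt", "xltx", "xltm",
                    "ods", "csv"],
    "Presentation": ["ppt", "pptx", "pptm", "pps", "ppsx", "ppsm",
                     "pot", "potx", "potm", "odp", "otp"],
    "Email": ["pst", "ost", "eml", "emlx", "msg"],
    "Archive": ["zip", "rar", "tar", "gz", "bz2"],
    "Image": ["bmp", "jpg", "jpeg", "png", "tif", "tiff", "gif", "djvu"],
    "Note-taking": ["one"],
}

# Reverse index: extension -> category name.
EXTENSION_TO_CATEGORY = {
    ext: category
    for category, extensions in SUPPORTED_FORMATS.items()
    for ext in extensions
}


def category_for(path):
    """Return the category for a file name, or None if the extension is not listed."""
    ext = Path(path).suffix.lower().lstrip(".")
    return EXTENSION_TO_CATEGORY.get(ext)


if __name__ == "__main__":
    for name in ("report.DOCX", "backup.tar.gz", "movie.mp4"):
        print(name, "->", category_for(name))
```

The lookup uses only the final extension, so a name such as `backup.tar.gz` is classified by `gz`. A listed extension only indicates that the format appears in this topic; which extraction features apply still depends on the format.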