Supported Document Formats
The following tables indicate the file formats from which GroupDocs.Parser for Java can extract data.
Document Type | Parse Document by Template | Extract Text (Accurate) | Extract Text (Raw) | Extract Structured Text and Formatted Text | Extract Text Areas | Extract Metadata | Extract Images | Extract Containers and Attachments | Parse Form Data | Extract Table of Contents | Scan Barcode |
---|---|---|---|---|---|---|---|---|---|---|---|
DOC Microsoft Word Document | |||||||||||
DOT Microsoft Word Document Template | |||||||||||
DOCX Office Open XML Document | |||||||||||
DOCM Office Open XML Macro-Enabled Document | |||||||||||
DOTX Office Open XML Document Template | |||||||||||
DOTM Office Open XML Document Macro-Enabled Template | |||||||||||
TXT Plain text | |||||||||||
ODT Open Document Text | |||||||||||
OTT Open Document Text Template | |||||||||||
RTF Rich Text Format |
Document Type | Parse Document by Template | Extract Text (Accurate) | Extract Text (Raw) | Extract Structured Text and Formatted Text | Extract Text Areas | Extract Metadata | Extract Images | Extract Containers and Attachments | Parse Form Data | Extract Table of Contents | Scan Barcode |
---|---|---|---|---|---|---|---|---|---|---|---|
PDF Portable Document Format File |
Document Type | Parse Document by Template | Extract Text (Accurate) | Extract Text (Raw) | Extract Structured Text and Formatted Text | Extract Text Areas | Extract Metadata | Extract Images | Extract Containers and Attachments | Parse Form Data | Extract Table of Contents | Scan Barcode |
---|---|---|---|---|---|---|---|---|---|---|---|
XHTML Extensible Hypertext Markup Language File | |||||||||||
MHTML MIME HTML File | |||||||||||
MD Markdown | (Formatted Text is not supported) | ||||||||||
XML XML File |
Document Type | Parse Document by Template | Extract Text (Accurate) | Extract Text (Raw) | Extract Structured Text and Formatted Text | Extract Text Areas | Extract Metadata | Extract Images | Extract Containers and Attachments | Parse Form Data | Extract Table of Contents | Scan Barcode |
---|---|---|---|---|---|---|---|---|---|---|---|
CHM Compiled HTML Help File | |||||||||||
EPUB Digital E-Book File Format | |||||||||||
FB2 FictionBook 2.0 File | |||||||||||
MOBI Mobipocket | |||||||||||
AZW3 Kindle Format 8 |
Document Type | Parse Document by Template | Extract Text (Accurate) | Extract Text (Raw) | Extract Structured Text and Formatted Text | Extract Text Areas | Extract Metadata | Extract Images | Extract Containers and Attachments | Parse Form Data | Extract Table of Contents | Scan Barcode |
---|---|---|---|---|---|---|---|---|---|---|---|
XLS Microsoft Excel Spreadsheet | |||||||||||
XLT Microsoft Excel Template | |||||||||||
XLSX Office Open XML Spreadsheet | |||||||||||
XLSM Office Open XML Macro-Enabled Spreadsheet | |||||||||||
XLSB Office Open XML Binary Spreadsheet | |||||||||||
XLTX Office Open XML Spreadsheet Template | |||||||||||
XLTM Office Open XML Macro-Enabled Spreadsheet Template | |||||||||||
ODS Open Document Spreadsheet | |||||||||||
OTS Open Document Spreadsheet Template | |||||||||||
CSV Comma Separated Values | |||||||||||
XLA Excel Add-In File | |||||||||||
XLAM Excel Open XML Macro-Enabled Add-In | |||||||||||
NUMBERS Apple iWork Numbers |
Document Type | Parse Document by Template | Extract Text (Accurate) | Extract Text (Raw) | Extract Structured Text and Formatted Text | Extract Text Areas | Extract Metadata | Extract Images | Extract Containers and Attachments | Parse Form Data | Extract Table of Contents | Scan Barcode |
---|---|---|---|---|---|---|---|---|---|---|---|
PPT PowerPoint Presentation | |||||||||||
PPS PowerPoint Slideshow | |||||||||||
POT PowerPoint Template | |||||||||||
PPTX Office Open XML Presentation | |||||||||||
PPTM Office Open XML Macro-Enabled Presentation | |||||||||||
POTX Office Open XML Presentation Template | |||||||||||
POTM Office Open XML Macro-Enabled Presentation Template | |||||||||||
PPSX Office Open XML Presentation Slideshow | |||||||||||
PPSM Office Open XML Macro-Enabled Presentation Slideshow | |||||||||||
ODP Open Document Presentation | |||||||||||
OTP Open Document Presentation Template |
Document Type | Parse Document by Template | Extract Text (Accurate) | Extract Text (Raw) | Extract Structured Text and Formatted Text | Extract Text Areas | Extract Metadata | Extract Images | Extract Containers and Attachments | Parse Form Data | Extract Table of Contents | Scan Barcode |
---|---|---|---|---|---|---|---|---|---|---|---|
PST Outlook Personal Information Store File | |||||||||||
OST Outlook Offline Data File | |||||||||||
EML E-Mail Message | |||||||||||
EMLX Apple Mail Message | |||||||||||
MSG Outlook Mail Message |
Document Type | Parse Document by Template | Extract Text (Accurate) | Extract Text (Raw) | Extract Structured Text and Formatted Text | Extract Text Areas | Extract Metadata | Extract Images | Extract Containers and Attachments | Parse Form Data | Extract Table of Contents | Scan Barcode |
---|---|---|---|---|---|---|---|---|---|---|---|
ONE OneNote Document |
Document Type | Parse Document by Template | Extract Text (Accurate) | Extract Text (Raw) | Extract Structured Text and Formatted Text | Extract Text Areas | Extract Metadata | Extract Images | Extract Containers and Attachments | Parse Form Data | Extract Table of Contents | Scan Barcode |
---|---|---|---|---|---|---|---|---|---|---|---|
ZIP Zipped File | |||||||||||
RAR RAR File | |||||||||||
TAR Tar File | |||||||||||
GZ GZip File | |||||||||||
BZ2 BZip2 File |
Document Type | Parse Document by Template | Extract Text (Accurate) | Extract Text (Raw) | Extract Structured Text and Formatted Text | Extract Text Areas | Extract Metadata | Extract Images | Extract Containers and Attachments | Parse Form Data | Extract Table of Contents | Scan Barcode |
---|---|---|---|---|---|---|---|---|---|---|---|
BMP Bitmap Image file | |||||||||||
GIF Graphics Interchange Format | |||||||||||
JP2 JPEG 2000 | |||||||||||
JPG, JPEG JPEG Image file | |||||||||||
PNG Portable Network Graphics | |||||||||||
TIF, TIFF Tagged Image File Format | |||||||||||
DICOM DICOM (Digital Imaging and Communications in Medicine) | |||||||||||
DJVU DjVu File Format | |||||||||||
EMF Enhanced metafile | |||||||||||
J2K JPEG 2000 | |||||||||||
PS PostScript File Format | |||||||||||
PSD Photoshop Document | |||||||||||
SVG Scalable Vector Graphics file | |||||||||||
SVGZ Scalable Vector Graphics file (with gzip compression) | |||||||||||
WEBP WebP Image File Format | |||||||||||
WMF Microsoft Windows Metafile |
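The file-based formats listed above can be collected into a simple lookup, for example to pre-check a file name before handing the file to the parser. The following is a minimal sketch using only the Java standard library. The class `ListedFormats` and its method `isListed` are hypothetical names introduced for illustration and are not part of GroupDocs.Parser. The check only tells whether an extension appears on this page; it says nothing about which features are available for that format.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (not part of GroupDocs.Parser): holds the file
// extensions listed in the tables on this page.
final class ListedFormats {

    // Extensions copied from the "Document Type" column, lower-cased.
    private static final Set<String> EXTENSIONS = Set.of(
        // Word processing
        "doc", "dot", "docx", "docm", "dotx", "dotm", "txt", "odt", "ott", "rtf",
        // PDF and markup
        "pdf", "xhtml", "mhtml", "md", "xml",
        // E-books
        "chm", "epub", "fb2", "mobi", "azw3",
        // Spreadsheets
        "xls", "xlt", "xlsx", "xlsm", "xlsb", "xltx", "xltm", "ods", "ots",
        "csv", "xla", "xlam", "numbers",
        // Presentations
        "ppt", "pps", "pot", "pptx", "pptm", "potx", "potm", "ppsx", "ppsm",
        "odp", "otp",
        // E-mail and notes
        "pst", "ost", "eml", "emlx", "msg", "one",
        // Archives
        "zip", "rar", "tar", "gz", "bz2",
        // Images
        "bmp", "gif", "jp2", "jpg", "jpeg", "png", "tif", "tiff", "dicom",
        "djvu", "emf", "j2k", "ps", "psd", "svg", "svgz", "webp", "wmf");

    private ListedFormats() {
    }

    // Returns true if the file name ends with an extension listed on this page.
    // The comparison ignores case; names without an extension return false.
    static boolean isListed(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false;
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return EXTENSIONS.contains(extension);
    }

    public static void main(String[] args) {
        // Prints one line per file name passed on the command line.
        for (String name : args) {
            System.out.println(name + " -> " + isListed(name));
        }
    }
}
```

A file extension is only a hint about the actual content, so a lookup like this is best treated as a quick filter rather than a guarantee that extraction will succeed.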
Databases are supported via JDBC. To work with a particular database, install its JDBC driver (database provider).
Document Type | Parse Document by Template | Extract Text (Accurate) | Extract Text (Raw) | Extract Structured Text and Formatted Text | Extract Text Areas | Extract Metadata | Extract Images | Extract Containers and Attachments | Parse Form Data | Extract Table of Contents | Scan Barcode |
---|---|---|---|---|---|---|---|---|---|---|---|
JDBC |
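Because database support depends on a JDBC driver being installed, it can help to confirm that a driver is actually visible to the application before attempting to parse a database. The following is a minimal sketch that relies only on the standard `java.sql.DriverManager` class. The class `JdbcDriverCheck` and its methods are hypothetical names introduced for illustration, and the JDBC URL in `main` is a placeholder. The sketch does not show how a connection is passed to GroupDocs.Parser.

```java
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not part of GroupDocs.Parser): inspects which JDBC
// drivers are registered with the standard DriverManager.
final class JdbcDriverCheck {

    private JdbcDriverCheck() {
    }

    // Returns the class names of all JDBC drivers currently registered.
    static List<String> registeredDrivers() {
        List<String> names = new ArrayList<>();
        for (Driver driver : Collections.list(DriverManager.getDrivers())) {
            names.add(driver.getClass().getName());
        }
        return names;
    }

    // Returns true if some registered driver accepts the given JDBC URL.
    // DriverManager.getDriver throws SQLException when no driver matches.
    static boolean hasDriverFor(String jdbcUrl) {
        try {
            DriverManager.getDriver(jdbcUrl);
            return true;
        } catch (SQLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder URL; replace it with the URL of the database in use.
        String url = args.length > 0 ? args[0] : "jdbc:placeholder://localhost/example";
        System.out.println("Registered drivers: " + registeredDrivers());
        System.out.println("Driver available for " + url + ": " + hasDriverFor(url));
    }
}
```

If `hasDriverFor` returns false for the URL in use, the driver JAR for that database is most likely missing from the classpath.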