Supported Document Formats

The following tables indicate the file formats from which GroupDocs.Parser for .NET can extract data.
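
Support can also be checked at run time rather than looked up in the tables below. The following is a minimal sketch, assuming the `Features` property of the `Parser` class exposes per-feature flags named `Text`, `Metadata` and `Images`; these names should be verified against the API reference for the installed version. The file name `sample.docx` is a placeholder.

```csharp
using System;
using System.IO;
using GroupDocs.Parser;

class FeatureCheck
{
    static void Main()
    {
        // "sample.docx" is a placeholder path, not a file shipped with the library.
        using (Parser parser = new Parser("sample.docx"))
        {
            // Assumed feature flags; each one corresponds to a column in the tables.
            Console.WriteLine("Text:     " + parser.Features.Text);
            Console.WriteLine("Metadata: " + parser.Features.Metadata);
            Console.WriteLine("Images:   " + parser.Features.Images);

            // Stop early when text extraction is not available for this format.
            if (!parser.Features.Text)
            {
                return;
            }

            using (TextReader reader = parser.GetText())
            {
                // Guard against a null reader in case extraction is unavailable.
                Console.WriteLine(reader == null ? string.Empty : reader.ReadToEnd());
            }
        }
    }
}
```

Checking the flag before calling an extraction method avoids handling a null result for formats that do not support the feature.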

Word Processing

| Document Type | Parse Document by Template | Extract Text (Accurate) | Extract Text (Raw) | Extract Structured Text and Formatted Text | Extract Text Areas | Extract Metadata | Extract Images | Extract Containers and Attachments | Parse Form Data | Extract Table of Contents |
|---|---|---|---|---|---|---|---|---|---|---|
| DOC (Microsoft Word Document) | Yes | Yes | - | Yes | Yes | Yes | Yes | - | - | Yes |
| DOT (Microsoft Word Document Template) | Yes | Yes | - | Yes | Yes | Yes | Yes | - | - | Yes |
| DOCX (Office Open XML Document) | Yes | Yes | - | Yes | Yes | Yes | Yes | - | - | Yes |
| DOCM (Office Open XML Macro-Enabled Document) | Yes | Yes | - | Yes | Yes | Yes | Yes | - | - | Yes |
| DOTX (Office Open XML Document Template) | Yes | Yes | - | Yes | Yes | Yes | Yes | - | - | Yes |
| DOTM (Office Open XML Document Macro-Enabled Template) | Yes | Yes | - | Yes | Yes | Yes | Yes | - | - | Yes |
| TXT (Plain text) | - | Yes | - | - | - | - | - | - | - | - |
| ODT (Open Document Text) | Yes | Yes | - | Yes | Yes | Yes | Yes | - | - | Yes |
| OTT (Open Document Text Template) | Yes | Yes | - | Yes | Yes | Yes | Yes | - | - | Yes |
| RTF (Rich Text Format) | Yes | Yes | - | Yes | Yes | Yes | Yes | - | - | Yes |

PDF

| Document Type | Parse Document by Template | Extract Text (Accurate) | Extract Text (Raw) | Extract Structured Text and Formatted Text | Extract Text Areas | Extract Metadata | Extract Images | Extract Containers and Attachments | Parse Form Data | Extract Table of Contents |
|---|---|---|---|---|---|---|---|---|---|---|
| PDF (Portable Document Format File) | Yes | Yes | Yes | - | Yes | Yes | Yes | Yes | Yes | Yes |

Markup

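| Document Type | Parse Document by Template | Extract Text (Accurate) | Extract Text (Raw) | Extract Structured Text and Formatted Text | Extract Text Areas | Extract Metadata | Extract Images | Extract Containers and Attachments | Parse Form Data | Extract Table of Contents |
|---|---|---|---|---|---|---|---|---|---|---|
| XHTML (Extensible Hypertext Markup Language File) | - | Yes | - | - | - | - | - | - | - | - |
| MHTML (MIME HTML File) | - | Yes | - | - | - | - | - | - | - | - |
| MD (Markdown) | - | Yes | - | Yes (Formatted Text is not supported) | - | - | - | - | - | - |
| XML (XML File) | - | Yes | - | - | - | - | - | - | - | - |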

Ebook

| Document Type | Parse Document by Template | Extract Text (Accurate) | Extract Text (Raw) | Extract Structured Text and Formatted Text | Extract Text Areas | Extract Metadata | Extract Images | Extract Containers and Attachments | Parse Form Data | Extract Table of Contents |
|---|---|---|---|---|---|---|---|---|---|---|
| CHM (Compiled HTML Help File) | - | Yes | - | Yes | - | - | - | - | - | Yes |
| EPUB (Digital E-Book File Format) | - | Yes | - | Yes | - | Yes | - | - | - | Yes |
| FB2 (FictionBook 2.0 File) | - | Yes | - | Yes | - | Yes | - | - | - | - |

Spreadsheet

| Document Type | Parse Document by Template | Extract Text (Accurate) | Extract Text (Raw) | Extract Structured Text and Formatted Text | Extract Text Areas | Extract Metadata | Extract Images | Extract Containers and Attachments | Parse Form Data | Extract Table of Contents |
|---|---|---|---|---|---|---|---|---|---|---|
| XLS (Microsoft Excel Spreadsheet) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | - | - | - |
| XLT (Microsoft Excel Template) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | - | - | - |
| XLSX (Office Open XML Spreadsheet) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | - | - | - |
| XLSM (Office Open XML Macro-Enabled Spreadsheet) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | - | - | - |
| XLSB (Office Open XML Binary Spreadsheet) | Yes | Yes | - | - | Yes | Yes | Yes | - | - | - |
| XLTX (Office Open XML Spreadsheet Template) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | - | - | - |
| XLTM (Office Open XML Macro-Enabled Spreadsheet Template) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | - | - | - |
| ODS (Open Document Spreadsheet) | Yes | Yes | - | - | Yes | Yes | Yes | - | - | - |
| OTS (Open Document Spreadsheet Template) | Yes | Yes | - | - | Yes | Yes | Yes | - | - | - |
| CSV (Comma Separated Values) | - | Yes | - | - | - | - | - | - | - | - |
| XLA (Excel Add-In File) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | - | - | - |
| XLAM (Excel Open XML Macro-Enabled Add-In) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | - | - | - |
| NUMBERS (Apple iWork Numbers) | Yes | Yes | - | - | Yes | - | Yes | - | - | - |

Presentation

| Document Type | Parse Document by Template | Extract Text (Accurate) | Extract Text (Raw) | Extract Structured Text and Formatted Text | Extract Text Areas | Extract Metadata | Extract Images | Extract Containers and Attachments | Parse Form Data | Extract Table of Contents |
|---|---|---|---|---|---|---|---|---|---|---|
| PPT (PowerPoint Presentation) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | - | - | - |
| PPS (PowerPoint Slideshow) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | - | - | - |
| POT (PowerPoint Template) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | - | - | - |
| PPTX (Office Open XML Presentation) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | - | - | - |
| PPTM (Office Open XML Macro-Enabled Presentation) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | - | - | - |
| POTX (Office Open XML Presentation Template) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | - | - | - |
| POTM (Office Open XML Macro-Enabled Presentation Template) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | - | - | - |
| PPSX (Office Open XML Presentation Slideshow) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | - | - | - |
| PPSM (Office Open XML Macro-Enabled Presentation Slideshow) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | - | - | - |
| ODP (Open Document Presentation) | Yes | Yes | - | Yes | Yes | Yes | Yes | - | - | - |
| OTP (Open Document Presentation Template) | Yes | Yes | - | Yes | Yes | Yes | Yes | - | - | - |

Email

| Document Type | Parse Document by Template | Extract Text (Accurate) | Extract Text (Raw) | Extract Structured Text and Formatted Text | Extract Text Areas | Extract Metadata | Extract Images | Extract Containers and Attachments | Parse Form Data | Extract Table of Contents |
|---|---|---|---|---|---|---|---|---|---|---|
| PST (Outlook Personal Information Store File) | - | - | - | - | - | - | - | Yes | - | - |
| OST (Outlook Offline Data File) | - | - | - | - | - | - | - | Yes | - | - |
| EML (E-Mail Message) | - | Yes | - | Yes | - | Yes | Yes | Yes | - | - |
| EMLX (Apple Mail Message) | - | Yes | - | Yes | - | Yes | Yes | Yes | - | - |
| MSG (Outlook Mail Message) | - | Yes | - | Yes | - | Yes | Yes | Yes | - | - |

Note

| Document Type | Parse Document by Template | Extract Text (Accurate) | Extract Text (Raw) | Extract Structured Text and Formatted Text | Extract Text Areas | Extract Metadata | Extract Images | Extract Containers and Attachments | Parse Form Data | Extract Table of Contents |
|---|---|---|---|---|---|---|---|---|---|---|
| ONE (OneNote Document) | - | Yes | - | - | - | - | - | - | - | - |

Archive

| Document Type | Parse Document by Template | Extract Text (Accurate) | Extract Text (Raw) | Extract Structured Text and Formatted Text | Extract Text Areas | Extract Metadata | Extract Images | Extract Containers and Attachments | Parse Form Data | Extract Table of Contents |
|---|---|---|---|---|---|---|---|---|---|---|
| ZIP (Zipped File) | - | - | - | - | - | - | Yes | Yes | - | - |

Database

Databases are supported via ADO.NET. To work with a particular database, install the ADO.NET data provider for that database.

| Document Type | Parse Document by Template | Extract Text (Accurate) | Extract Text (Raw) | Extract Structured Text and Formatted Text | Extract Text Areas | Extract Metadata | Extract Images | Extract Containers and Attachments | Parse Form Data | Extract Table of Contents |
|---|---|---|---|---|---|---|---|---|---|---|
| ADO.NET | - | Yes | - | - | - | - | - | - | - | Yes |
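
Opening a database might look like the sketch below. It assumes that the `Parser` constructor accepts an ADO.NET connection string together with `LoadOptions(FileFormat.Database)`, that `GetToc()` lists the tables, and that `GetText(int)` returns the content of one table; verify these against the API reference for the installed version. The SQLite provider name and the `sample.db` path are placeholders chosen for illustration, and the matching data provider must be installed separately.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;
using GroupDocs.Parser.Options;

class DatabaseText
{
    static void Main()
    {
        // Placeholder connection string; provider and file name are assumptions.
        string connectionString =
            "Provider=System.Data.Sqlite;Data Source=sample.db;Version=3;";

        // FileFormat.Database tells the parser to treat the string as a connection string.
        using (Parser parser = new Parser(connectionString, new LoadOptions(FileFormat.Database)))
        {
            // Each table of contents item is assumed to represent one table.
            IEnumerable<TocItem> toc = parser.GetToc();
            if (toc == null)
            {
                Console.WriteLine("Table of contents extraction isn't supported.");
                return;
            }

            foreach (TocItem item in toc)
            {
                // Skip items that do not point at a page (table) index.
                if (!item.PageIndex.HasValue)
                {
                    continue;
                }

                using (TextReader reader = parser.GetText(item.PageIndex.Value))
                {
                    Console.WriteLine(item.Text);
                    Console.WriteLine(reader == null ? string.Empty : reader.ReadToEnd());
                }
            }
        }
    }
}
```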