Extracting document metainfo
This demonstration shows and explains the usage of the GetDocumentInfo method, which extracts meta info from a document.
In some situations it is necessary to obtain meta info from a document before actually editing it. For example, a user wants to edit the last tab of a multi-tab spreadsheet but does not know how many tabs the document contains. Or the user does not know whether the document is password-protected. For such situations GroupDocs.Editor provides the GetDocumentInfo method, which returns detailed meta info (metadata) about the specified document.
Using the method
In order to obtain the meta info, the document should first be loaded into the Editor class. Then GetDocumentInfo() should be called. This method takes one parameter: the password as a string. If the document is encrypted and the user knows the password, it can be specified here. In all other cases null or an empty string can be passed. The code example below demonstrates the usage:
```csharp
using GroupDocs.Editor;
using GroupDocs.Editor.Metadata;

Editor editor = new Editor("C://input/document.docx");
IDocumentInfo infoDocxWithoutPassword = editor.GetDocumentInfo(null);     // no password
IDocumentInfo infoDocxWithPassword = editor.GetDocumentInfo("password"); // known password
```
Several scenarios are possible, depending on whether the document is encrypted and whether the user specified a password:
- If a password is specified but the document is not password-protected, or the document format does not support encryption at all, the password is ignored.
- If the document is password-protected but no password is specified, a PasswordRequiredException is thrown when GetDocumentInfo() is called.
- If the document is password-protected and a password is specified but is incorrect, an IncorrectPasswordException is thrown when GetDocumentInfo() is called.
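The three cases above can be handled in one place. The following is a minimal sketch, not an official recipe: Classify is a hypothetical helper that runs the GetDocumentInfo() call and maps each documented outcome to a status string. The input path and the password are placeholders, and the sketch assumes both exception types live in the GroupDocs.Editor namespace.

```csharp
using System;
using System.IO;
using GroupDocs.Editor;
using GroupDocs.Editor.Metadata;

string path = "C://input/document.docx"; // placeholder input file
if (File.Exists(path))
{
    using Editor editor = new Editor(path);
    // null means "no password"; a password is ignored for unprotected documents
    Console.WriteLine(Classify(() => editor.GetDocumentInfo(null)));
    Console.WriteLine(Classify(() => editor.GetDocumentInfo("password")));
}

// Hypothetical helper: runs the metadata request and reports which documented case occurred
static string Classify(Func<IDocumentInfo> getInfo)
{
    try
    {
        IDocumentInfo info = getInfo();
        // A null result means the document is not supported by GroupDocs.Editor
        return info == null ? "unsupported" : "ok";
    }
    catch (PasswordRequiredException)
    {
        // Protected document, but no password was passed
        return "password-required";
    }
    catch (IncorrectPasswordException)
    {
        // Protected document, and the passed password is wrong
        return "incorrect-password";
    }
}
```

Passing the call as a delegate keeps the exception handling separate from how the Editor instance is created.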
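Explaining the resulting type
GetDocumentInfo() returns an object implementing the IDocumentInfo interface, which has the following properties: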
- PageCount. A positive number: the page count for WordProcessing, PDF and XPS documents, the tab (worksheet) count for Spreadsheets, the slide count for Presentations, and 1 for pageless documents like XML or TXT.
- Size. Document size in bytes.
- IsEncrypted. A boolean flag that indicates whether the document is encrypted with a password. If the document format does not support encryption at all, like CSV or XML, this property always returns false.
- Format. Returns info about the format itself.
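A minimal sketch that reads these four properties, assuming the namespaces shown in the using directives; Summarize is a hypothetical helper and the path is a placeholder. For a protected document, IsEncrypted can only be observed after the correct password is passed, because otherwise GetDocumentInfo() throws.

```csharp
using System;
using System.IO;
using GroupDocs.Editor;
using GroupDocs.Editor.Metadata;

string path = "C://input/document.docx"; // placeholder input file
if (File.Exists(path))
{
    using Editor editor = new Editor(path);
    Console.WriteLine(Summarize(editor.GetDocumentInfo(null)));
}

// Hypothetical helper: formats the IDocumentInfo properties described above
static string Summarize(IDocumentInfo info)
{
    // GetDocumentInfo() returns null for unsupported documents
    if (info == null)
        return "unsupported document";
    return $"format={info.Format.Name}, " + // format descriptor (IDocumentFormat)
           $"pages={info.PageCount}, " +     // pages, tabs or slides; 1 for pageless formats
           $"size={info.Size} bytes, " +      // document size in bytes
           $"encrypted={info.IsEncrypted}";   // always false for formats without encryption
}
```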
The IDocumentInfo interface has eight inheritors, all of which are structs:
- WordProcessingDocumentInfo - common for all WordProcessing family formats.
- SpreadsheetDocumentInfo - common for all Spreadsheet family formats.
- PresentationDocumentInfo - common for all Presentation family formats.
- TextualDocumentInfo - common for all textual types, including all DSV formats (like CSV and TSV), XML, HTML, and plain text.
- FixedLayoutDocumentInfo - common for all documents with a fixed-layout format; this includes only PDF and XPS.
- EmailDocumentInfo - common for all Email family formats, like EML, MSG, VCF, PST, MBOX and others.
- EbookDocumentInfo - common for all eBook family formats, like MOBI and ePub.
- MarkdownDocumentInfo - a special struct dedicated to the Markdown (MD) textual format.
One important thing to note: if GetDocumentInfo() returns null instead of one of the IDocumentInfo inheritors, the specified document is not supported by GroupDocs.Editor and thus cannot be opened for editing or saved.
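Because each family has its own struct, the result can be dispatched with C# type patterns, with a separate case for the unsupported (null) result. The sketch below revisits the scenario from the introduction, editing the last tab of a spreadsheet: PageCount gives the tab count. LastTabIndex is a hypothetical helper, the path is a placeholder, and the sketch assumes that SpreadsheetEditOptions.WorksheetIndex (namespace GroupDocs.Editor.Options) takes a 0-based tab index.

```csharp
using System;
using System.IO;
using GroupDocs.Editor;
using GroupDocs.Editor.Metadata;
using GroupDocs.Editor.Options;

string path = "C://input/spreadsheet.xlsx"; // placeholder input file
if (File.Exists(path))
{
    using Editor editor = new Editor(path);
    int? lastTab = LastTabIndex(editor.GetDocumentInfo(null));
    if (lastTab.HasValue)
    {
        // Assumption: WorksheetIndex is 0-based
        SpreadsheetEditOptions options = new SpreadsheetEditOptions { WorksheetIndex = lastTab.Value };
        using EditableDocument lastSheet = editor.Edit(options);
        Console.WriteLine($"Opened tab #{lastTab.Value + 1} for editing");
    }
}

// Hypothetical helper: index of the last worksheet, or null if the document is not a spreadsheet
static int? LastTabIndex(IDocumentInfo info)
{
    switch (info)
    {
        case null:
            // Unsupported document: it cannot be opened for editing
            return null;
        case SpreadsheetDocumentInfo sheets:
            // For spreadsheets PageCount is the number of tabs (worksheets)
            return sheets.PageCount - 1;
        default:
            // Any other supported family (WordProcessing, Presentation, Textual, ...)
            return null;
    }
}
```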
Explaining the document format
The IDocumentInfo interface contains a Format property of the IDocumentFormat type. IDocumentFormat is an interface common to all format descriptors. It identifies one particular document format, stores the format name, extension, and MIME code, and has equality operators. Its properties are:
- Name, which provides the name of the format.
- Extension, which provides the format extension.
- Mime, which provides the MIME code of the format.
The IDocumentFormat interface has seven inheritors, all of which are structs:
- WordProcessingFormats - holds all formats from the WordProcessing family.
- SpreadsheetFormats - holds all formats from the Spreadsheet family.
- PresentationFormats - holds all formats from the Presentation family.
- TextualFormats - holds all formats with a text-based nature.
- FixedLayoutFormats - holds all formats from the fixed-layout format family. This includes only PDF and XPS.
- EBookFormats - holds all eBook (electronic book) formats, like MOBI and ePub.
- EmailFormats - holds all email (electronic mail) formats, like EML and MSG.
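A minimal sketch of working with a format descriptor, assuming the format types live in the GroupDocs.Editor.Formats namespace and that WordProcessingFormats.Docx and SpreadsheetFormats.Xlsx are available as predefined descriptors. FormatFamily is a hypothetical helper that finds the family by checking which struct implements IDocumentFormat; the comparison uses Equals, which is defined for any value, rather than relying on a particular == overload.

```csharp
using System;
using GroupDocs.Editor.Formats;

IDocumentFormat docx = WordProcessingFormats.Docx;
// Name, Extension and Mime describe the format; their exact values come from the library
Console.WriteLine($"{docx.Name} | {docx.Extension} | {docx.Mime} | {FormatFamily(docx)}");
// Compare against a known descriptor, e.g. to check an IDocumentInfo.Format value
Console.WriteLine(docx.Equals(WordProcessingFormats.Docx));

// Hypothetical helper: names the format family by matching the implementing struct
static string FormatFamily(IDocumentFormat format) => format switch
{
    WordProcessingFormats => "WordProcessing",
    SpreadsheetFormats => "Spreadsheet",
    PresentationFormats => "Presentation",
    TextualFormats => "Textual",
    FixedLayoutFormats => "FixedLayout",
    EBookFormats => "eBook",
    EmailFormats => "Email",
    _ => "unknown", // also covers null
};
```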