Extracting document metainfo

This demonstration shows and explains how to use the GetDocumentInfo method, which extracts meta info from a document.

Introduction

In some situations it is necessary to obtain meta info about a document before actually editing it. For example, a user wants to edit the last tab of a multi-tabbed spreadsheet but does not know how many tabs the document contains. Or it is unclear to the user whether the document is password-protected. For such situations GroupDocs.Editor provides the GetDocumentInfo method, which returns detailed meta info (metadata) about the specified document.

Using the method

To obtain meta info from a document, the document must first be loaded into the Editor class. Then GetDocumentInfo() should be called. This method accepts one parameter: the password as a string. If the document is encrypted and the user knows the password, it can be specified here. In all other cases null or an empty string can be passed. The code example below demonstrates the usage:

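// Load the document into the Editor class
Editor editor = new Editor("C://input/document.docx");
// Pass null when no password is needed
IDocumentInfo infoDocxWithoutPassword = editor.GetDocumentInfo(null);
// Pass the password when the document is password-protected
IDocumentInfo infoDocxWithPassword = editor.GetDocumentInfo("password");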

Several scenarios are possible here, depending on whether the document is encrypted and whether the user specified a password:

  1. If a password is specified, but the document is not password-protected or the document format doesn’t support encryption at all, the password is ignored.
  2. If the document is password-protected, but no password is specified, a PasswordRequiredException is thrown when GetDocumentInfo() is called.
  3. If the document is password-protected and a password is specified, but it is incorrect, an IncorrectPasswordException is thrown when GetDocumentInfo() is called.
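
The three scenarios above can be told apart with ordinary exception handling. The following is a minimal sketch, not a definitive implementation: the class name PasswordProbe and the method name Probe are hypothetical, and the using directives assume that the exception types live in the GroupDocs.Editor namespace and IDocumentInfo in GroupDocs.Editor.Metadata.

```csharp
using System;
using GroupDocs.Editor;          // assumed namespace of Editor and the password exceptions
using GroupDocs.Editor.Metadata; // assumed namespace of IDocumentInfo

// Hypothetical helper, for illustration only.
public static class PasswordProbe
{
    // Tries to read the metadata and reports which scenario occurred.
    public static string Probe(string path, string password)
    {
        using (Editor editor = new Editor(path))
        {
            try
            {
                IDocumentInfo info = editor.GetDocumentInfo(password);
                // Reached when the password was correct, ignored, or not needed.
                return info == null ? "unsupported format" : "metadata available";
            }
            catch (PasswordRequiredException)
            {
                // Scenario 2: the document is protected, but no password was given.
                return "password required";
            }
            catch (IncorrectPasswordException)
            {
                // Scenario 3: the document is protected and the password is wrong.
                return "incorrect password";
            }
        }
    }
}
```

Calling Probe first with null and then, if "password required" comes back, with a password supplied by the user is one possible way to prompt for a password only when the document actually needs it.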

Explaining resulting type

The GetDocumentInfo() method returns an IDocumentInfo. This is an interface that stores meta info about one particular document and contains the following properties:

  1. PageCount. A positive number that holds the page count for WordProcessing, PDF and XPS documents, the tab (worksheet) count for Spreadsheets, the slide count for Presentations, and the number 1 for pageless documents like XML or TXT.
  2. Size. Document size in bytes.
  3. IsEncrypted. A boolean flag that indicates whether the document is encrypted with a password. If the document is of a type that doesn’t support encryption at all, like CSV or XML, this property always returns false.
  4. Format. Returns info about the format itself.
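
As an illustration of these four properties, the sketch below prints them for a single document. It assumes the document is not password-protected; the class name DocumentInfoPrinter and the method name Print are hypothetical, and the namespaces in the using directives are assumptions.

```csharp
using System;
using GroupDocs.Editor;          // assumed namespace of the Editor class
using GroupDocs.Editor.Metadata; // assumed namespace of IDocumentInfo

// Hypothetical helper, for illustration only.
public static class DocumentInfoPrinter
{
    public static void Print(string path)
    {
        using (Editor editor = new Editor(path))
        {
            // null is passed because the document is assumed to be unprotected.
            IDocumentInfo info = editor.GetDocumentInfo(null);
            if (info == null)
            {
                Console.WriteLine("No metadata is available for this document.");
                return;
            }

            Console.WriteLine("Pages, tabs, or slides: " + info.PageCount);
            Console.WriteLine("Size in bytes: " + info.Size);
            Console.WriteLine("Encrypted: " + info.IsEncrypted);
            Console.WriteLine("Format: " + info.Format.Name);
        }
    }
}
```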

There are eight implementations of the IDocumentInfo interface, all of them structs:

  1. WordProcessingDocumentInfo - common for all WordProcessing family formats.
  2. SpreadsheetDocumentInfo - common for all Spreadsheet family formats.
  3. PresentationDocumentInfo - common for all Presentation family formats.
  4. TextualDocumentInfo - common for all textual types, including all DSV formats (like CSV and TSV), XML, HTML, and plain text.
  5. FixedLayoutDocumentInfo - common for all documents with a fixed-layout format; this includes only PDF and XPS.
  6. EmailDocumentInfo - common for all Email family formats, like EML, MSG, VCF, PST, MBOX and others.
  7. EbookDocumentInfo - common for all eBook family formats, like MOBI and ePub.
  8. MarkdownDocumentInfo - a special struct dedicated to the Markdown (MD) textual format.

One important thing to note: if GetDocumentInfo() returns null instead of one of the IDocumentInfo implementations, the specified document is not supported by GroupDocs.Editor and thus cannot be opened for editing or saved.
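
Because each format family has its own struct, the concrete type of the returned value shows which family a document belongs to, and null shows that it is unsupported. The sketch below is one possible way to branch on this with the C# is operator; the class name FamilyDetector and the method name Describe are hypothetical, and the GroupDocs.Editor.Metadata namespace is an assumption.

```csharp
using GroupDocs.Editor.Metadata; // assumed namespace of IDocumentInfo and its implementations

// Hypothetical helper, for illustration only.
public static class FamilyDetector
{
    // Maps the value returned by GetDocumentInfo() to a family name.
    public static string Describe(IDocumentInfo info)
    {
        // null means the document is not supported by GroupDocs.Editor.
        if (info == null) return "unsupported";

        if (info is WordProcessingDocumentInfo) return "WordProcessing";
        if (info is SpreadsheetDocumentInfo) return "Spreadsheet";
        if (info is PresentationDocumentInfo) return "Presentation";
        if (info is TextualDocumentInfo) return "Textual";
        if (info is FixedLayoutDocumentInfo) return "Fixed-layout";
        if (info is EmailDocumentInfo) return "Email";
        if (info is EbookDocumentInfo) return "eBook";
        if (info is MarkdownDocumentInfo) return "Markdown";

        // Fallback for implementations not covered in the list above.
        return "unknown";
    }
}
```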

Explaining document format

The IDocumentInfo interface contains a Format property of the IDocumentFormat type. IDocumentFormat is an interface that is common to all format descriptors. It identifies one particular document format, stores the format name, extension, and MIME type, and has equality operators.

Each implementation of the IDocumentFormat interface exposes three properties, all of the System.String type:

  1. Name, which provides the name of the format.
  2. Extension, which provides the file extension of the format.
  3. Mime, which provides the MIME type of the format.
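
A short sketch of reading these three properties from an IDocumentInfo instance follows. The class name FormatDescriber and the method name Describe are hypothetical, and the GroupDocs.Editor.Formats and GroupDocs.Editor.Metadata namespaces are assumptions.

```csharp
using GroupDocs.Editor.Formats;  // assumed namespace of IDocumentFormat
using GroupDocs.Editor.Metadata; // assumed namespace of IDocumentInfo

// Hypothetical helper, for illustration only.
public static class FormatDescriber
{
    // Builds a one-line description from the Format property of the metadata.
    public static string Describe(IDocumentInfo info)
    {
        IDocumentFormat format = info.Format;
        return string.Format(
            "Name: {0}; extension: {1}; MIME: {2}",
            format.Name, format.Extension, format.Mime);
    }
}
```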

The IDocumentFormat interface has seven implementations, all of them structs:

  1. WordProcessingFormats - holds all formats from the WordProcessing family.
  2. SpreadsheetFormats - holds all formats from the Spreadsheet family.
  3. PresentationFormats - holds all formats from the Presentation family.
  4. TextualFormats - holds all formats of a text-based nature.
  5. FixedLayoutFormats - holds all formats from the fixed-layout format family. This includes only PDF and XPS.
  6. EBookFormats - holds all eBook (electronic book) formats, like Mobi and ePub.
  7. EmailFormats - holds all email (electronic mail) formats, like EML and MSG.