Extracting document metainfo

This demonstration shows and explains how to use the getDocumentInfo() method, which extracts meta info from a document.

Introduction

In some situations it is necessary to obtain meta info about a document before actually editing it. For example, a user wants to edit the last tab of a multi-tabbed spreadsheet but doesn't know how many tabs the document contains. Or the user may not know whether the document is password-protected. For such situations GroupDocs.Editor provides the getDocumentInfo() method, which returns detailed meta info (metadata) about the specified document.

Using the method

To obtain meta info from a document, first load the document into the Editor class, then call getDocumentInfo(). This method accepts one parameter: the password as a string. If the document is encrypted and the user knows the password, it can be specified here. Otherwise, null or an empty string can be passed. The code example below demonstrates the usage:

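// Load the document into the Editor class
Editor editor = new Editor("C://input//document.docx");
// Request meta info without a password
IDocumentInfo infoDocxWithoutPassword = editor.getDocumentInfo(null);
// Request meta info with a password
IDocumentInfo infoDocxWithPassword = editor.getDocumentInfo("password");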

Several scenarios are possible, depending on whether the document is encrypted and whether the user specified a password:

  1. If a password is specified, but the document is not password-protected, or the document format doesn't support encryption at all, the password is ignored.
  2. If the document is password-protected, but no password is specified, a PasswordRequiredException is thrown when calling getDocumentInfo().
  3. If the document is password-protected and a password is specified, but it is incorrect, an IncorrectPasswordException is thrown when calling getDocumentInfo().
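
The three scenarios above can be told apart with ordinary exception handling. The following is a minimal sketch, not the library's own code. So that it compiles without the GroupDocs.Editor JAR, the two exception classes are declared as empty stand-ins (assumed here to be unchecked), and InfoSource, PasswordCheck, and Outcome are hypothetical names introduced only for this illustration.

```java
// Stand-ins so the sketch compiles on its own; in a real project these
// two exception types come from GroupDocs.Editor (assumed unchecked here).
class PasswordRequiredException extends RuntimeException {}
class IncorrectPasswordException extends RuntimeException {}

// Hypothetical wrapper around a call such as editor.getDocumentInfo(password).
interface InfoSource {
    Object getDocumentInfo(String password);
}

class PasswordCheck {
    enum Outcome { INFO_RETURNED, PASSWORD_REQUIRED, INCORRECT_PASSWORD }

    // Maps the scenarios from the list above onto a simple result value.
    static Outcome probe(InfoSource source, String password) {
        try {
            source.getDocumentInfo(password);
            // Scenario 1 also ends here: an unneeded password is ignored.
            return Outcome.INFO_RETURNED;
        } catch (PasswordRequiredException e) {
            return Outcome.PASSWORD_REQUIRED;   // scenario 2
        } catch (IncorrectPasswordException e) {
            return Outcome.INCORRECT_PASSWORD;  // scenario 3
        }
    }
}
```

In an application, the source argument could be a lambda such as password -> editor.getDocumentInfo(password), and a PASSWORD_REQUIRED or INCORRECT_PASSWORD outcome could trigger a password prompt.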

Explaining resulting type

The getDocumentInfo() method returns an IDocumentInfo. This is an interface that stores meta info about one particular document and contains the following properties:

  1. PageCount. A positive number: the page count for WordProcessing documents, the tab (worksheet) count for Spreadsheets, and 1 for pageless documents like XML or TXT.
  2. Size. The document size in bytes.
  3. IsEncrypted. A boolean flag that indicates whether the document is encrypted with a password. If the document is of a type that doesn't support encryption at all, like CSV or XML, this property returns false.
  4. Format. Information about the document format itself, as an IDocumentFormat.
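
As a rough illustration of how these properties might be consumed, the sketch below computes the index of the last page or worksheet (the spreadsheet case from the introduction) and builds a one-line summary. DocumentInfoLike is a hypothetical stand-in, not the library interface, and its getter names are assumed Java-style counterparts of the PageCount, Size, and IsEncrypted properties. Zero-based indexing is also an assumption.

```java
// Hypothetical stand-in for IDocumentInfo; the getter names are assumed
// Java-style counterparts of the PageCount, Size and IsEncrypted properties.
interface DocumentInfoLike {
    int getPageCount();
    long getSize();
    boolean isEncrypted();
}

class DocumentInfoSummary {
    // Index of the last page or worksheet, assuming zero-based indexing.
    static int lastPageIndex(DocumentInfoLike info) {
        return info.getPageCount() - 1;
    }

    // One-line, human-readable summary of the three values.
    static String summarize(DocumentInfoLike info) {
        return "pages=" + info.getPageCount()
                + ", size=" + info.getSize() + " bytes"
                + ", encrypted=" + info.isEncrypted();
    }
}
```

Because PageCount is described as positive, lastPageIndex never returns a negative value for a supported document.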

There are four inheritors of the IDocumentInfo interface:

  1. WordProcessingDocumentInfo — common for all WordProcessing family formats.
  2. SpreadsheetDocumentInfo — common for all Spreadsheet family formats.
  3. PresentationDocumentInfo — common for all Presentation family formats.
  4. TextualDocumentInfo — common for all textual types, including all DSV (like CSV and TSV), XML, HTML, and plain text.

One important thing to note: if getDocumentInfo() returns null instead of one of the IDocumentInfo inheritors, the specified document is not supported by GroupDocs.Editor and thus cannot be opened for editing or saved.
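
One possible way to act on the returned value is to check for null first and then branch on the concrete type. The sketch below assumes nothing beyond the type names listed above. The interface and the four classes are empty stand-ins declared so that the example compiles without the GroupDocs.Editor JAR, and DocumentFamily is a hypothetical helper.

```java
// Empty stand-ins named after the types listed above, so the sketch
// compiles without the GroupDocs.Editor JAR.
interface IDocumentInfo {}
class WordProcessingDocumentInfo implements IDocumentInfo {}
class SpreadsheetDocumentInfo implements IDocumentInfo {}
class PresentationDocumentInfo implements IDocumentInfo {}
class TextualDocumentInfo implements IDocumentInfo {}

class DocumentFamily {
    // Returns the format family name, or "Unsupported" for a null result.
    static String of(IDocumentInfo info) {
        if (info == null) {
            return "Unsupported"; // document cannot be opened for editing
        }
        if (info instanceof WordProcessingDocumentInfo) return "WordProcessing";
        if (info instanceof SpreadsheetDocumentInfo) return "Spreadsheet";
        if (info instanceof PresentationDocumentInfo) return "Presentation";
        if (info instanceof TextualDocumentInfo) return "Textual";
        return "Unknown"; // an implementation not covered by this sketch
    }
}
```

Checking for null before any other access avoids a NullPointerException when an unsupported document is passed in.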

Explaining document format

The IDocumentInfo interface contains a Format property of IDocumentFormat type. IDocumentFormat is an interface that is common to all format descriptors. It indicates one particular document format, stores the format name and extension, and has equality operators. It has four inheritors, all of them structs:

  1. WordProcessingFormats — holds all formats from WordProcessing family.
  2. SpreadsheetFormats — holds all formats from Spreadsheet family.
  3. PresentationFormats — holds all formats from Presentation family.
  4. TextualFormats — holds all formats with text-based nature.

Each inheritor of the IDocumentFormat interface provides two methods: getName(), which returns the name of the format, and getExtension(), which returns the format's file extension.
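
The two methods can be combined into a display label, for example for a log line or a UI caption. The sketch below is an illustration under stated assumptions. FormatLike is a hypothetical stand-in that exposes only the two methods described above, and because this article does not say whether getExtension() includes a leading dot, the helper accepts both forms.

```java
// Hypothetical stand-in exposing the two methods described above.
interface FormatLike {
    String getName();
    String getExtension();
}

class FormatLabel {
    // Builds a label such as "<name> (.<extension>)".
    static String of(FormatLike format) {
        String ext = format.getExtension();
        // Whether the extension carries a leading dot is not stated in
        // this article, so strip one if present.
        if (ext.startsWith(".")) {
            ext = ext.substring(1);
        }
        return format.getName() + " (." + ext + ")";
    }
}
```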