This guide demonstrates how to use the getDocumentInfo() method to extract metadata from a document using GroupDocs.Editor for Node.js via Java.
Introduction
In some situations, it is necessary to retrieve metadata from a document before actually editing it. For example, a user might want to edit the last worksheet of a multi-tabbed spreadsheet but doesn’t know how many worksheets the document contains. Or it might be unclear whether the document is password-protected. For such situations, GroupDocs.Editor provides a getDocumentInfo() method that returns detailed metadata about the specified document.
Using the Method
To extract metadata from a document, first load it into the Editor class and then call the getDocumentInfo() method. This method accepts a single string parameter: the password. If the document is encrypted and you know the password, pass it here; otherwise, pass null or an empty string. The code example below demonstrates both calls:
// Import the necessary modules
const fs = require('fs');
const groupdocsEditor = require('@groupdocs/groupdocs.editor');

// Load the document into the Editor class
const inputFilePath = 'C://input//document.docx';
const inputStream = fs.createReadStream(inputFilePath);
const editor = new groupdocsEditor.Editor(inputStream);

// Get document info without specifying a password
const infoDocxWithoutPassword = editor.getDocumentInfo(null);

// Get document info with a password
const infoDocxWithPassword = editor.getDocumentInfo('password');

// Close the editor instance
editor.dispose();
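If an exception is thrown before dispose() is reached, the editor instance is never closed. The sketch below shows one way to guarantee cleanup with try/finally. `withEditor` is a hypothetical helper, not part of GroupDocs.Editor; it only assumes the editor object exposes the dispose() method used above.

```javascript
// Hypothetical helper (not part of GroupDocs.Editor): creates an editor,
// runs `work` against it, and always calls dispose(), even if `work` throws.
function withEditor(createEditor, work) {
  const editor = createEditor();
  try {
    return work(editor);
  } finally {
    editor.dispose();
  }
}
```

A possible usage, reading the metadata while the editor is still open: `withEditor(() => new groupdocsEditor.Editor(fs.createReadStream(inputFilePath)), (editor) => editor.getDocumentInfo(null).getPageCount())`. Reading values inside the callback avoids depending on whether the returned info object remains usable after dispose().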
Scenarios Regarding Password Protection
There can be several scenarios depending on whether the document is encrypted and whether a password is specified:
Password Specified, Document Not Password-Protected:
If a password is specified but the document is not password-protected, or the document format doesn’t support encryption, the password will be ignored.
Document Password-Protected, Password Not Specified:
If the document is password-protected but no password is specified, a PasswordRequiredException will be thrown when calling getDocumentInfo().
Incorrect Password Specified:
If the document is password-protected and an incorrect password is specified, an IncorrectPasswordException will be thrown when calling getDocumentInfo().
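Taken together, these scenarios suggest a simple strategy when several passwords might apply: try without a password first, then each candidate, skipping only password-related failures. The sketch below is a hypothetical helper, `getInfoWithCandidates`. The exception classes are passed in as `errorTypes`; using groupdocsEditor.PasswordRequiredException and groupdocsEditor.IncorrectPasswordException there is an assumption, and if the Java bridge surfaces these exceptions differently, the `instanceof` check must be adapted.

```javascript
// Hypothetical helper: tries getDocumentInfo() with no password, then with each
// candidate in order. Errors that are instances of one of `errorTypes` mean
// "try the next password"; any other error is rethrown unchanged.
function getInfoWithCandidates(editor, candidates, errorTypes) {
  const attempts = [null, ...candidates];
  for (const password of attempts) {
    try {
      return { info: editor.getDocumentInfo(password), password };
    } catch (error) {
      const isPasswordError = errorTypes.some((Type) => error instanceof Type);
      if (!isPasswordError) {
        throw error; // unrelated failure: do not mask it
      }
    }
  }
  return null; // every attempt failed with a password-related error
}
```

Because a password is ignored for documents that are not protected, an unprotected document succeeds on the first attempt and the returned `password` is null.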
Understanding the Resulting Type
The getDocumentInfo() method returns an instance of IDocumentInfo. This interface stores metadata about the document and contains the following properties:
getPageCount():
Returns a positive number: the page count for WordProcessing documents, the number of worksheets for Spreadsheets, and 1 for pageless documents such as XML or TXT.
getSize():
Returns the document size in bytes.
isEncrypted():
A boolean flag indicating whether the document is encrypted with a password. If the document format doesn’t support encryption, this property returns false.
getFormat():
Returns information about the document format.
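The getters above can be collected into a plain object for logging or returning from an API endpoint. The sketch below, `summarizeDocumentInfo`, is a hypothetical helper. It also computes the index of the last page or worksheet, which covers the "edit the last worksheet" case from the introduction under the assumption that indices are zero-based; check the edit options you pass to the editor before relying on that. It also assumes the getters return plain JavaScript numbers, booleans, and strings.

```javascript
// Hypothetical helper: converts an IDocumentInfo-like object into a plain,
// JSON-friendly summary. Returns null when getDocumentInfo() returned null.
function summarizeDocumentInfo(info) {
  if (info === null || info === undefined) {
    return null;
  }
  const format = info.getFormat();
  const pageCount = info.getPageCount();
  return {
    pageCount,
    // Index of the last page/worksheet, ASSUMING zero-based indexing.
    lastIndex: pageCount - 1,
    sizeBytes: info.getSize(),
    encrypted: info.isEncrypted(),
    format: {
      name: format.getName(),
      extension: format.getExtension(),
      mime: format.getMime(),
    },
  };
}
```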
Inheritors of IDocumentInfo
There are several inheritors of the IDocumentInfo interface, each specific to a document family:
WordProcessingDocumentInfo:
For all WordProcessing formats.
SpreadsheetDocumentInfo:
For all Spreadsheet formats.
PresentationDocumentInfo:
For all Presentation formats.
TextualDocumentInfo:
For textual types, including DSV (like CSV and TSV), XML, HTML, and plain text.
FixedLayoutDocumentInfo:
For documents with a fixed-layout format, such as PDF and XPS.
EmailDocumentInfo:
For all Email formats, like EML, MSG, VCF, PST, MBOX, and others.
EbookDocumentInfo:
For all eBook formats like MOBI and EPUB.
MarkdownDocumentInfo:
Specific to the Markdown (MD) textual format.
Note: If getDocumentInfo() returns null instead of an IDocumentInfo inheritor, it means the specified document is not supported by GroupDocs.Editor and cannot be opened for editing or saving.
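To branch on the document family, and to handle the null case from the note above, one option is an `instanceof` check against each inheritor. The sketch below, `detectFamily`, is hypothetical. It takes the classes as a parameter because how (and whether) the Node.js package exports WordProcessingDocumentInfo, SpreadsheetDocumentInfo, and the others is an assumption here, as is whether `instanceof` works across the Java bridge for these objects.

```javascript
// Hypothetical helper: returns the name of the first class in `familyClasses`
// that `info` is an instance of, 'unsupported' when getDocumentInfo()
// returned null, or 'unknown' when no listed class matches.
function detectFamily(info, familyClasses) {
  if (info === null || info === undefined) {
    return 'unsupported';
  }
  for (const [name, Type] of Object.entries(familyClasses)) {
    if (typeof Type === 'function' && info instanceof Type) {
      return name;
    }
  }
  return 'unknown';
}
```

With the real package, the map might look like `{ WordProcessingDocumentInfo: groupdocsEditor.WordProcessingDocumentInfo, SpreadsheetDocumentInfo: groupdocsEditor.SpreadsheetDocumentInfo }`, assuming the classes are exported under those names.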
Understanding Document Format
The IDocumentInfo interface contains a getFormat() property of type IDocumentFormat. This interface represents a particular document format and stores the format name, extension, and MIME type.
Each inheritor of IDocumentFormat provides three properties, all of type String:
getName():
Provides the name of the format.
getExtension():
Provides the format extension.
getMime():
Provides the MIME type for the particular format.
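The extension and MIME type are handy for validating input files and for serving documents over HTTP. The sketch below defines two hypothetical helpers: `extensionMatches` detects files whose name does not match the detected format, and `downloadHeaders` builds response headers. Since it is not confirmed here whether getExtension() includes a leading dot, both helpers strip one if present.

```javascript
const path = require('path');

// Removes an optional leading dot and lower-cases the extension.
function normalizeExtension(ext) {
  return String(ext).replace(/^\./, '').toLowerCase();
}

// Hypothetical helper: true when the file's own extension matches the one
// reported by IDocumentFormat.getExtension() (case-insensitive).
function extensionMatches(filePath, format) {
  return normalizeExtension(path.extname(filePath)) === normalizeExtension(format.getExtension());
}

// Hypothetical helper: HTTP headers for sending the document as a download,
// using getMime() for Content-Type and getExtension() for the file name.
function downloadHeaders(baseName, format) {
  const extension = String(format.getExtension()).replace(/^\./, '');
  return {
    'Content-Type': format.getMime(),
    'Content-Disposition': `attachment; filename="${baseName}.${extension}"`,
  };
}
```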
The IDocumentFormat interface has several inheritors, each covering one format family:
WordProcessingFormats:
Holds all formats from the WordProcessing family.
SpreadsheetFormats:
Holds all formats from the Spreadsheet family.
PresentationFormats:
Holds all formats from the Presentation family.
TextualFormats:
Holds all formats with a text-based nature.
FixedLayoutFormats:
Holds all formats from the fixed-layout format family (e.g., PDF and XPS).
EBookFormats:
Holds all eBook formats like MOBI and EPUB.
EmailFormats:
Holds all email formats like EML and MSG.
Complete Example
Here is a complete example demonstrating how to extract metadata from a WordProcessing document:
// Import the necessary modules
const fs = require('fs');
const groupdocsEditor = require('@groupdocs/groupdocs.editor');

let editor = null;
try {
  // Load the document into the Editor class
  const inputFilePath = 'C://input//document.docx';
  const inputStream = fs.createReadStream(inputFilePath);
  editor = new groupdocsEditor.Editor(inputStream);

  // Get document info without specifying a password
  const infoDocxWithoutPassword = editor.getDocumentInfo(null);
  if (infoDocxWithoutPassword === null) {
    // A null result means the document is not supported by GroupDocs.Editor
    console.error('The document is not supported by GroupDocs.Editor.');
  } else {
    console.log('Page Count:', infoDocxWithoutPassword.getPageCount());
    console.log('Size (bytes):', infoDocxWithoutPassword.getSize());
    console.log('Is Encrypted:', infoDocxWithoutPassword.isEncrypted());
    const format = infoDocxWithoutPassword.getFormat();
    console.log('Format Name:', format.getName());
    console.log('Format Extension:', format.getExtension());
    console.log('Format MIME Type:', format.getMime());
  }
} catch (error) {
  if (error instanceof groupdocsEditor.PasswordRequiredException) {
    console.error('The document is password-protected. Please provide a password.');
  } else if (error instanceof groupdocsEditor.IncorrectPasswordException) {
    console.error('The provided password is incorrect.');
  } else {
    console.error('An error occurred:', error);
  }
} finally {
  // Close the editor instance, even if an exception was thrown
  if (editor !== null) {
    editor.dispose();
  }
}
Conclusion
This guide has demonstrated how to use GroupDocs.Editor for Node.js via Java to extract metadata from documents before editing them. The getDocumentInfo() method reports the page count, size, encryption status, and format details of a document, and returns null for unsupported documents. This is useful for applications that need to validate or pre-process documents before further manipulation.
Note: Replace 'C://input//document.docx' with the actual path to your document.