When rendering to HTML, GroupDocs.Viewer renders each page of the source document as a separate HTML document.
GroupDocs.Viewer for .NET provides two options to manage CSS, fonts, images, and other resources:
HTML with external resources stores page resources as separate files. This allows common resources to be reused, which reduces page size and loading time.
HTML with embedded resources integrates page resources into the HTML. This makes each document page self-sufficient, but increases page size and loading time.
To render files to HTML, follow these steps:
Create an instance of the Viewer class. Specify the source document path as a constructor parameter.
Instantiate the HtmlViewOptions object. Specify a path to save the rendered pages.
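The steps above can be sketched as follows. This is a minimal example, not a definitive implementation: the input file name "sample.docx" and the output file name pattern are placeholders, and the embedded-resources variant is chosen only for brevity. The final View call, which performs the rendering, is included so that the sketch is complete.

```csharp
using GroupDocs.Viewer;
using GroupDocs.Viewer.Options;

// "sample.docx" is a placeholder path to the source document.
using (Viewer viewer = new Viewer("sample.docx"))
{
    // The output path pattern is an assumption for this sketch;
    // {0} is replaced with the page number, giving one HTML file per page.
    HtmlViewOptions viewOptions = HtmlViewOptions.ForEmbeddedResources("page_{0}.html");

    // Render the document pages to HTML using the options above.
    viewer.View(viewOptions);
}
```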
Preserving or disabling JavaScript when rendering to HTML
Many of the document formats and format families supported by GroupDocs.Viewer may contain scripts and/or macros in their content. This includes, but is not limited to, PDF, most formats of the WordProcessing family (DOCX, DOCM, RTF, ODT, …), Spreadsheet (XLSX, XLSM, …), Presentation (PPTX, PPTM, …), Email (MSG, EML, EMLX, MBOX, …), and so on. Before version 25.2, when rendering documents with scripts to HTML, GroupDocs.Viewer tried to preserve all scripts and put them into the resulting HTML document without any change or validation. In some cases this behavior is unwanted, because a document can contain malicious or harmful scripts, most commonly XSS injections, so the resulting HTML document must be cleaned of any scripts.
Starting from version 25.2, the default behavior changed: GroupDocs.Viewer now removes all scripts from the resulting HTML document by default. In cases where JavaScript code is located in links, GroupDocs.Viewer replaces it with the "javascript:void(0)" string, so that no page reload occurs when the resulting HTML document is opened in the browser. Note that the original document loaded into the Viewer instance is left untouched in either case.
Along with the changed default behavior, a new option was added to the HtmlViewOptions class: the public RemoveJavaScript property of the System.Boolean type. By default this property is true, so JavaScript is removed from the resulting HTML document. To preserve JavaScript, as in versions of GroupDocs.Viewer before 25.2, set this property to false.
The code sample below opens a sample XLSX document and renders it twice:
to HTML with embedded resources, with JavaScript removed;
to HTML with external resources, with JavaScript preserved.
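A minimal sketch of these two renderings is given here. The input file name "sample.xlsx" and all output path patterns are placeholders chosen for illustration; RemoveJavaScript is the property described above and requires version 25.2 or later.

```csharp
using GroupDocs.Viewer;
using GroupDocs.Viewer.Options;

// "sample.xlsx" is a placeholder for a spreadsheet that contains scripts.
using (Viewer viewer = new Viewer("sample.xlsx"))
{
    // 1. Embedded resources, JavaScript removed.
    //    true is already the default since 25.2; it is set explicitly for clarity.
    HtmlViewOptions embeddedOptions =
        HtmlViewOptions.ForEmbeddedResources("embedded_page_{0}.html");
    embeddedOptions.RemoveJavaScript = true;
    viewer.View(embeddedOptions);

    // 2. External resources, JavaScript preserved (the pre-25.2 behavior).
    //    {0} is the page number and {1} is the resource file name;
    //    the path and URL patterns are assumptions for this sketch.
    HtmlViewOptions externalOptions = HtmlViewOptions.ForExternalResources(
        "external_page_{0}.html",
        "external_page_{0}/resource_{0}_{1}",
        "external_page_{0}/resource_{0}_{1}");
    externalOptions.RemoveJavaScript = false;
    viewer.View(externalOptions);
}
```

Preserving JavaScript is reasonable only when the source documents are trusted, since the preserved scripts run in the browser of whoever opens the resulting HTML.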