This demonstration shows how to open an input document, convert it to an intermediate EditableDocument, and obtain HTML markup in different forms depending on client requirements.
Preparations
Once an input document is loaded into the Editor class and opened for editing by converting it to the intermediate EditableDocument class, HTML markup can be generated and retrieved in several forms. The code in this article shows all variations of this procedure.
First, load the document into the Editor class and open it for editing, as demonstrated in the code below.
    using GroupDocs.Editor;
    using GroupDocs.Editor.Options;

    string inputFilePath = "C:\\input_path\\document.docx"; // path to some document
    WordProcessingLoadOptions loadOptions = new WordProcessingLoadOptions();
    Editor editor = new Editor(inputFilePath, loadOptions); // passing path and load options to the constructor
    EditableDocument document = editor.Edit(new WordProcessingEditOptions()); // opening document for editing with format-specific edit options
The code above prepares a ready-to-use instance of the EditableDocument class, which holds the original document in its own intermediate format and can generate HTML markup in different forms.
Getting whole HTML content
The default method for generating HTML markup is the parameterless GetContent method:
    string htmlContent = document.GetContent();
If the document has external resources (stylesheets, fonts, images), they are referenced through different HTML elements: stylesheets through LINK elements and images through IMG elements. With the parameterless GetContent() method, these resources are referenced by external links.
Quite often, on the web server where this HTML will be edited, resources are served by a specific HTTP handler. In such cases the resource paths must be adjusted to point to those endpoints. A more advanced overload of the GetContent() method, which accepts prefixes for external resources, can help.
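A minimal sketch of the prefix overload follows. It assumes that the overload takes the images prefix first and the stylesheet prefix second. The input path, both URLs, and the resource name in the comment are hypothetical placeholders, not values from the library.

```csharp
using GroupDocs.Editor;
using GroupDocs.Editor.Options;

// Hypothetical input path; replace with a real document.
string inputFilePath = "C:\\input_path\\document.docx";

using (Editor editor = new Editor(inputFilePath, new WordProcessingLoadOptions()))
using (EditableDocument document = editor.Edit(new WordProcessingEditOptions()))
{
    // Hypothetical endpoints of the HTTP handlers that serve resources.
    string externalImagesPrefix = "http://www.example.com/images/id=";
    string externalCssPrefix = "http://www.example.com/css/id=";

    // Assumed parameter order: images prefix, then stylesheet prefix.
    string prefixedHtmlContent = document.GetContent(externalImagesPrefix, externalCssPrefix);

    // Under the behavior described in the text, a link such as src="image.png"
    // is expected to become src="http://www.example.com/images/id=image.png".
}
```

Both Editor and EditableDocument hold resources, so the sketch wraps them in using statements to dispose of them once the markup has been obtained.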
With this overload, the specified prefixes are added to every external link in the document’s markup.
Starting from GroupDocs.Editor for .NET version 23.9, it is possible to specify not only a prefix but also a template format string, in which one or more placeholders mark where resource names are written. If the specified string contains valid placeholders, GroupDocs.Editor replaces each placeholder with the resource name. Otherwise, if no placeholders are found, GroupDocs.Editor treats the string as a prefix. Template strings can be specified for both external images and stylesheets.
With template strings, the resource names are substituted for the placeholders. For instance, if the original document contains a stylesheet and two images named “foo.jpeg” and “bar.png”, each link in the output HTML markup is the corresponding template with its placeholder replaced by the resource name.
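The template-or-prefix rule described above can be illustrated with a small library-free sketch. The helper `ResolveResourceUrl` is hypothetical and is not part of GroupDocs.Editor. It also assumes that the placeholder uses the .NET composite-format form `{0}`, which the text above does not spell out, and the URLs are made up for illustration.

```csharp
using System;

// Hypothetical helper mirroring the described rule; not the library's implementation.
static string ResolveResourceUrl(string templateOrPrefix, string resourceName)
{
    // Assumption: the placeholder is written as "{0}".
    const string placeholder = "{0}";

    // A string with a placeholder is treated as a template;
    // a string without one is treated as a plain prefix.
    return templateOrPrefix.Contains(placeholder)
        ? templateOrPrefix.Replace(placeholder, resourceName)
        : templateOrPrefix + resourceName;
}

// Template case: the resource name lands where the placeholder was.
Console.WriteLine(ResolveResourceUrl("http://www.example.com/images/{0}/view", "foo.jpeg"));
// prints http://www.example.com/images/foo.jpeg/view

// Prefix case: no placeholder, so the resource name is appended.
Console.WriteLine(ResolveResourceUrl("http://www.example.com/images/id=", "bar.png"));
// prints http://www.example.com/images/id=bar.png
```

Under the same assumption, a template string would be passed to GetContent() in the same position where a prefix would otherwise go.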
Many HTML WYSIWYG editors cannot process a whole HTML document, with its HEAD section and so on; they can process only the inner content of the HTML->BODY element. To obtain this part of the HTML markup, the EditableDocument class provides the GetBodyContent() method, which, like the previous one, has two overloads.
The first, parameterless overload, like the previous one, leaves links to external images intact. The second, which accepts an external resource prefix, adds this prefix to every URL in the ‘src’ attribute of every IMG tag found inside the HTML->BODY markup.
As in the previous sample, starting from version 23.9 this can be not only a prefix but also a template string.
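The two GetBodyContent() overloads described above can be sketched as follows. The input path and the images URL are hypothetical. The single string parameter of the second overload is taken from the description in the text.

```csharp
using GroupDocs.Editor;
using GroupDocs.Editor.Options;

// Hypothetical input path; replace with a real document.
string inputFilePath = "C:\\input_path\\document.docx";

using (Editor editor = new Editor(inputFilePath, new WordProcessingLoadOptions()))
using (EditableDocument document = editor.Edit(new WordProcessingEditOptions()))
{
    // Parameterless overload: inner BODY markup, image links left intact.
    string bodyContent = document.GetBodyContent();

    // Overload with a prefix (hypothetical endpoint) applied to IMG 'src' URLs.
    string externalImagesPrefix = "http://www.example.com/images/id=";
    string prefixedBodyContent = document.GetBodyContent(externalImagesPrefix);
}
```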
Getting base64-encoded content
Sometimes it is necessary to obtain the entire content of a document, together with all resources it uses, as a single string. GroupDocs.Editor supports this.
In such a string, all stylesheets are placed into STYLE elements in the HTML->HEAD section, and all images in IMG elements are serialized with base64 encoding and placed directly in the ‘src’ attributes. All fonts and images used in stylesheets are also serialized and stored in the appropriate locations in the corresponding stylesheet. The resulting string is fully self-contained.
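A minimal sketch of obtaining such a self-contained string follows. The method name `GetEmbeddedHtml` does not appear in the text above; it is an assumption about the EditableDocument API and should be verified against the API reference for the version in use. The input path is hypothetical.

```csharp
using GroupDocs.Editor;
using GroupDocs.Editor.Options;

// Hypothetical input path; replace with a real document.
string inputFilePath = "C:\\input_path\\document.docx";

using (Editor editor = new Editor(inputFilePath, new WordProcessingLoadOptions()))
using (EditableDocument document = editor.Edit(new WordProcessingEditOptions()))
{
    // Assumed method name: returns the whole document as one HTML string
    // with stylesheets inlined and images base64-encoded.
    string embeddedHtmlContent = document.GetEmbeddedHtml();
}
```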
Conclusion
This guide has explained different ways of obtaining HTML markup from a document in different forms.