Working with HTML resources

This article describes the capabilities of GroupDocs.Editor for working with HTML resources, which are an integral part of any HTML document.

Almost all document formats (except plain TXT and a few others) have a concept of resources. Depending on the format, these usually include images, custom fonts (fonts that are not installed in the operating system), and so on. To make a document editable in a browser, GroupDocs.Editor must first convert it to HTML and only then pass it to the client-side WYSIWYG editor. As a result, GroupDocs.Editor has to work with HTML resources, which fall into three groups: images, stylesheets, and fonts.

In the public API of GroupDocs.Editor, everything related to resources is located in the GroupDocs.Editor.HtmlCss.Resources namespace. It contains the public IHtmlResource interface, which is common to all HTML resources, and three sub-namespaces, one for each resource type:

  1. GroupDocs.Editor.HtmlCss.Resources.Images: contains all supported image formats, both raster (such as JPEG, PNG, and GIF) and vector (SVG and WMF/EMF). Besides several auxiliary types and interfaces, the Images namespace has two sub-namespaces, GroupDocs.Editor.HtmlCss.Resources.Images.Raster and GroupDocs.Editor.HtmlCss.Resources.Images.Vector, each containing the classes for specific image formats.
  2. GroupDocs.Editor.HtmlCss.Resources.Fonts: contains a class for each supported font format (WOFF, WOFF2, TTF, OTF, and EOT).
  3. GroupDocs.Editor.HtmlCss.Resources.Textual: contains classes for all supported textual formats (CSS and XML).
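
To make the layout above easier to scan, the following Python sketch encodes it as a lookup table from format name to namespace. It is only an illustration: the real types are .NET classes in GroupDocs.Editor, the helpers namespace_for and resource_group are hypothetical, and the raster entry lists only the formats mentioned above, so it is not exhaustive.

```python
from typing import Optional

# Namespace layout as described in this article (hypothetical Python model).
# The raster tuple only holds the examples named in the text, not every format.
_BASE = "GroupDocs.Editor.HtmlCss.Resources"
_NAMESPACES = {
    _BASE + ".Images.Raster": ("JPEG", "PNG", "GIF"),
    _BASE + ".Images.Vector": ("SVG", "WMF", "EMF"),
    _BASE + ".Fonts": ("WOFF", "WOFF2", "TTF", "OTF", "EOT"),
    _BASE + ".Textual": ("CSS", "XML"),
}


def namespace_for(fmt: str) -> Optional[str]:
    """Return the namespace that holds the class for `fmt`, or None if unknown."""
    fmt = fmt.upper()
    for namespace, formats in _NAMESPACES.items():
        if fmt in formats:
            return namespace
    return None


def resource_group(fmt: str) -> Optional[str]:
    """Return the top-level group ("Images", "Fonts" or "Textual") for `fmt`."""
    namespace = namespace_for(fmt)
    if namespace is None:
        return None
    # Strip "GroupDocs.Editor.HtmlCss.Resources." and keep the first segment.
    return namespace[len(_BASE) + 1:].split(".")[0]
```

For example, `namespace_for("png")` yields the Images.Raster namespace, while `resource_group("svg")` yields "Images" because vector and raster classes share the Images parent namespace.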

Detecting the exact HTML resource format automatically

Let's imagine that a customer has edited a document in a WYSIWYG editor and obtained the edited HTML markup together with its HTML resources (images, one or more stylesheets, and fonts). Now the customer wants to create an instance of the EditableDocument class using the FromMarkup method, which takes a string with HTML markup and an IEnumerable<IHtmlResource>, which in practice is a collection of HTML resources. But how can the customer obtain a set of IHtmlResource instances, given that the resources are available only as a set of files (in the best case) or even as a set of byte streams? For this case GroupDocs.Editor provides a special utility class, GroupDocs.Editor.HtmlCss.Resources.ResourceTypeDetector. This static class was introduced in version 21.3 and contains only two static methods:

  1. DetectTypeFromFilename: takes a filename with an extension (or a bare extension) and tries to associate it with the most appropriate resource format. On success it returns a type that describes this format, as an instance implementing IResourceType; if the format cannot be recognized, it returns NULL. So if the user knows the filename of a resource, this method determines its format. But what if there are no filenames with extensions, or no files at all, and the resources are represented by byte streams? Or what if the filenames are meaningless, have no extension, or have extensions that are known in advance to be incorrect? For such cases the second method is suitable.

  2. TryDetectResource: takes a byte stream (System.IO.Stream), a filename, and optionally an assumed format; it parses and recognizes the input data and returns an implementation of the IHtmlResource interface on success, or NULL on failure. This method always analyzes the content of the input byte stream, so it returns the correct format even if both the filename extension and the assumed format are wrong; such a mismatch only degrades performance. Conversely, a correct filename extension and/or assumed format improves performance. In any case, the input stream must be fully valid: it must not be NULL or a Stream.Null instance, it must be readable and seekable, it must not be disposed, and its position must be valid.
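
The behavior described above (a lookup by extension, and a content check that uses the extension or assumed format only as a shortcut) can be illustrated with the following Python sketch. It is not the GroupDocs.Editor implementation: detect_type_from_filename, try_detect_resource, the extension table, and the signature checks are hypothetical stand-ins. Unlike the real method, the sketch returns a format name instead of an IHtmlResource object, covers only some of the formats listed earlier, and recognizes CSS only when hinted, because CSS has no binary signature.

```python
import io
import os
from typing import BinaryIO, Callable, Dict, Optional

# Hypothetical extension table; the real GroupDocs.Editor types are .NET classes.
_EXTENSIONS: Dict[str, str] = {
    ".png": "PNG", ".jpg": "JPEG", ".jpeg": "JPEG", ".gif": "GIF",
    ".svg": "SVG", ".woff": "WOFF", ".woff2": "WOFF2",
    ".ttf": "TTF", ".otf": "OTF", ".css": "CSS", ".xml": "XML",
}


def detect_type_from_filename(name: str) -> Optional[str]:
    """Map 'photo.png', '.png' or 'png' to a format name, or None if unknown."""
    base = name.strip().lower()
    ext = os.path.splitext(base)[1]
    if not ext:
        # No extension found: treat the whole input as a bare extension.
        ext = "." + base.lstrip(".")
    return _EXTENSIONS.get(ext)


def _is_svg(head: bytes) -> bool:
    text = head.lstrip().lower()
    return text.startswith(b"<svg") or (text.startswith(b"<?xml") and b"<svg" in text)


def _is_xml(head: bytes) -> bool:
    return head.lstrip().lower().startswith(b"<?xml") and not _is_svg(head)


# Signature ("magic bytes") checks, tried in order; SVG is tested before XML.
_MATCHERS: Dict[str, Callable[[bytes], bool]] = {
    "PNG": lambda h: h.startswith(b"\x89PNG\r\n\x1a\n"),
    "JPEG": lambda h: h.startswith(b"\xff\xd8\xff"),
    "GIF": lambda h: h[:6] in (b"GIF87a", b"GIF89a"),
    "WOFF": lambda h: h.startswith(b"wOFF"),
    "WOFF2": lambda h: h.startswith(b"wOF2"),
    "OTF": lambda h: h.startswith(b"OTTO"),
    "TTF": lambda h: h[:4] in (b"\x00\x01\x00\x00", b"true"),
    "SVG": _is_svg,
    "XML": _is_xml,
}


def try_detect_resource(stream: BinaryIO, filename: str,
                        assumed: Optional[str] = None) -> Optional[str]:
    """Return the detected format name, or None if the content is not recognized."""
    if stream is None or stream.closed or not stream.readable() or not stream.seekable():
        raise ValueError("stream must be open, readable and seekable")
    start = stream.tell()
    head = stream.read(512)
    stream.seek(start)  # leave the caller's stream position unchanged

    hint = assumed or detect_type_from_filename(filename or "")
    # Fast path: a correct hint is confirmed with a single signature check.
    if hint in _MATCHERS and _MATCHERS[hint](head):
        return hint
    # Slow path: the hint is missing or wrong, so try every known signature.
    for fmt, matches in _MATCHERS.items():
        if matches(head):
            return fmt
    # CSS has no signature: accept a CSS hint only if the data looks textual.
    if hint == "CSS" and b"\x00" not in head:
        return "CSS"
    return None
```

For example, `try_detect_resource(io.BytesIO(png_bytes), "wrong.css")` still reports "PNG": the misleading hint only costs the extra loop over all signatures, which mirrors the performance trade-off described for TryDetectResource.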