This article describes how GroupDocs.Editor works with HTML resources, which are an integral part of an HTML document.
What HTML resources are
Almost all document formats (except plain TXT and some others) have a concept of resources: images, fonts that are not installed in the operating system, and so on, depending on the document format. To edit a document in a browser, GroupDocs.Editor must first convert it to HTML and only then send it to the client-side WYSIWYG editor. GroupDocs.Editor therefore works with HTML resources, which fall into several groups: images, stylesheets (CSS), fonts, and audio.
When a document is opened for editing with editor.edit(), the resulting EditableDocument exposes its extracted assets as collections that you can iterate (with for loops) and measure (with len()):
| Property | Contents |
| --- | --- |
| images | embedded raster (JPEG, PNG, GIF) and vector (SVG, WMF/EMF) images |
| fonts | extracted fonts (WOFF, WOFF2, TTF, OTF, EOT) |
| css | stylesheets |
| audio | audio resources (for example, from presentations) |
| all_resources | everything above, combined |
Iterating over resources
You can inspect, count, and enumerate the resources of an opened document directly from the EditableDocument instance:
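As a minimal sketch of such an inspection, the helpers below count and enumerate the collections listed in the table above. The function names summarize_resources and describe_resources are hypothetical, and the code assumes only what this article states: that each collection supports len() and for loops. The attributes of individual resource objects are not assumed, so only their type names are printed. A stand-in object is used so that the sketch runs without the library installed.

```python
from types import SimpleNamespace

# Collection names taken from the property table in this article.
RESOURCE_COLLECTIONS = ("images", "fonts", "css", "audio", "all_resources")


def summarize_resources(editable):
    """Return {collection name: item count} for an EditableDocument-like object."""
    return {name: len(getattr(editable, name)) for name in RESOURCE_COLLECTIONS}


def describe_resources(editable):
    """Yield one printable line per item in all_resources."""
    for index, resource in enumerate(editable.all_resources, start=1):
        # Only the type name is used; no attributes of the resource are assumed.
        yield f"{index}. {type(resource).__name__}"


if __name__ == "__main__":
    # Stand-in for a real EditableDocument (hypothetical data).
    demo = SimpleNamespace(
        images=[b"png-bytes"],
        fonts=[],
        css=["body {}", "p {}"],
        audio=[],
        all_resources=[b"png-bytes", "body {}", "p {}"],
    )
    print(summarize_resources(demo))
    for line in describe_resources(demo):
        print(line)
```

With the library installed, the same helpers would presumably accept the object returned by editor.edit() in place of the stand-in.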
The save() method of EditableDocument writes the HTML markup and every extracted resource (images, fonts, css) into a folder, so that the markup references its assets on disk:
from groupdocs.editor import Editor

with Editor("document.docx") as editor:
    editable = editor.edit()

    # Write HTML plus all resources into a folder
    editable.save("page.html", "page_resources")
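To check what ended up in the resource folder after save(), the files can be grouped by extension. The sketch below uses only the Python standard library; the helper names group_of and count_saved_resources are hypothetical. It assumes that saved files keep conventional extensions, which this article does not guarantee, and the audio extensions in particular are a guess, since the article does not list them.

```python
from collections import Counter
from pathlib import Path

# Image and font extensions follow the formats named in this article.
EXTENSION_GROUPS = {
    "images": {".jpg", ".jpeg", ".png", ".gif", ".svg", ".wmf", ".emf"},
    "fonts": {".woff", ".woff2", ".ttf", ".otf", ".eot"},
    "css": {".css"},
    "audio": {".mp3", ".wav"},  # assumption: not listed in the article
}


def group_of(path):
    """Map a file path to a resource group name, or 'other' if unrecognized."""
    suffix = Path(path).suffix.lower()
    for group, extensions in EXTENSION_GROUPS.items():
        if suffix in extensions:
            return group
    return "other"


def count_saved_resources(resource_folder):
    """Count files under resource_folder (recursively) per resource group."""
    folder = Path(resource_folder)
    return Counter(group_of(p) for p in folder.rglob("*") if p.is_file())
```

For example, count_saved_resources("page_resources") returns a Counter keyed by group name, which can be compared against the len() values reported by the EditableDocument collections.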
Feeding modified markup with resources back
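When a customer edits a document in a WYSIWYG editor and obtains the edited HTML markup along with its resources, the markup must be wrapped back into an EditableDocument before it can be saved. Depending on how the resources are stored, you can use one of the following class methods:
from_markup(html): for HTML markup held in memory as a string, with resources baked into it.
from_markup_and_resource_folder(html, resource_folder): for HTML markup held in memory as a string, with resources stored in a folder on disk.
from_file(html_path, resource_folder): for an HTML file on disk together with its resource folder.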
from groupdocs.editor import EditableDocument

# Markup with baked-in resources
edited = EditableDocument.from_markup(edited_html)

# Markup string + on-disk resource folder
edited = EditableDocument.from_markup_and_resource_folder(edited_html, "page_resources")

# HTML file + resource folder
edited = EditableDocument.from_file("page.html", "page_resources")
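The choice between these class methods can be wrapped in a small dispatcher. The following is a sketch only: wrap_edited_markup is a hypothetical helper, not part of GroupDocs.Editor, and it assumes the positional argument order shown in the calls above. The document class is passed in as a parameter so that the helper can be exercised with a stub when the library is not installed.

```python
def wrap_edited_markup(document_cls, html=None, html_path=None, resource_folder=None):
    """Pick a factory method based on where the markup and resources are stored.

    document_cls is expected to be EditableDocument (or a stub exposing the
    same three class methods).
    """
    if html is not None and html_path is not None:
        raise ValueError("Provide either html or html_path, not both")

    if html_path is not None:
        # HTML file on disk: the article only shows the form with a resource folder.
        if resource_folder is None:
            raise ValueError("from_file requires a resource folder")
        return document_cls.from_file(html_path, resource_folder)

    if html is None:
        raise ValueError("Provide either html or html_path")

    if resource_folder is not None:
        # In-memory markup whose resources live in a folder on disk.
        return document_cls.from_markup_and_resource_folder(html, resource_folder)

    # In-memory markup with resources baked in.
    return document_cls.from_markup(html)
```

A call such as wrap_edited_markup(EditableDocument, html=edited_html, resource_folder="page_resources") would then resolve to from_markup_and_resource_folder.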
Complete code example
The example below loads a document, opens it for editing, reports how many resources of each kind were extracted, and saves the HTML together with all of its resources into a folder.
import os
from groupdocs.editor import Editor, License


def working_with_html_resources():
    # Optionally set a license
    license_path = os.path.abspath("./GroupDocs.Editor.lic")
    if os.path.exists(license_path):
        License().set_license(license_path)

    with Editor("./sample-document.docx") as editor:
        editable = editor.edit()

        # Inspect the extracted resources
        print("Images:", len(editable.images))
        print("Stylesheets:", len(editable.css))
        print("Fonts:", len(editable.fonts))
        print("All resources:", len(editable.all_resources))

        # Persist the HTML together with every resource into a folder
        editable.save("output.html", "output_resources")
        editable.dispose()


if __name__ == "__main__":
    working_with_html_resources()
sample-document.docx is the sample file used in this example.