This example demonstrates opening, editing, and saving XML documents using different options and adjustments.
Introduction
GroupDocs.Editor has supported importing documents in XML (eXtensible Markup Language) format for a long time. In version 23.2, however, the XML processing mechanism was completely redeveloped and drastically enhanced. The public XML editing options were also redesigned and significantly expanded, with new sub-option classes and new public types.
This article describes the new XML processing mechanism and applies to GroupDocs.Editor version 23.2 and above.
Loading XML documents
Loading XML documents into the GroupDocs.Editor.Editor class works the same way as for other formats. There are no dedicated load options for the XML format; it is enough to specify the file itself through a file path or a byte stream.
When loading through a file path, the file extension does not matter: you may freely load an XML file not only with the *.xml extension, but with any other, such as *.csproj or *.svg. Only a valid internal structure matters.
Also note that HTML files cannot be treated as XML; only XHTML qualifies as valid XML.
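Because only the internal structure matters, it can be handy to check well-formedness before loading a file. The following is a minimal sketch that uses the standard System.Xml.Linq parser rather than GroupDocs.Editor itself; the helper name IsWellFormedXml and the sample strings are illustrative assumptions.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper: returns true when the text parses as well-formed XML.
// It relies on the standard .NET parser, not on GroupDocs.Editor.
static bool IsWellFormedXml(string content)
{
    try
    {
        XDocument.Parse(content);
        return true;
    }
    catch (XmlException)
    {
        return false;
    }
}

// XHTML closes every element, so it parses as XML.
string xhtml = "<html><body><p>Hello<br/>world</p></body></html>";
// Typical HTML leaves void elements such as <br> unclosed, which is not well-formed XML.
string html = "<html><body><p>Hello<br>world</p></body></html>";

Console.WriteLine(IsWellFormedXml(xhtml));
Console.WriteLine(IsWellFormedXml(html));
```

The same check works for content read from a *.csproj or *.svg file, since the extension plays no role in parsing.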
The code example below loads the same document in two ways, into two different Editor instances:
    using System.IO;
    using GroupDocs.Editor;

    const string xmlFilename = "Sample.xml";
    string xmlInputPath = Path.Combine("full_folder_path", xmlFilename);

    using (FileStream xmlStream = File.OpenRead(xmlInputPath))
    using (Editor editorFromPath = new Editor(xmlInputPath)) // from the path
    using (Editor editorFromStream = new Editor(xmlStream)) // from the stream
    {
        // Here two Editor instances can work separately with one file
    }
Editing XML documents
As with other format families in GroupDocs.Editor, there is a dedicated XmlEditOptions class for editing XML documents. As always, it is optional, so the parameterless Editor.Edit() overload may be used: GroupDocs.Editor will automatically detect the format and apply the default options. The example below shows such a case: an XML document is loaded and edited, and the edited content, represented by an EditableDocument instance, may then be passed to a WYSIWYG editor or any other HTML editing software, or simply saved to disk, as the example does.
    using System.IO;
    using GroupDocs.Editor;

    const string xmlFilename = "Sample.xml";
    string xmlInputPath = Path.Combine("full_folder_path", xmlFilename);
    string outputPath = Path.Combine("output_folder_path",
        string.Format("{0}.html", Path.GetFileNameWithoutExtension(xmlFilename)));

    using (Editor editor = new Editor(xmlInputPath))
    {
        using (EditableDocument edited = editor.Edit())
        {
            // Send to a WYSIWYG editor or somewhere else
            edited.Save(outputPath);
        }
    }
The XmlEditOptions class has several properties. Some of them are grouped into “wrappers”, the special classes XmlHighlightOptions and XmlFormatOptions, which are described in detail below. The most useful and important properties, however, sit directly in the XmlEditOptions class.
Encoding
This property sets the encoding that is applied when opening an input XML file (keep in mind that any XML document is first of all a text file). XML files are most commonly encoded in UTF-8, so the default value of this option is also UTF-8.
FixIncorrectStructure
GroupDocs.Editor can handle any XML document without errors or exceptions: corrupted, truncated, or structurally invalid. It can even process HTML as if it were XML. However, the representation of such a document may degrade; for example, a nested hierarchical structure cannot be displayed properly if it is broken.
To address this, the FixIncorrectStructure boolean flag was introduced. When it is enabled, GroupDocs.Editor scans the XML document and tries to fix its structure. In particular, it escapes some prohibited characters, closes unclosed tags, opens unopened tags, fixes overlapping tags, and so on.
Because this scanning and fixing requires additional computational resources, and most XML documents are valid, the mechanism is disabled by default: FixIncorrectStructure is false. To enable it, set the property to true explicitly.
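A minimal sketch of enabling structure fixing when editing a damaged file. FixIncorrectStructure is named in the text above; the Encoding property name and its System.Text.Encoding type are assumptions for the encoding option described earlier, and the sample file written to the temp folder is purely illustrative.

```csharp
using System.IO;
using System.Text;
using GroupDocs.Editor;
using GroupDocs.Editor.Options;

// Illustrative input: the <item> element is never closed.
string inputPath = Path.Combine(Path.GetTempPath(), "Broken.xml");
File.WriteAllText(inputPath, "<root><item>unclosed</root>");

XmlEditOptions editOptions = new XmlEditOptions();
// Assumption: the encoding option is exposed as an Encoding property.
editOptions.Encoding = Encoding.UTF8;
// Disabled by default, so it has to be switched on explicitly.
editOptions.FixIncorrectStructure = true;

string html;
using (Editor editor = new Editor(inputPath))
using (EditableDocument edited = editor.Edit(editOptions))
{
    // HTML representation produced after the structure was repaired where possible.
    html = edited.GetContent();
}
```

Leaving FixIncorrectStructure at its default avoids the extra scanning cost when the input is known to be valid.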
RecognizeUris
This property enables the mechanism that recognizes and prepares URIs (web addresses). By default the mechanism is disabled (false), and URIs present in text nodes or attribute values of the XML document are represented as ordinary text. When the property is enabled (true), GroupDocs.Editor scans the XML document for valid URIs and represents each one found as an external link in the resultant HTML, using the A element. GroupDocs.Editor searches for URIs in text nodes, CDATA sections, XML comments, attribute values, and DocType definitions.
RecognizeEmails
This property is very similar to RecognizeUris: it does the same, but for email addresses. By default it is disabled, so an email address present in the input XML appears in the output HTML as ordinary text. When the property is set to true, all valid email addresses are represented with the mailto scheme and the A element.
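A minimal sketch of turning on both recognition mechanisms. RecognizeUris is named in the text; RecognizeEmails is an assumed name for its email counterpart, and the sample XML is illustrative.

```csharp
using System.IO;
using GroupDocs.Editor;
using GroupDocs.Editor.Options;

// Illustrative input with one URI in an attribute value and one address in a text node.
string inputPath = Path.Combine(Path.GetTempPath(), "Contacts.xml");
File.WriteAllText(inputPath,
    "<contact site=\"https://example.com\"><email>user@example.com</email></contact>");

XmlEditOptions editOptions = new XmlEditOptions();
editOptions.RecognizeUris = true;   // URIs become links (A elements)
editOptions.RecognizeEmails = true; // assumed name; addresses become mailto links

string html;
using (Editor editor = new Editor(inputPath))
using (EditableDocument edited = editor.Edit(editOptions))
{
    html = edited.GetContent();
}
```

With both flags left at false, the same input is expected to render the URI and the address as plain text.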
TrimTrailingWhitespaces
This property enables truncation of trailing whitespace in text nodes, that is, the textual content located between start and end tags (inner-tag text). By default it is disabled (false), so trailing whitespace is preserved. Line breaks are also treated as whitespace. This may be useful when the input XML is formatted with line breaks and runs of spaces for the sake of readability.
AttributeValuesQuoteType
QuoteType is a new type introduced in version 23.2. It is a struct that represents the two kinds of quotes permitted around attribute values in XML: the single quote, represented by the U+0027 APOSTROPHE character, and the double quote, represented by the U+0022 QUOTATION MARK character.
With this option, users can override the quote type used in the original XML document and set the desired quote for the resultant HTML. By default, double quotes are used.
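A short sketch of the two options just described. TrimTrailingWhitespaces, AttributeValuesQuoteType, and QuoteType.SingleQuote are names assumed here for the whitespace and quote options; verify them against the API reference of the version in use.

```csharp
using GroupDocs.Editor.Options;

XmlEditOptions editOptions = new XmlEditOptions();
// Assumed property name: strips trailing whitespace (including line breaks) from text nodes.
editOptions.TrimTrailingWhitespaces = true;
// Assumed property and member names: single quotes around attribute values
// instead of the default double quotes.
editOptions.AttributeValuesQuoteType = QuoteType.SingleQuote;
```

The configured instance is then passed to Editor.Edit() in the same way as any other edit options.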
HighlightOptions
The HighlightOptions property is of the XmlHighlightOptions type. A ready-made instance of XmlHighlightOptions is already assigned to the HighlightOptions property, and the reference to it cannot be replaced; only the members of XmlHighlightOptions can be modified.
XmlHighlightOptions has 6 sub-properties:
XmlTagsFontSettings is responsible for the font of XML tags; this includes both start and end tags, that is, angle brackets with tag names. By default it is the “Calibri” font, 12pt size.
AttributeNamesFontSettings is responsible for the font of attribute names. By default it is the “Calibri” font, 11pt size, red color.
AttributeValuesFontSettings is responsible for the font of attribute values. By default it is the “Calibri” font, 11pt size, blue color.
InnerTextFontSettings is responsible for the font of text nodes (text inside and between XML elements). By default it is the “Times New Roman” font, 11pt size, black color.
HtmlCommentsFontSettings is responsible for the font of HTML comments (XML comments with the syntax of HTML comments), including the pair of opening <!-- and closing --> tags. By default it is the “Consolas” font, 11pt size, green color.
CDataFontSettings is responsible for the font of CDATA sections, including the pair of opening <![CDATA[ and closing ]]> tags. By default it is the “Calibri” font, 11pt size, coral color.
All these properties are of the WebFont class, a new public type introduced in version 23.2. It has 6 public properties: one is of the System.String type, while the other 5 are structs, each of which holds and represents one aspect of the font. The WebFont class may be treated as a container for font properties, where every property has a default value, and users can change any or all of them within defined limits.
Font name: the Name property of the System.String type. It is translated into the font-family CSS declaration.
Font size: the Size property of the FontSize type. It is translated into the font-size CSS declaration.
Font color: the Color property of the ArgbColor type. It is translated into the color CSS declaration.
Font weight (boldness): the Weight property of the FontWeight type. It is translated into the font-weight CSS declaration.
Font style: the Style property of the FontStyle type. It is translated into the font-style CSS declaration.
Text decoration line: the Line property of the TextDecorationLineType type. It is translated into the text-decoration-line CSS declaration.
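The mapping listed above can be pictured with a small standalone sketch. It does not use the real WebFont class: plain strings stand in for the FontSize, ArgbColor, FontWeight, FontStyle, and TextDecorationLineType values, and the weight, style, and decoration values are illustrative rather than documented defaults. The HTML that GroupDocs.Editor actually emits may be organized differently.

```csharp
using System;
using System.Linq;

// Each entry pairs a WebFont property name with the CSS declaration it maps to.
var font = new (string Property, string CssName, string Value)[]
{
    ("Name",   "font-family",          "Calibri"),
    ("Size",   "font-size",            "11pt"),
    ("Color",  "color",                "red"),
    ("Weight", "font-weight",          "normal"),
    ("Style",  "font-style",           "normal"),
    ("Line",   "text-decoration-line", "none"),
};

// Join each pair into a "name: value;" declaration.
string css = string.Join(" ", font.Select(p => $"{p.CssName}: {p.Value};"));
Console.WriteLine(css);
```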
WebFont also offers some other public functionality: equality checks against another instance (System.IEquatable) and deep cloning (System.ICloneable).
IsDefault: a boolean property that indicates whether the current instance is in its default state, that is, all properties hold their initial values.
ResetToDefault(): a method that resets all properties to their initial values.
The code sample below creates an XmlEditOptions instance and sets various (but not all) properties of the XmlHighlightOptions sub-property.
    Options.XmlEditOptions editOptions = new XmlEditOptions();
    Assert.IsTrue(editOptions.HighlightOptions.IsDefault);
    Options.XmlHighlightOptions highlightOptions = editOptions.HighlightOptions;

    //Setting XML tags font settings
    highlightOptions.XmlTagsFontSettings.Size = HtmlCss.Css.Properties.FontSize.Large;
    highlightOptions.XmlTagsFontSettings.Color = HtmlCss.Css.DataTypes.ArgbColor.KnownColors.CssLevel1.Olive;

    //Setting attribute names font settings
    highlightOptions.AttributeNamesFontSettings.Name = "Arial";
    highlightOptions.AttributeNamesFontSettings.Line = HtmlCss.Css.Properties.TextDecorationLineType.Underline;
    highlightOptions.AttributeNamesFontSettings.Weight = HtmlCss.Css.Properties.FontWeight.Lighter;

    //Setting attribute values font settings
    highlightOptions.AttributeValuesFontSettings.Line = HtmlCss.Css.Properties.TextDecorationLineType.Underline + HtmlCss.Css.Properties.TextDecorationLineType.Overline;
    highlightOptions.AttributeValuesFontSettings.Style = HtmlCss.Css.Properties.FontStyle.Italic;

    //Setting CDATA sections font settings
    highlightOptions.CDataFontSettings.Line = HtmlCss.Css.Properties.TextDecorationLineType.LineThrough;
    highlightOptions.CDataFontSettings.Size = HtmlCss.Css.Properties.FontSize.Smaller;

    //Setting HTML comments font settings
    highlightOptions.HtmlCommentsFontSettings.Color = HtmlCss.Css.DataTypes.ArgbColor.KnownColors.CssLevel3.Lightgreen;
    highlightOptions.HtmlCommentsFontSettings.Name = "Courier New";

    //Setting text node font settings
    highlightOptions.InnerTextFontSettings.Weight = HtmlCss.Css.Properties.FontWeight.FromNumber(300);
    highlightOptions.InnerTextFontSettings.Size = HtmlCss.Css.Properties.FontSize.XSmall;

    //Checking they are not default
    Assert.IsFalse(editOptions.HighlightOptions.IsDefault);

    //Resetting to default
    highlightOptions.ResetToDefault();

    //Checking they are default again now
    Assert.IsTrue(editOptions.HighlightOptions.IsDefault);
FormatOptions
The FormatOptions property is of the XmlFormatOptions type. As with the HighlightOptions property, FormatOptions already holds an instance of the XmlFormatOptions class with all sub-properties set to their default values. Similarly, the reference to this XmlFormatOptions instance cannot be replaced, only obtained through the getter, and no new instance of XmlFormatOptions can be created.
bool EachAttributeFromNewline. XML elements usually carry a set of attribute-value pairs. By default (false), these pairs are represented in the resultant HTML on a single line next to the element name. When this property is set to true, every attribute-value pair in every XML element is placed on a separate new line in the resultant HTML.
bool LeafTextNodesOnNewline. XML documents usually contain text nodes: pieces of textual content located inside and/or between adjacent XML elements. By default (false), these text nodes are represented in the resultant HTML on the same line as the other XML nodes. When this property is set to true, each text node is placed on a new line with a bigger left indent.
HtmlCss.Css.DataTypes.Length LeftIndent. XML is a hierarchical structure, and most WYSIWYG editors display hierarchical structures using left indentation: the deeper a node sits in the tree (the higher its nesting level), the bigger its left indent. GroupDocs.Editor does the same, and the LeftIndent property controls the indent distance for one level. By default it is 10 points (pt), so the root element has no indent at all, the 1st nesting level is shifted by 10pt, the 2nd by 20pt, the 3rd by 30pt, and so on. This length is represented by the public type HtmlCss.Css.DataTypes.Length and can easily be changed. Moreover, it does not have to be specified in points: any CSS-compatible length unit works, such as percent, pixel, or centimeter. The left indent can be disabled entirely by assigning a unitless zero to this property: Length.UnitlessZero. Keep in mind, however, that non-zero unitless values are prohibited by the CSS specification, and GroupDocs.Editor prohibits them too.
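The indent rule for nesting levels amounts to simple multiplication, as the standalone sketch below shows. IndentForLevel is a hypothetical helper written for illustration; it is not part of GroupDocs.Editor and works with plain numbers in points rather than with the Length type.

```csharp
using System;

// Hypothetical helper mirroring the rule described above:
// indent = nesting level * per-level indent (10pt by default).
static double IndentForLevel(int nestingLevel, double perLevelIndentPt = 10.0)
    => nestingLevel * perLevelIndentPt;

// The root element is level 0, so it gets no indent at all.
for (int level = 0; level <= 3; level++)
{
    Console.WriteLine($"level {level}: {IndentForLevel(level)}pt");
}
```

Passing a different per-level value, for example IndentForLevel(2, 20), models the effect of changing LeftIndent.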
Like XmlHighlightOptions, XmlFormatOptions has a useful boolean property IsDefault, which indicates whether all properties of the current instance hold their initial values.
The code sample below creates an XmlEditOptions instance and sets different properties of the XmlFormatOptions sub-property.
    Options.XmlEditOptions editOptions = new XmlEditOptions();

    //Checking that options are default for now
    Assert.IsTrue(editOptions.FormatOptions.IsDefault);
    Options.XmlFormatOptions formatOptions = editOptions.FormatOptions;

    //Each attribute-value pair must be placed on a new line
    formatOptions.EachAttributeFromNewline = true;

    //Text nodes (textual content between and inside XML elements) must be placed on a new line
    formatOptions.LeafTextNodesOnNewline = true;

    //Setting a custom text indent using 'Length' data type, which is composed from value with unit
    formatOptions.LeftIndent = HtmlCss.Css.DataTypes.Length.FromValueWithUnit(20, HtmlCss.Css.DataTypes.Length.Unit.Px);

    //Checking that options are not default now
    Assert.IsFalse(editOptions.FormatOptions.IsDefault);

    //Disabling a left indent at all
    formatOptions.LeftIndent = HtmlCss.Css.DataTypes.Length.UnitlessZero;
Complex example
Now that the XmlEditOptions class and all its properties and sub-properties have been described in detail, it is time to bring everything together. The code example below opens an XML file from a file path, creates and adjusts two different XmlEditOptions instances, and edits the document twice with these two option sets to obtain two different EditableDocument representations. These EditableDocument instances are then saved to two different HTML files.
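A minimal sketch of this scenario, restricted to members that appear earlier in this article. The full namespace names (GroupDocs.Editor.Options, GroupDocs.Editor.HtmlCss.Css.DataTypes, GroupDocs.Editor.HtmlCss.Css.Properties) are assumptions inferred from the partially qualified names in the samples above, and the sample file and output names are illustrative.

```csharp
using System.IO;
using GroupDocs.Editor;
using GroupDocs.Editor.Options;
using GroupDocs.Editor.HtmlCss.Css.DataTypes;
using GroupDocs.Editor.HtmlCss.Css.Properties;

// Illustrative input and output locations in the temp folder.
string folder = Path.GetTempPath();
string inputPath = Path.Combine(folder, "Sample.xml");
string outputPath1 = Path.Combine(folder, "Sample-content.html");
string outputPath2 = Path.Combine(folder, "Sample-layout.html");
File.WriteAllText(inputPath,
    "<catalog><book id=\"1\" lang=\"en\">See https://example.com</book></catalog>");

// First option set: content-oriented flags.
XmlEditOptions contentOptions = new XmlEditOptions();
contentOptions.FixIncorrectStructure = true;
contentOptions.RecognizeUris = true;

// Second option set: highlighting and formatting.
XmlEditOptions layoutOptions = new XmlEditOptions();
layoutOptions.HighlightOptions.XmlTagsFontSettings.Size = FontSize.Large;
layoutOptions.FormatOptions.EachAttributeFromNewline = true;
layoutOptions.FormatOptions.LeafTextNodesOnNewline = true;
layoutOptions.FormatOptions.LeftIndent = Length.FromValueWithUnit(20, Length.Unit.Px);

using (Editor editor = new Editor(inputPath))
{
    // The same loaded document is edited twice, once per option set.
    using (EditableDocument contentDoc = editor.Edit(contentOptions))
    using (EditableDocument layoutDoc = editor.Edit(layoutOptions))
    {
        contentDoc.Save(outputPath1);
        layoutDoc.Save(outputPath2);
    }
}
```

Comparing the two saved HTML files side by side is a quick way to see which visual differences come from which option group.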
The article “Extracting document metainfo” describes the GetDocumentInfo() method, which detects the document format and extracts its metadata without editing the document. The XML format is supported as well.
When GetDocumentInfo() is called on an Editor instance that was created with an XML document passed to the constructor, the method returns a GroupDocs.Editor.Metadata.TextualDocumentInfo instance. This is a common class for all document formats of a textual nature, such as HTML, XML, and TXT.
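A minimal sketch of reading metadata from an XML file. The password argument passed to GetDocumentInfo(), the IDocumentInfo return type, and the Format, Encoding, and Size members are assumptions based on the general GroupDocs.Editor metadata API and should be checked against the “Extracting document metainfo” article; the sample file is illustrative.

```csharp
using System;
using System.IO;
using GroupDocs.Editor;
using GroupDocs.Editor.Metadata;

// Illustrative input file in the temp folder.
string inputPath = Path.Combine(Path.GetTempPath(), "Info.xml");
File.WriteAllText(inputPath,
    "<?xml version=\"1.0\" encoding=\"utf-8\"?><root><item>text</item></root>");

IDocumentInfo info;
using (Editor editor = new Editor(inputPath))
{
    // Assumption: the method takes a password argument; XML has none, so null is passed.
    info = editor.GetDocumentInfo(null);
}

// For XML input the returned instance is expected to be a TextualDocumentInfo.
if (info is TextualDocumentInfo textualInfo)
{
    Console.WriteLine(textualInfo.Format.Name); // assumed member: detected format
    Console.WriteLine(textualInfo.Encoding);    // assumed member: detected text encoding
    Console.WriteLine(textualInfo.Size);        // assumed member: size in bytes
}
```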