With an XmlDocument, either while writing it or when modifying it later, is it possible to remove the self-closing syntax (i.e. />) for certain elements?
For example, change <img /> or <img></img> to <img>, and <br /> to <br>.
Why, you ask? I'm trying to conform to the HTML for Word 2007 schema; the resulting HTML will be displayed in Microsoft Outlook 2007 or later.
After reading another StackOverflow question, I tried setting the IsEmpty property to false, like so:
var imgElements = finalHtmlDoc.SelectNodes("//*[local-name()=\"img\"]").OfType<XmlElement>();
foreach (var element in imgElements)
{
    element.IsEmpty = false;
}
However, that resulted in <img /> becoming <img></img>. As a hack, I also tried changing the OuterXml property directly, but that doesn't work (I didn't expect it to).
Question
Can you remove the self-closing tags from an XmlDocument? I honestly do not think you can, as the result would then be invalid XML (no closing tag), but I thought I would put the question to the community.
Update:
I ended up fixing the HTML string after exporting it from the XmlDocument, using a regular expression (written in the wonderful RegexBuddy):
var fixHtmlRegex = new Regex("<(?<tag>meta|img|br)(?<attributes>.*?)/>", RegexOptions.IgnoreCase | RegexOptions.Multiline);
return fixHtmlRegex.Replace(htmlStringBuilder.ToString(), "<$1$2>");
This cleared many errors from the validation pass and allowed me to focus on the real compatibility problems.
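As a hedged variation on the regex above, the sketch below tightens the pattern in two ways: a word boundary after the tag name so that "br" cannot match the start of a longer element name, and [^>]* instead of .* so that a match cannot run past the end of the current tag. The class and method names (HtmlVoidTagFixer, StripSelfClosing) are hypothetical, chosen only for illustration, and the sketch assumes no attribute value contains a literal > character.

```csharp
using System.Text.RegularExpressions;

public static class HtmlVoidTagFixer
{
    // \b stops "br" from matching inside longer names such as <broken />;
    // [^>]*? keeps the match inside a single tag;
    // \s* drops the whitespace that precedes "/>".
    private static readonly Regex SelfClosingVoidTag = new Regex(
        @"<(?<tag>meta|img|br)\b(?<attributes>[^>]*?)\s*/>",
        RegexOptions.IgnoreCase);

    // Rewrites <br />, <img ... /> and <meta ... /> without the closing slash.
    // Assumption: attribute values never contain a literal '>'.
    public static string StripSelfClosing(string html)
    {
        return SelfClosingVoidTag.Replace(html, "<${tag}${attributes}>");
    }
}
```

Calling HtmlVoidTagFixer.StripSelfClosing(htmlStringBuilder.ToString()) would take the place of the Replace call in the update. Named replacement groups (${tag}) are used instead of $1 and $2 only so the replacement does not depend on group numbering.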
You're right: it's not possible, simply because the result would be invalid (or rather, not well-formed) XML. Empty elements in XML must be closed, either with the shortcut syntax /> or with an immediate closing tag.
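A minimal sketch of those two forms, assuming a fresh document with a single img element (the EmptyElementDemo class and its Serialize method are hypothetical names used only here): XmlElement.IsEmpty switches between the shortcut syntax and the explicit closing tag, and XmlDocument offers no third, unclosed form.

```csharp
using System.Xml;

public static class EmptyElementDemo
{
    // Builds a one-element document and returns its serialized form.
    public static string Serialize(bool isEmpty)
    {
        var doc = new XmlDocument();
        XmlElement img = doc.CreateElement("img");
        doc.AppendChild(img);

        // true  -> shortcut syntax:      <img />
        // false -> explicit closing tag: <img></img>
        img.IsEmpty = isEmpty;
        return doc.OuterXml;
    }
}
```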
HTML (through version 4) is an application of SGML, and XML is a restricted subset of SGML. While HTML and SGML allow unclosed tags like <br>, XML does not.
<img> would not be valid XML, so no, you can't do that.
A bit embarrassed by my answer, but it worked for what I needed. After you have a complete XML document, you can manipulate it as a string to clean it up...
// Requires: using System;
// Expands every <tag ... /> into <tag ...></tag>.
// Note: does not handle "/>" inside attribute values, comments or CDATA.
private string RemoveSelfClosingTags(string xml)
{
    // Characters that can end a tag name; '/' covers tags such as <br/>.
    char[] separators = { ' ', '\t', '\r', '\n', '/' };
    int searchFrom = 0;
    while (true)
    {
        int selfCloseIndex = xml.IndexOf("/>", searchFrom, StringComparison.Ordinal);
        if (selfCloseIndex < 0)
            return xml; // nothing left to expand

        // Search backwards for the '<' that opens this tag.
        int tagStartIndex = xml.LastIndexOf('<', selfCloseIndex);
        if (tagStartIndex < 0)
            return xml; // malformed input: "/>" with no opening '<'

        // The tag name runs from just after '<' to the first separator.
        int tagEndIndex = xml.IndexOfAny(separators, tagStartIndex);
        string tag = xml.Substring(tagStartIndex + 1, tagEndIndex - tagStartIndex - 1);

        // Replace "/>" with "></tag>".
        xml = xml.Substring(0, selfCloseIndex) + "></" + tag + ">" + xml.Substring(selfCloseIndex + 2);
        searchFrom = selfCloseIndex;
    }
}