Remove self-closing tags (e.g. />) in an XmlDocument
In an XmlDocument, either when writing it or when modifying it later, is it possible to remove the self-closing tags (i.e. />) for a certain element?
For example, change <img /> or <img></img> to <img>, and <br /> to <br>.
Why, you ask? I'm trying to conform to the HTML for Word 2007 schema; the resulting HTML will be displayed in Microsoft Outlook 2007 or later.
After reading another StackOverflow question, I tried setting the IsEmpty property to false, like so:
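// Requires: using System.Linq; using System.Xml;
var imgElements = finalHtmlDoc.SelectNodes("//*[local-name()=\"img\"]").OfType<XmlElement>();
foreach (var element in imgElements)
{
    // IsEmpty = false selects the long tag format (<img></img>) instead of the short one (<img />).
    element.IsEmpty = false;
}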
However, that resulted in <img /> becoming <img></img>. Also, as a hack, I tried changing the OuterXml property directly, but that doesn't work (I didn't expect it to).
Question
Can you remove the self-closing tags from an XmlDocument? I honestly do not think there is a way, as the result would be invalid XML (no closing tag), but I thought I would put the question to the community.
Update:
I ended up fixing the HTML string after exporting it from the XmlDocument, using a regular expression (written in the wonderful RegexBuddy).
var fixHtmlRegex = new Regex("<(?<tag>meta|img|br)(?<attributes>.*?)/>", RegexOptions.IgnoreCase | RegexOptions.Multiline);
return fixHtmlRegex.Replace(htmlStringBuilder.ToString(), "<$1$2>");
It cleared many errors from the validation pass and allowed me to focus on the real compatibility problems.
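A slightly tightened variant of that regular expression might look like the sketch below. It is not the asker's code: the class name VoidTagFixer and the method name Fix are hypothetical, and the changes are assumptions about what is wanted. A word boundary keeps the pattern from matching longer names such as <break/>, trailing whitespace before /> is dropped so that <br /> becomes <br> rather than <br >, and the named groups are referenced by name in the replacement. It still assumes that no attribute value contains a > character.

```csharp
using System.Text.RegularExpressions;

static class VoidTagFixer
{
    // Matches self-closing meta, img and br tags only.
    // \b stops "br" from matching the start of a longer name like "break".
    // [^>]*? keeps the match inside a single tag; \s* swallows whitespace before "/>".
    private static readonly Regex SelfClosingVoidTag = new Regex(
        @"<(?<tag>meta|img|br)\b(?<attributes>[^>]*?)\s*/>",
        RegexOptions.IgnoreCase);

    // Rewrites e.g. <br /> to <br>, leaving all other tags untouched.
    public static string Fix(string html)
    {
        return SelfClosingVoidTag.Replace(html, "<${tag}${attributes}>");
    }
}
```

Calling VoidTagFixer.Fix on the exported string leaves other self-closing elements such as <p/> as they are, since only the three listed tag names are matched.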
You're right: it's not possible, simply because it's invalid (or rather, not well-formed) XML. Empty elements in XML must be closed, be it with the shortcut syntax /> or with an immediate closing tag.
Both HTML and XML descend from SGML: HTML (up to version 4) is an application of SGML, and XML is a restricted subset of it. While HTML and SGML allow unclosed tags like <br>, XML does not.
<img> on its own would not be valid XML, so no, you cannot do this.
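The well-formedness rule can be checked directly with the parser. The following is a minimal sketch, assuming a hypothetical helper named IsWellFormed in a hypothetical class WellFormedDemo; it relies only on XmlDocument.LoadXml throwing an XmlException when the input is not well-formed.

```csharp
using System.Xml;

static class WellFormedDemo
{
    // Returns true if the string parses as well-formed XML, false otherwise.
    public static bool IsWellFormed(string xml)
    {
        try
        {
            new XmlDocument().LoadXml(xml);
            return true;
        }
        catch (XmlException)
        {
            // LoadXml rejects input such as an unclosed <br> inside <p>.
            return false;
        }
    }
}
```

With this helper, "<p><br /></p>" and "<p><br></br></p>" are both accepted, while "<p><br></p>" is rejected. This is why an XmlDocument can never hold or write a bare <br>.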
I'm a bit embarrassed by my answer, but it worked for what I needed. Once you have a complete XML document, you can manipulate it as a string to clean it up:
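// Expands every self-closing tag into an explicit open/close pair,
// e.g. <br /> becomes <br ></br>. Note that any "/>" found inside comments,
// CDATA sections or attribute values is rewritten as well.
private string RemoveSelfClosingTags(string xml)
{
    // Characters that can end a tag name inside a self-closing tag.
    // '/' is included so that tags without whitespace, such as <br/>, work too.
    char[] separators = { ' ', '\t', '\r', '\n', '/' };
    int selfCloseIndex;
    while ((selfCloseIndex = xml.IndexOf("/>")) >= 0)
    {
        // Search backwards for the '<' that opens this tag (index 0 included).
        int tagStartIndex = xml.LastIndexOf('<', selfCloseIndex);
        if (tagStartIndex < 0)
            return xml; // stray "/>" with no opening '<'; stop instead of looping forever

        // The tag name runs from just after '<' to the first separator.
        int tagEndIndex = xml.IndexOfAny(separators, tagStartIndex);
        string tag = xml.Substring(tagStartIndex + 1, tagEndIndex - tagStartIndex - 1);

        // Replace "/>" with "></tag>".
        xml = xml.Substring(0, selfCloseIndex) + "></" + tag + ">" + xml.Substring(selfCloseIndex + 2);
    }
    return xml;
}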