
Remove self-closing tags (e.g. />) in an XmlDocument

In an XmlDocument, either when writing it or when modifying it later, is it possible to remove the self-closing tags (i.e. />) for certain elements?

For example: change

  • <img /> or <img></img> to <img>.
  • <br /> to <br>.

Why, you ask? I'm trying to conform to the HTML for Word 2007 schema; the resulting HTML will be displayed in Microsoft Outlook 2007 or later.

After reading another StackOverflow question, I tried setting the IsEmpty property to false like so:

    // OfType<T>() requires: using System.Linq;
    var imgElements = finalHtmlDoc.SelectNodes("//*[local-name()=\"img\"]").OfType<XmlElement>();
    foreach (var element in imgElements)
    {
        element.IsEmpty = false;
    }

However, that just turned <img /> into <img></img>. As a hack, I also tried changing the OuterXml property directly, but that doesn't work either (I didn't expect it to, since OuterXml is read-only).
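A minimal, self-contained sketch of the behaviour described above (IsEmptyDemo and ExpandImgElements are made-up names for illustration). Setting XmlElement.IsEmpty to false only swaps the short form <img /> for an explicit <img></img> pair; the serializer still writes an end tag, because there is no XmlDocument setting that omits it.

```csharp
using System.Xml;

public static class IsEmptyDemo
{
    // Loads the markup, marks every <img> element as non-empty and
    // returns the re-serialized document.
    public static string ExpandImgElements(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        foreach (XmlElement element in doc.SelectNodes("//*[local-name()='img']"))
        {
            // IsEmpty = false forces a full end tag (<img></img>);
            // it cannot remove the end tag altogether.
            element.IsEmpty = false;
        }

        return doc.OuterXml;
    }
}
```

Calling it on `<p><img src="a.png" /></p>` yields markup that still contains `</img>`, which matches what the asker observed.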

Question

Can you remove the self-closing tags from an XmlDocument? I honestly don't think it is possible, as the result would be invalid XML (no closing tag), but I thought I would throw the question out to the community.

Update:

I ended up fixing the HTML string after exporting it from the XmlDocument, using a regular expression (written in the wonderful RegexBuddy):

    // Requires: using System.Text.RegularExpressions;
    // \b stops e.g. <brand/> from matching; [^>]*? keeps the match inside a single tag.
    var fixHtmlRegex = new Regex(@"<(?<tag>meta|img|br)\b(?<attributes>[^>]*?)\s*/>", RegexOptions.IgnoreCase);
    return fixHtmlRegex.Replace(htmlStringBuilder.ToString(), "<${tag}${attributes}>");

It cleared many errors from the validation pass and allowed me to focus on the real compatibility problems.
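For reference, the same regular-expression fix can be wrapped in a self-contained helper (HtmlFixer and FixVoidElements are hypothetical names). This sketch assumes meta, img and br are the only void elements that need fixing, and it does not try to handle a literal /> inside an attribute value.

```csharp
using System.Text.RegularExpressions;

public static class HtmlFixer
{
    // <meta|img|br ... /> -> <meta|img|br ...>
    // \b prevents matching longer names such as <brand/>;
    // [^>]*? keeps the match from running past the end of the tag;
    // \s* drops the space that usually precedes "/>".
    private static readonly Regex SelfClosingVoidTag = new Regex(
        @"<(?<tag>meta|img|br)\b(?<attributes>[^>]*?)\s*/>",
        RegexOptions.IgnoreCase);

    public static string FixVoidElements(string html)
    {
        // Named substitutions (${name}) are clearer than $1/$2.
        return SelfClosingVoidTag.Replace(html, "<${tag}${attributes}>");
    }
}
```

For example, `<p>a<br />b<img src="x.png"/></p>` becomes `<p>a<br>b<img src="x.png"></p>`, while tags that are not self-closing are left untouched.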

You're right: it's not possible, simply because the result would be invalid (or rather, not well-formed) XML. Empty elements in XML must be closed, either with the shortcut syntax /> or with an immediate closing tag.

HTML (up to 4.01) is an application of SGML, and XML is a simplified subset of SGML. While SGML and HTML allow unclosed tags like <br>, XML does not.
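A quick way to see this with XmlDocument itself (WellFormednessDemo and IsWellFormedXml are hypothetical names): both <br/> and <br></br> load fine, while a bare <br> inside <p> is rejected with an XmlException, because the </p> end tag does not match the still-open <br> element.

```csharp
using System.Xml;

public static class WellFormednessDemo
{
    // Returns true if XmlDocument accepts the markup,
    // false if it throws because the markup is not well-formed.
    public static bool IsWellFormedXml(string markup)
    {
        try
        {
            new XmlDocument().LoadXml(markup);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```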

<img> would not be valid XML, so no, you can't do that.

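A bit embarrassed by my answer, but it worked for what I needed. Once you have a complete XML document, you can manipulate it as a string to clean it up. The method below expands every self-closing tag into an explicit start/end tag pair:

    // Expands every self-closing tag (<tag ... />) into an explicit
    // start/end pair (<tag ...></tag>) using plain string manipulation.
    private string RemoveSelfClosingTags(string xml)
    {
        // Characters that can end an element name.
        char[] nameTerminators = { ' ', '\t', '\r', '\n', '/' };

        int searchFrom = 0;
        int selfCloseIndex;
        while ((selfCloseIndex = xml.IndexOf("/>", searchFrom, StringComparison.Ordinal)) >= 0)
        {
            // Search backwards for the '<' that opens this tag.
            int tagStartIndex = xml.LastIndexOf('<', selfCloseIndex);
            if (tagStartIndex < 0)
                return xml; // malformed input: no opening '<'

            // The element name runs from just after '<' to the first
            // whitespace or '/' (so <br/> yields "br", not "br/>...").
            int tagEndIndex = xml.IndexOfAny(nameTerminators, tagStartIndex + 1);
            string tag = xml.Substring(tagStartIndex + 1, tagEndIndex - tagStartIndex - 1);

            string replacement = "></" + tag + ">";
            xml = xml.Substring(0, selfCloseIndex) + replacement + xml.Substring(selfCloseIndex + 2);

            // Resume after the inserted end tag, so the loop always makes progress.
            searchFrom = selfCloseIndex + replacement.Length;
        }

        return xml;
    }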
