XML: how to remove all nodes which have no attributes nor child elements

I have an XML document like this:

<Node1 attrib1="abc">
    <node1_1>
         <node1_1_1 attrib2 = "xyz" />
    </node1_1>
</Node1>

<Node2 />    

Here <Node2 /> is the node I want to remove, since it has neither child elements nor attributes.

Using an XPath expression, it is possible to find all nodes that have neither attributes nor children. These can then be removed from the XML. As Sani points out, you might have to do this recursively, because node1_1 becomes empty once its inner node is removed.

using System;
using System.Xml;

var xmlDocument = new XmlDocument();
xmlDocument.LoadXml(
@"<Node1 attrib1=""abc"">
        <node1_1>
             <node1_1_1 />
        </node1_1>
    </Node1>
    ");

// select all nodes without attributes and without children
var nodes = xmlDocument.SelectNodes("//*[count(@*) = 0 and count(child::*) = 0]");

Console.WriteLine("Found {0} empty nodes", nodes.Count);

// now remove matched nodes from their parent
foreach(XmlNode node in nodes)
    node.ParentNode.RemoveChild(node);

Console.WriteLine(xmlDocument.OuterXml);
Console.ReadLine();
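
A single pass like the one above leaves behind parents that only become empty once their children are gone. One way to cover that case, sketched here on the assumption that re-running the query is acceptable for your document size, is to repeat the select-and-remove step until nothing matches. The names `EmptyNodePruner` and `Prune` are made up for illustration; the XPath expression is the same one used above.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml;

public static class EmptyNodePruner
{
    // Hypothetical helper: removes elements that have no attributes and no
    // child elements, repeating until a pass finds nothing to remove.
    // Returns the total number of removed elements.
    public static int Prune(XmlDocument document)
    {
        const string emptyElements = "//*[count(@*) = 0 and count(child::*) = 0]";
        int removed = 0;

        while (true)
        {
            // Copy the matches into a list so removals cannot disturb the iteration.
            List<XmlNode> matches =
                document.SelectNodes(emptyElements).Cast<XmlNode>().ToList();
            if (matches.Count == 0)
                break;

            foreach (XmlNode node in matches)
            {
                node.ParentNode.RemoveChild(node);
                removed++;
            }
        }

        return removed;
    }
}
```

Calling `EmptyNodePruner.Prune(xmlDocument)` on the sample document would take out node1_1_1 in the first pass and node1_1 in the second, while Node1 stays because it has an attribute. Note that this expression only looks at child elements, so an element that contains nothing but text is also treated as empty.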

Something like this should do it:

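// Requires: using System.Linq; using System.Xml;
// GetElementsByTagName returns a live list, so take a snapshot before removing anything.
var nodes = xmlDocument.GetElementsByTagName("Node1").Cast<XmlNode>().ToList();

foreach (XmlNode node in nodes)
{
    if (node.ChildNodes.Count == 0 && node.Attributes.Count == 0)
    {
        // RemoveAll() only clears a node's children and attributes;
        // to delete the node itself, remove it from its parent.
        node.ParentNode.RemoveChild(node);
    }
    else
    {
        // Snapshot the children as well, because removing a child
        // while enumerating ChildNodes ends the enumeration early.
        foreach (XmlNode n in node.ChildNodes.Cast<XmlNode>().ToList())
        {
            if (n.InnerText == string.Empty &&
                (n.Attributes == null || n.Attributes.Count == 0))
            {
                node.RemoveChild(n);
            }
        }
    }
}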

This stylesheet uses an identity transform together with an empty template that matches elements with no child nodes and no attributes, which prevents them from being copied to the output:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

    <!--Identity transform copies all items by default -->
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!--Empty template to match on elements without attributes or child nodes to prevent it from being copied to output -->
    <xsl:template match="*[not(child::node() | @*)]"/>

</xsl:stylesheet>
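
To run a stylesheet like this from C#, something along the following lines should work. It is a minimal sketch using `XslCompiledTransform`; the class name `StylesheetRunner` and its `Transform` method are hypothetical, and both the stylesheet and the input are assumed to be available as strings.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class StylesheetRunner
{
    // Hypothetical helper: applies an XSLT 1.0 stylesheet (given as text)
    // to an XML string and returns the transformed XML as a string.
    public static string Transform(string stylesheet, string inputXml)
    {
        var transform = new XslCompiledTransform();
        using (XmlReader stylesheetReader = XmlReader.Create(new StringReader(stylesheet)))
        {
            transform.Load(stylesheetReader);
        }

        var output = new StringWriter();
        using (XmlReader inputReader = XmlReader.Create(new StringReader(inputXml)))
        using (XmlWriter writer = XmlWriter.Create(output, transform.OutputSettings))
        {
            transform.Transform(inputReader, writer);
        }

        // The writer has been disposed (and therefore flushed) at this point.
        return output.ToString();
    }
}
```

Two things to keep in mind: a single run only drops elements that are empty in the input, so a parent that becomes empty as a result needs another run, and whitespace-only text nodes (indentation) count as child nodes unless the stylesheet strips them, for example with `xsl:strip-space`.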

To do this for all empty child nodes, use a for loop (not foreach) and iterate in reverse order. I solved it like this:

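// Requires: using System.Xml;
var xmlDocument = new XmlDocument();
// A document may have only one root element, so the sample is wrapped in <root>.
xmlDocument.LoadXml(@"<root>
                         <node1 attrib1=""abc"">
                            <node1_1>
                               <node1_1_1 />
                            </node1_1>
                            <node1_2 />
                            <node1_3 />
                         </node1>
                         <node2 />
                      </root>");
RemoveEmptyNodes(xmlDocument);

// Prunes empty descendants bottom-up and returns true if the node itself
// is empty and should be removed by the caller.
private static bool RemoveEmptyNodes(XmlNode node)
{
    if (node.HasChildNodes)
    {
        // Iterate in reverse so that removing a child does not shift
        // the indexes of the children still to be visited.
        for (int i = node.ChildNodes.Count - 1; i >= 0; i--)
        {
            if (RemoveEmptyNodes(node.ChildNodes[i]))
                node.RemoveChild(node.ChildNodes[i]);
        }
    }
    // Empty means: no attributes, no remaining children, and no text.
    // Without the HasChildNodes check, a parent whose children carry
    // attributes but no text would be removed together with them.
    return
        (node.Attributes == null ||
            node.Attributes.Count == 0) &&
        !node.HasChildNodes &&
        node.InnerText.Trim() == string.Empty;
}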

The recursive calls (as in other solutions) avoid the repeated document processing of the XPath approach. More importantly, the code is more readable and easier to edit. Win-win.

So, this solution will remove <node2>, but it also correctly removes <node1_2> and <node1_3>.

Update: I found a notable performance increase with the following LINQ implementation.

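// Requires: using System.Linq; using System.Xml.Linq;
// XElement.Parse needs a single root element, so the sample is wrapped in <root>.
string myXml = @"<root>
                    <node1 attrib1=""abc"">
                       <node1_1>
                          <node1_1_1 />
                       </node1_1>
                       <node1_2 />
                       <node1_3 />
                    </node1>
                    <node2 />
                 </root>";
XElement xElem = XElement.Parse(myXml);
RemoveEmptyNodes2(xElem);

private static void RemoveEmptyNodes2(XElement elem)
{
    int cntElems = elem.Descendants().Count();
    int cntPrev;
    do
    {
        cntPrev = cntElems;
        // Remove every descendant that has no attributes, no child elements,
        // and no non-whitespace text. The HasElements check keeps parents
        // whose children carry attributes but no text.
        elem.Descendants()
            .Where(e =>
                !e.HasAttributes &&
                !e.HasElements &&
                string.IsNullOrWhiteSpace(e.Value))
            .Remove();
        cntElems = elem.Descendants().Count();
    } while (cntPrev != cntElems); // repeat until a pass removes nothing
}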

The loop handles cases where a parent needs to be removed because its only child was removed. Using XContainer or its derivatives tends to give similar performance gains, thanks to the IEnumerable implementations behind the scenes. It's my new favorite thing.

On an arbitrary 68 MB XML file, RemoveEmptyNodes tends to take about 90 seconds, while RemoveEmptyNodes2 tends to take about 1 second.
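
If you want to check timings like these against your own data, a small `Stopwatch` wrapper is usually enough. The sketch below is only an illustration: `Timing` and `Measure` are hypothetical names, and a single run like this is a rough wall-clock figure rather than a proper benchmark.

```csharp
using System;
using System.Diagnostics;

public static class Timing
{
    // Hypothetical helper: runs the action once and returns the elapsed wall-clock time.
    public static TimeSpan Measure(Action action)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

For example, `Timing.Measure(() => RemoveEmptyNodes2(xElem))` would time the LINQ version on an already parsed element; load each document fresh for each method so that both start from the same input.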
