Fastest way to remove XML nodes with C#

Question

Suppose that I have an in-memory XElement as below:

<ROOT>
    <CHILD1 />
    <CHILD1 />
    <CHILD2 />
    <CHILD2 />
    <CHILD1 />
    <CHILD1 />
    <CHILD3 />
    <CHILD3 />
</ROOT>

All CHILD1 nodes must be deleted except the last one.

The tree has approximately 1 million nodes, and 70% of them are CHILD1 nodes. What is the most efficient way to remove these unused nodes in a timely fashion? I tried the following:

List<XElement> remNodes = root.Elements("CHILD1").ToList();
remNodes.RemoveRange(0, remNodes.Count - 1);

and also the old & easy way:

XElement[] remNodes = root.Elements("CHILD1").ToArray();
for (int i = 0; i < remNodes.Length - 1; i++) remNodes[i].Remove();

Both took too much time to complete (~5 hours). Is there any quicker method?

UPDATE 1

Tried to save the last node & remove as below:

XElement savedNode = remNodes.Last();
savedNode.Save("to_file");
root.Elements("CHILD1").Remove();

But it looks like the time taken is the same.

UPDATE 2

In the end, I got the task to complete in a timely fashion (less than 1 minute). I saved the valid nodes to a file, then removed the whole tree and reloaded it from the saved nodes. Thanks @Matthew Haugen for your idea. Would you mind adding your answer?

Thanks.

Answer 1

Try:

root.Elements("CHILD1").Reverse().Skip(1).Remove();
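
For reference, here is a minimal sketch of what that one-liner does on a tiny tree. The class name ReverseSkipDemo and the n attributes are made up for illustration and are not part of the answer.

```csharp
using System.Linq;
using System.Xml.Linq;

static class ReverseSkipDemo
{
    // Builds a small sample tree, applies the one-liner, and returns the result.
    public static XElement Run()
    {
        var root = XElement.Parse(
            "<ROOT><CHILD1 n='1'/><CHILD1 n='2'/><CHILD2/><CHILD1 n='3'/><CHILD3/></ROOT>");

        // Reverse() puts the last CHILD1 first, Skip(1) spares it,
        // and Remove() detaches every remaining CHILD1 from the tree.
        root.Elements("CHILD1").Reverse().Skip(1).Remove();

        return root;
    }
}
```

Note that this still removes the nodes one by one, so on a tree of the size described in the question it may not be any faster than the loop the asker already tried.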

Answer 2

I threw this together with 200,000 child elements. It doesn't take too long, but I am not quite sure how many you are attempting to work with.

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Xml;
using System.Xml.Linq;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string data = DummyData();

            //DeleteNodes("child1", data);
            DeleteNodes2("child1", data);

            Console.ReadLine();
        }

        static void DeleteNodes(string node, string xml)
        {
            var values = new HashSet<string>();
            var xmlDocument = XDocument.Parse(xml);

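            // Add() returns false once the name has been seen, so every match
            // after the first is removed (this keeps the first, not the last).
            foreach (var n in xmlDocument.Root.Elements(node).ToList())
            {
                if (!values.Add(n.Name.LocalName))
                    n.Remove();
            }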
        }

        static void DeleteNodes2(string node, string xml)
        {
            var xmlDocument = XDocument.Parse(xml);

            // Reverse() puts the last match first, Skip(1) spares it,
            // and Remove() deletes all the others.
            xmlDocument.Root
                     .Elements(node).Reverse().Skip(1).Remove();

            // Test to see how many are left (at most one is expected)
            Console.WriteLine(xmlDocument.Root.Elements(node).Count());
        }

        static string DummyData()
        {
            Random r = new Random();
            TextWriter w = new StringWriter();


            var writer = new XmlTextWriter(w);
            writer.Formatting = Formatting.Indented;
            writer.WriteStartElement("root");

            for (int i = 0; i < 200000; i++)
            {
                int rand = r.Next(3);
                writer.WriteStartElement(string.Format("child{0}", rand.ToString()));
                writer.WriteEndElement();
            }

            writer.WriteEndElement();

            return w.ToString();
        }
    }
}

If this is indeed faster, then the credit should go to Chuck there. This is only a spin off of what he suggested.

Answer 3

In the end, I got the task to complete in a timely fashion (less than 1 minute). I saved the valid nodes to a file, then removed the whole tree and reloaded it from the saved nodes.
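
The answer does not include code, so the following is only a rough sketch of the same "keep the survivors, drop everything, re-attach" idea. It is an in-memory variant and does not write to a file as the answer describes; the class PruneSketch and the helper KeepOnlyLast are hypothetical names, not something from the original post.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class PruneSketch
{
    // Hypothetical helper: keeps every child of `root` except elements named
    // `name`, of which only the last one survives. Document order is preserved.
    public static void KeepOnlyLast(XElement root, XName name)
    {
        XElement last = root.Elements(name).LastOrDefault();
        if (last == null) return; // no matching element, nothing to prune

        // One pass over the children: keep non-matching nodes and the last match.
        List<XNode> kept = root.Nodes()
            .Where(n => !(n is XElement e && e.Name == name) || n == last)
            .ToList();

        // Remove all children in one call, then add back only the survivors.
        root.ReplaceNodes(kept);
    }
}
```

Usage would be something like PruneSketch.KeepOnlyLast(root, "CHILD1"). The point of the design is to avoid calling Remove() once per unwanted node. As far as I know, removing a single node makes LINQ to XML walk the parent's child list to unlink it, which would explain why hundreds of thousands of individual removals get so slow, but I have not measured this sketch against the file-based approach.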

3 answers

solution1
0 2014-12-09 04:22:07

solution2
0 2014-12-09 05:05:04

solution3
0 ACCEPTED 2015-09-03 11:34:07
