Suppose I have an in-memory XElement like the following:
<ROOT>
  <CHILD1 />
  <CHILD1 />
  <CHILD2 />
  <CHILD2 />
  <CHILD1 />
  <CHILD1 />
  <CHILD3 />
  <CHILD3 />
</ROOT>
All CHILD1 nodes must be deleted except the last one.
The tree has roughly 1 million nodes, and about 70% of them are CHILD1 nodes. What is the most efficient way to remove these unused nodes? I tried the following:
List<XElement> remNodes = root.Elements("CHILD1").ToList();
remNodes.RemoveAt(remNodes.Count - 1);   // keep the last CHILD1
remNodes.ForEach(n => n.Remove());
and also the old-fashioned way:
XElement[] remNodes = root.Elements("CHILD1").ToArray();
for (int i = 0; i < remNodes.Length - 1; i++)
    remNodes[i].Remove();
Both took too much time to complete (~5 hours). Is there any quicker method?
UPDATE 1
I tried saving the last node and then removing them all:
XElement savedNode = remNodes.Last();
savedNode.Save("to_file");
root.Elements("CHILD1").Remove();
But it looks like the time taken is the same.
UPDATE 2
Finally, I got the task to complete in a timely fashion (less than 1 minute). I saved the valid nodes to a file, discarded the whole tree, and reloaded the tree from the saved nodes. Thanks @Matthew Haugen for your idea. Would you mind adding it as an answer?
Thanks.
Try:
root.Elements("CHILD1").Reverse().Skip(1).Remove();
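For context, `Remove()` here is the `System.Xml.Linq.Extensions.Remove` extension method, which snapshots the sequence before detaching anything, so the one-liner is safe even though it mutates the tree it enumerates. A minimal, self-contained sketch:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

class Demo
{
    static void Main()
    {
        var root = XElement.Parse(
            "<ROOT><CHILD1 /><CHILD1 /><CHILD2 /><CHILD1 /><CHILD3 /></ROOT>");

        // Reverse() puts the last CHILD1 first, Skip(1) preserves it,
        // and Extensions.Remove() detaches the rest from the tree.
        root.Elements("CHILD1").Reverse().Skip(1).Remove();

        Console.WriteLine(root.Elements("CHILD1").Count()); // 1
    }
}
```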
I threw this together with 200,000 child elements. It doesn't take too long, but I'm not sure how many you are attempting to work with.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string data = DummyData();
            //DeleteNodes("child1", data);
            DeleteNodes2("child1", data);
            Console.ReadLine();
        }

        // Keeps the first element with each distinct value and removes the rest.
        static void DeleteNodes(string node, string xml)
        {
            var values = new HashSet<string>();
            var xmlDocument = XDocument.Parse(xml);
            foreach (var n in xmlDocument.Root.Elements(node).ToList())
            {
                if (!values.Add((string)n)) // dedupe on the element's value, not its name
                    n.Remove();
            }
        }

        // Keeps only the last matching element.
        static void DeleteNodes2(string node, string xml)
        {
            var xmlDocument = XDocument.Parse(xml);
            xmlDocument.Root.Elements(node).Reverse().Skip(1).Remove();

            // Test to see how many are left.
            var remaining = xmlDocument.Root.Elements(node).ToList();
        }

        static string DummyData()
        {
            var r = new Random();
            TextWriter w = new StringWriter();
            var writer = new XmlTextWriter(w) { Formatting = Formatting.Indented };
            writer.WriteStartElement("root");
            for (int i = 0; i < 200000; i++)
            {
                writer.WriteStartElement(string.Format("child{0}", r.Next(3)));
                writer.WriteEndElement();
            }
            writer.WriteEndElement();
            return w.ToString();
        }
    }
}
If this is indeed faster, the credit should go to Chuck; this is only a spin-off of what he suggested.
Finally, I got the task to complete in a timely fashion (less than 1 minute). I saved the valid nodes to a file, discarded the whole tree, and reloaded the tree from the saved nodes.
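The exact code wasn't posted, so here is one hedged sketch of that save-and-reload approach; the method name and file-path handling are illustrative. It streams the nodes worth keeping to disk with an `XmlWriter` (avoiding a per-node `Remove()` call on a huge tree) and then loads a fresh tree from the file:

```csharp
using System;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

class SaveAndReload
{
    // Illustrative sketch: writes all non-CHILD1 children plus the last
    // CHILD1 to a file, then rebuilds the tree from that file.
    static XElement RebuildWithoutExtraChild1(XElement root, string path)
    {
        XElement lastChild1 = root.Elements("CHILD1").LastOrDefault();

        using (XmlWriter writer = XmlWriter.Create(path))
        {
            writer.WriteStartElement(root.Name.LocalName);
            foreach (XElement child in root.Elements())
            {
                // Keep everything except CHILD1 nodes, but keep the last CHILD1.
                if (child.Name != "CHILD1" || child == lastChild1)
                    child.WriteTo(writer);
            }
            writer.WriteEndElement();
        }

        // Drop the old tree and reload only the saved nodes.
        return XElement.Load(path);
    }
}
```

This sidesteps the cost of detaching hundreds of thousands of nodes one by one, at the price of one pass over the tree and a round-trip through the file system.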