I have a 9 MB XML file that is apparently broken.
I want to check whether, at any level, two sibling elements have an "Id" attribute with the same value.
My current code is too slow. What optimizations could I make to it?
Edit: updated the code to include some of the tips from the answers.
namespace ConsoleApplication1
{
    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.IO;
    using System.Linq;
    using System.Xml.Linq;

    internal class Program
    {
        private const string _pathToXml = @"C:\4\4";
        private static readonly List<object> _duplicateLeafs = new List<object>();

        private static void Main()
        {
            var xml = ReadXml();
            var elements = xml.Descendants();
            foreach (var element in elements)
                FindDupes(element);
            Console.ReadLine();
            Debugger.Break();
        }

        private static XDocument ReadXml()
        {
            return XDocument.Parse(File.ReadAllText(_pathToXml));
        }

        private static void FindDupes(XElement element)
        {
            var elements = element.Descendants();
            var elementsWithIds = elements.Where(x => x.Attribute("Id") != null);
            var ids = elementsWithIds.Select(x => x.Attribute("Id")).ToList();
            for (var i = 0; i < ids.Count; i++)
                for (var j = i + 1; j < ids.Count; j++)
                    // compare attribute values; == on two XAttribute objects compares references
                    if (i != j && ids[i].Value == ids[j].Value)
                        _duplicateLeafs.Add(elementsWithIds.First(x => x.Attribute("Id") == ids[i]));
            foreach (var subElement in elements)
                FindDupes(subElement);
        }
    }
}
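You say you want to check two sibling elements, but FindDupes is recursive, so the foreach loop in every call checks all the levels below it again. On top of that, Main calls FindDupes for every descendant, so the same subtrees are scanned many times over.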
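Since siblings are elements that share a parent, each element only has to be compared with the other children of its parent, and no recursion is needed. A minimal sketch, assuming "sibling" means "direct child of the same element" and that only the attribute named "Id" matters: it visits every element once via DescendantsAndSelf() and keeps one HashSet<string> of Id values per parent. This hash-based check is a swapped-in technique, not the code from the question, and SiblingIdChecker and FindDuplicateSiblingIds are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

public static class SiblingIdChecker
{
    // Returns every element whose "Id" value was already used by an earlier
    // sibling (an earlier child of the same parent).
    public static List<XElement> FindDuplicateSiblingIds(XDocument doc)
    {
        var duplicates = new List<XElement>();
        if (doc.Root == null) return duplicates;

        // DescendantsAndSelf() yields the root and every element below it exactly
        // once, so each element is examined once as a parent and once as a child.
        foreach (var parent in doc.Root.DescendantsAndSelf())
        {
            var seen = new HashSet<string>();   // Id values seen among this parent's children
            foreach (var child in parent.Elements())
            {
                var id = child.Attribute("Id");
                // HashSet.Add returns false if the value is already in the set.
                if (id != null && !seen.Add(id.Value))
                    duplicates.Add(child);
            }
        }
        return duplicates;
    }

    public static void Main()
    {
        // Hypothetical test document: the two <a> elements are siblings,
        // <c Id="1"> is not a sibling of them.
        var doc = XDocument.Parse(
            "<root><a Id=\"1\"/><a Id=\"1\"/><b><c Id=\"1\"/><c Id=\"2\"/></b></root>");
        foreach (var dup in FindDuplicateSiblingIds(doc))
        {
            var idValue = dup.Attribute("Id").Value;
            Console.WriteLine($"{dup.Name} Id={idValue}");   // prints "a Id=1"
        }
    }
}
```

Only the second <a> is reported; <c Id="1"> has the same value but a different parent, so it is not a sibling duplicate.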
- You are checking each pair twice, so you could start the inner loop at int j = i + 1 instead of 0. Then you wouldn't have to check i != j.
- In your for loops, store the list count in a variable rather than reading the Count property on every iteration:
  for (int i = 0, idCount = ids.Count; i < idCount; i++) { }
- Store ids[i] in a local variable instead of looking it up in the collection more than once.
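The three tips above, applied to a pairwise check over one element's direct children, might look like the sketch below. PairwiseCheck and FindDuplicateChildIds are hypothetical names. It compares the Value strings because == on two XAttribute objects compares references, not values. The check is still O(k²) in the number of siblings k, so a hash-based check scales better on large sibling groups.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class PairwiseCheck
{
    // Pairwise comparison of the Id values of one element's direct children,
    // with the three tips applied. Returns each child whose Id also appears on
    // a later sibling.
    public static List<XElement> FindDuplicateChildIds(XElement parent)
    {
        // Materialize the Id attributes once instead of re-running the query.
        var ids = parent.Elements()
                        .Select(e => e.Attribute("Id"))
                        .Where(a => a != null)
                        .ToList();
        var duplicates = new List<XElement>();

        for (int i = 0, idCount = ids.Count; i < idCount; i++)
        {
            var current = ids[i].Value;              // look ids[i] up once
            for (int j = i + 1; j < idCount; j++)    // j starts at i + 1, so no i != j check
            {
                if (current == ids[j].Value)         // compare values, not XAttribute references
                {
                    duplicates.Add(ids[i].Parent);
                    break;                           // one report per element is enough
                }
            }
        }
        return duplicates;
    }

    public static void Main()
    {
        var parent = XElement.Parse(
            "<persons><person Id=\"1\"/><person Id=\"1\"/><person Id=\"2\"/></persons>");
        Console.WriteLine(FindDuplicateChildIds(parent).Count);   // prints 1
    }
}
```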
Edit: Made the following changes.
private const string _pathToXml = @"C:\test.xml";
private static readonly List<object> _duplicateLeafs = new List<object>();

private static void Main()
{
    var xml = ReadXml();
    var elements = xml.Descendants();
    FindDupes(elements);
}

private static void FindDupes(IEnumerable<XElement> elements)
{
    foreach (var element in elements)
    {
        var subElements = element.Descendants();
        var subElementsWithIds = subElements.Where(x => x.Attribute("Id") != null).ToList();
        var ids = subElementsWithIds.Select(x => x.Attribute("Id")).ToList();
        // group the Id attributes by value; every attribute after the first in a group is a duplicate
        var duplicates = ids.GroupBy(s => s.Value).SelectMany(grp => grp.Skip(1)).Distinct().ToList();
        // ToList() never returns null, so test for an empty list instead
        if (duplicates.Count > 0)
        {
            _duplicateLeafs.AddRange(duplicates);
        }
        FindDupes(subElements);
    }
}
I used the following XML file:
<?xml version="1.0" encoding="utf-8" ?>
<persons>
  <person Id="1">
    <name>Michael</name>
    <age>29</age>
  </person>
  <person Id="1">
    <name>Rebecca</name>
    <age>29</age>
  </person>
  <person Id="2">
    <name>Matthew</name>
    <age>29</age>
  </person>
  <person Id="2">
    <name>Paul</name>
    <age>29</age>
  </person>
</persons>
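For comparison, the GroupBy idea from the answer's code can be restricted to siblings: for each element, group only its direct children (Elements()) by Id value and keep the groups with more than one member. doc.Descendants() already visits every element once, so no recursion is needed. A sketch under that assumption, run against the sample file inlined as a string (XML declaration omitted); GroupBySiblings and FindDuplicateGroups are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class GroupBySiblings
{
    // The sample file from above, inlined without the XML declaration.
    public const string SampleXml = @"<persons>
  <person Id=""1""><name>Michael</name><age>29</age></person>
  <person Id=""1""><name>Rebecca</name><age>29</age></person>
  <person Id=""2""><name>Matthew</name><age>29</age></person>
  <person Id=""2""><name>Paul</name><age>29</age></person>
</persons>";

    // For every element, group its direct children (which are siblings of each
    // other) by Id value and keep only the groups with more than one member.
    public static List<(string Parent, string Id, int Count)> FindDuplicateGroups(XDocument doc)
    {
        return (from parent in doc.Descendants()
                from grp in parent.Elements()
                                  .Where(e => e.Attribute("Id") != null)
                                  .GroupBy(e => e.Attribute("Id").Value)
                where grp.Count() > 1
                select (Parent: parent.Name.LocalName, Id: grp.Key, Count: grp.Count()))
               .ToList();
    }

    public static void Main()
    {
        var doc = XDocument.Parse(SampleXml);
        foreach (var (parent, id, count) in FindDuplicateGroups(doc))
            Console.WriteLine($"<{parent}>: Id \"{id}\" appears {count} times");
        // prints:
        // <persons>: Id "1" appears 2 times
        // <persons>: Id "2" appears 2 times
    }
}
```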
I timed your latest version against the code in my edit, loading a 16 MB file:
- Lambda solution: 2.8704708 seconds
- Nested for loops: 692.043006 seconds