I have a program that goes through thousands of files and has to check whether they have the correct XML format. The problem is that it takes ages to complete, and I think that's because of the type of XML reader I use.
The method below contains the 3 different versions I tried. The first one is the fastest, but only by 5%. (The method does not need to check whether the file is an XML file.)
private bool HasCorrectXmlFormat(string filePath)
{
    try
    {
        //-Version 1----------------------------------------------------------------------------------------
        XmlReader reader = XmlReader.Create(filePath, new XmlReaderSettings() { IgnoreComments = true, IgnoreWhitespace = true });
        string[] elementNames = new string[] { "DocumentElement", "Protocol", "DateTime", "Item", "Value" };
        int i = 0;
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element)
            {
                if (reader.Name != elementNames.ElementAt(i))
                {
                    return false;
                }
                if (i >= 4)
                {
                    return true;
                }
                i++;
            }
        }
        return false;
        //--------------------------------------------------------------------------------------------------
        //- Version 2 ------------------------------------------------------------------------------------
        IEnumerable<XElement> xmlElements = XDocument.Load(filePath).Descendants();
        string[] elementNames = new string[] { "DocumentElement", "Protocol", "DateTime", "Item", "Value" };
        for (int i = 0; i < 5; i++)
        {
            if (xmlElements.ElementAt(i).Name != elementNames.ElementAt(i))
            {
                return false;
            }
        }
        return true;
        //--------------------------------------------------------------------------------------------------
        //- Version 3 ------------------------------------------------------------------------------------
        XDocument doc = XDocument.Load(filePath);
        if (doc.Root.Name != "DocumentElement")
        {
            return false;
        }
        if (doc.Root.Elements().First().Name != "Protocol")
        {
            return false;
        }
        if (doc.Root.Elements().First().Elements().ElementAt(0).Name != "DateTime")
        {
            return false;
        }
        if (doc.Root.Elements().First().Elements().ElementAt(1).Name != "Item")
        {
            return false;
        }
        if (doc.Root.Elements().First().Elements().ElementAt(2).Name != "Value")
        {
            return false;
        }
        return true;
        //--------------------------------------------------------------------------------------------------
    }
    catch (Exception)
    {
        return false;
    }
}
What I need is a faster way to do this. Is there a faster way to go through an XML file? I only have to check whether the first 5 elements have the correct names.
UPDATE
The XML files are only 2-5 KB in size, rarely more than that. The files are located on a local server. I am on a laptop which has an SSD.
Here are some test results:
I should also add that the files are filtered beforehand, so only XML files are passed to the method. I get the files with the following method:
public List<FileInfo> GetCompatibleFiles()
{
    return new DirectoryInfo(folderPath)
        .EnumerateFiles("*", searchOption)
        .AsParallel()
        .Where(file => file.Extension == ".xml" ? HasCorrectXmlFormat(file.FullName) : false)
        .ToList();
}
This method does not appear in my code in this form (I put two methods together); it is only here to show how the HasCorrectXmlFormat method is called. You don't have to correct this method, I know it can be improved.
UPDATE 2
Here are the two full methods mentioned at the end of update 1:
private void WriteAllFilesInList()
{
    allFiles = new DirectoryInfo(folderPath)
        .EnumerateFiles("*", searchOption)
        .AsParallel()
        .ToList();
}
private void WriteCompatibleFilesInList()
{
    compatibleFiles = allFiles
        .Where(file => file.Extension == ".xml" ? HasCorrectXmlFormat(file.FullName) : false)
        .ToList();
}
Both methods are only called once in the entire program (if either the allFiles or the compatibleFiles list is null).
UPDATE 3
It seems that the WriteAllFilesInList method is the real problem here.
FINAL UPDATE
As it seems, my method doesn't need any improvement as the bottleneck is something else.
I would write the code like this using LINQ to XML, which is a little faster than your code. Your code loops through the XML file multiple times, while mine goes through the file only once.
private bool HasCorrectXmlFormat(string filePath)
{
    try
    {
        XDocument doc = XDocument.Load(filePath);
        XElement root = doc.Root;
        if (root.Name != "DocumentElement")
        {
            return false;
        }
        else
        {
            XElement protocol = root.Elements().First();
            if (protocol.Name != "Protocol")
            {
                return false;
            }
            else
            {
                // The three children of Protocol are checked by position.
                XElement dateTime = protocol.Elements().First();
                if (dateTime.Name != "DateTime")
                {
                    return false;
                }
                XElement item = protocol.Elements().Skip(1).First();
                if (item.Name != "Item")
                {
                    return false;
                }
                XElement value = protocol.Elements().Skip(2).First();
                if (value.Name != "Value")
                {
                    return false;
                }
            }
        }
    }
    catch (Exception)
    {
        return false;
    }
    return true;
}
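For comparison, a streaming variant along the lines of the question's Version 1 might look like the sketch below. It is only a sketch under assumptions: the class name XmlFormatCheck and the small Main demo are hypothetical and not part of the original post. The element names come from the question. The reader is wrapped in a using block so that each file handle is released as soon as the check returns, which may matter when thousands of files are processed.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical wrapper class; only the method name comes from the question.
public static class XmlFormatCheck
{
    // Expected names of the first five elements in document order (from the question).
    private static readonly string[] ExpectedNames =
        { "DocumentElement", "Protocol", "DateTime", "Item", "Value" };

    public static bool HasCorrectXmlFormat(string filePath)
    {
        var settings = new XmlReaderSettings { IgnoreComments = true, IgnoreWhitespace = true };
        try
        {
            // The using block disposes the reader (and closes the file) on every exit path.
            using (XmlReader reader = XmlReader.Create(filePath, settings))
            {
                int i = 0;
                while (reader.Read())
                {
                    if (reader.NodeType != XmlNodeType.Element)
                    {
                        continue;
                    }
                    if (reader.Name != ExpectedNames[i])
                    {
                        return false;
                    }
                    i++;
                    if (i == ExpectedNames.Length)
                    {
                        // Stop early: the rest of the file is never parsed.
                        return true;
                    }
                }
            }
            // Fewer than five elements in the file.
            return false;
        }
        catch (Exception)
        {
            // Mirrors the question: any read or parse failure counts as "wrong format".
            return false;
        }
    }

    public static void Main()
    {
        // Hypothetical demo file written to the temp directory.
        string path = Path.GetTempFileName();
        File.WriteAllText(path,
            "<DocumentElement><Protocol><DateTime/><Item/><Value/></Protocol></DocumentElement>");
        Console.WriteLine(HasCorrectXmlFormat(path));
        File.Delete(path);
    }
}
```

Note that this checks the first five elements in document order, as Version 1 does, whereas the LINQ to XML code above checks the first three children of Protocol by position. For the layout described in the question both amount to the same test.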
Here is an example that reads a sample XML file and compares LINQ to XML, XmlReader and XmlDocument. LINQ is the fastest.
Sample code:
using System;
using System.Diagnostics;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
namespace ReadXMLInCsharp
{
    class Program
    {
        static void Main(string[] args)
        {
            //returns url of main directory which contains "/bin/Debug"
            var url=System.IO.Path.GetDirectoryName(
                System.Reflection.Assembly.GetExecutingAssembly().GetName().CodeBase);
            //correction in path to point it in Root directory
            var mainpath = url.Replace("\\bin\\Debug", "") + "\\books.xml";
            var stopwatch = new Stopwatch();
            stopwatch.Start();
            //create XMLDocument object
            XmlDocument xmlDoc = new XmlDocument();
            //load xml file
            xmlDoc.Load(mainpath);
            //save all nodes in XMLnodelist
            XmlNodeList nodeList = xmlDoc.DocumentElement.SelectNodes("/catalog/book");
            //loop through each node and save it value in NodeStr
            var NodeStr = "";
            foreach (XmlNode node in nodeList)
            {
                NodeStr = NodeStr + "\nAuthor " + node.SelectSingleNode("author").InnerText;
                NodeStr = NodeStr + "\nTitle " + node.SelectSingleNode("title").InnerText;
                NodeStr = NodeStr + "\nGenre " + node.SelectSingleNode("genre").InnerText;
                NodeStr = NodeStr + "\nPrice " + node.SelectSingleNode("price").InnerText;
                NodeStr = NodeStr + "\nDescription -" + node.SelectSingleNode("description").InnerText;
            }
            //print all Authors details
            Console.WriteLine(NodeStr);
            stopwatch.Stop();
            Console.WriteLine();
            Console.WriteLine("Time elapsed using XmlDocument (ms)= " + stopwatch.ElapsedMilliseconds);
            Console.WriteLine();
            stopwatch.Reset();
            stopwatch.Start();
            NodeStr = "";
            //linq method
            //get all elements inside book
            foreach (XElement level1Element in XElement.Load(mainpath).Elements("book"))
            {
                //print each element value
                //you can also print XML attribute value, instead of .Element use .Attribute
                NodeStr = NodeStr + "\nAuthor " + level1Element.Element("author").Value;
                NodeStr = NodeStr + "\nTitle " + level1Element.Element("title").Value;
                NodeStr = NodeStr + "\nGenre " + level1Element.Element("genre").Value;
                NodeStr = NodeStr + "\nPrice " + level1Element.Element("price").Value;
                NodeStr = NodeStr + "\nDescription -" + level1Element.Element("description").Value;
            }
            //print all Authors details
            Console.WriteLine(NodeStr);
            stopwatch.Stop();
            Console.WriteLine();
            Console.WriteLine("Time elapsed using linq(ms)= " + stopwatch.ElapsedMilliseconds);
            Console.WriteLine();
            stopwatch.Reset();
            stopwatch.Start();
            //method 3
            //XMLReader
            XmlReader xReader = XmlReader.Create(mainpath);
            xReader.ReadToFollowing("book");
            NodeStr = "";
            while (xReader.Read())
            {
                switch (xReader.NodeType)
                {
                    case XmlNodeType.Element:
                        NodeStr = NodeStr + "\nElement name:" + xReader.Name;
                        break;
                    case XmlNodeType.Text:
                        NodeStr = NodeStr + "\nElement value:" + xReader.Value;
                        break;
                    case XmlNodeType.None:
                        //do nothing
                        break;
                }
            }
            //print all Authors details
            Console.WriteLine(NodeStr);
            stopwatch.Stop();
            Console.WriteLine();
            Console.WriteLine("Time elapsed using XMLReader (ms)= " + stopwatch.ElapsedMilliseconds);
            Console.WriteLine();
            stopwatch.Reset();
            Console.ReadKey();
        }
    }
}
Output:
-- First Run
Time elapsed using XmlDocument (ms)= 15
Time elapsed using linq(ms)= 7
Time elapsed using XMLReader (ms)= 12
-- Second Run
Time elapsed using XmlDocument (ms)= 18
Time elapsed using linq(ms)= 3
Time elapsed using XMLReader (ms)= 15
I have removed some output to show only comparison data.
Source: Open and Read XML in C# (Examples using Linq, XMLReader, XMLDocument)
Edit: If I comment out Console.WriteLine(NodeStr) in all three methods and print only the timings, this is what I get:
Time elapsed using XmlDocument (ms)= 11
Time elapsed using linq(ms)= 0
Time elapsed using XMLReader (ms)= 0
Basically, it depends on how you process the data and how you read the XML. The LINQ and XmlReader approaches look more promising in terms of speed.
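Single-run timings of a few milliseconds, like the ones above, can be dominated by JIT compilation and file caching, so a repeated measurement may give a steadier picture. The following is a minimal sketch under assumptions: the class XmlTimingSketch, the Measure helper and the in-memory sample document are all hypothetical and not the books.xml used above, and the iteration count is arbitrary. Each approach runs once as a warm-up and is then timed over many iterations.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper class for illustration only.
public static class XmlTimingSketch
{
    // Hypothetical in-memory sample, so disk access does not affect the timing.
    private const string Sample =
        "<catalog><book><author>A</author><title>T</title></book></catalog>";

    // Runs the action once as a warm-up, then times the given number of iterations.
    public static TimeSpan Measure(Action action, int iterations)
    {
        action();
        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    // Counts elements by building the full LINQ to XML tree.
    public static int CountWithLinq()
    {
        int count = 0;
        foreach (XElement element in XDocument.Parse(Sample).Descendants())
        {
            count++;
        }
        return count;
    }

    // Counts elements by streaming through the document with XmlReader.
    public static int CountWithReader()
    {
        int count = 0;
        using (XmlReader reader = XmlReader.Create(new StringReader(Sample)))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    count++;
                }
            }
        }
        return count;
    }

    public static void Main()
    {
        const int iterations = 10000; // arbitrary choice for the sketch
        TimeSpan linq = Measure(() => CountWithLinq(), iterations);
        TimeSpan reader = Measure(() => CountWithReader(), iterations);
        Console.WriteLine("LINQ to XML (ms)= " + linq.TotalMilliseconds);
        Console.WriteLine("XmlReader (ms)= " + reader.TotalMilliseconds);
    }
}
```

The absolute numbers will vary by machine and runtime, so only the relative difference over many iterations is meaningful.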