
What is the fastest way to go through an XML file in C#?

I have a program that goes through thousands of files and has to check whether each one has the correct XML format. The problem is that it takes ages to complete, and I think that's because of the type of XML reader I use.

The method below contains the 3 different versions I tried (only one is active at a time). The first one is the fastest, but only by 5%. The method does not need to check whether the file is XML at all.

private bool HasCorrectXmlFormat(string filePath)
{
    try
    {
        //-Version 1----------------------------------------------------------------------------------------
        XmlReader reader = XmlReader.Create(filePath, new XmlReaderSettings() { IgnoreComments = true, IgnoreWhitespace = true });

        string[] elementNames = new string[] { "DocumentElement", "Protocol", "DateTime", "Item", "Value" };

        int i = 0;

        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element)
            {
                if (reader.Name != elementNames.ElementAt(i))
                {
                    return false;
                }

                if (i >= 4)
                {
                    return true;
                }

                i++;
            }

        }

        return false;
        //--------------------------------------------------------------------------------------------------


        //-  Version 2  ------------------------------------------------------------------------------------
        IEnumerable<XElement> xmlElements = XDocument.Load(filePath).Descendants();

        string[] elementNames = new string[] { "DocumentElement", "Protocol", "DateTime", "Item", "Value" };

        for (int i = 0; i < 5; i++)
        {
            if (xmlElements.ElementAt(i).Name != elementNames.ElementAt(i))
            {
                return false;
            }
        }

        return true;
        //--------------------------------------------------------------------------------------------------


        //-  Version 3  ------------------------------------------------------------------------------------
        XDocument doc = XDocument.Load(filePath);

        if (doc.Root.Name != "DocumentElement")
        {
            return false;
        }

        if (doc.Root.Elements().First().Name != "Protocol")
        {
            return false;
        }

        if (doc.Root.Elements().First().Elements().ElementAt(0).Name != "DateTime")
        {
            return false;
        }

        if (doc.Root.Elements().First().Elements().ElementAt(1).Name != "Item")
        {
            return false;
        }

        if (doc.Root.Elements().First().Elements().ElementAt(2).Name != "Value")
        {
            return false;
        }

        return true;
        //--------------------------------------------------------------------------------------------------
    }
    catch (Exception)
    {
        return false;
    }
}

What I need is a faster way to do this. Is there a faster way to go through an XML file? I only have to check whether the first 5 elements have the correct names.

UPDATE

The XML files are only 2-5 KB in size, rarely more than that. The files are located on a local server. I am on a laptop that has an SSD.

Here are some test results:

[Three screenshots of test results; images not included]

I should also add that the files are filtered beforehand, so only XML files are passed to the method. I get the files with the following method:

public List<FileInfo> GetCompatibleFiles()
    {
        return new DirectoryInfo(folderPath)
                    .EnumerateFiles("*", searchOption)
                    .AsParallel()
                    .Where(file => file.Extension == ".xml" ? HasCorrectXmlFormat(file.FullName) : false)
                    .ToList();
    }

This method does not appear in my code in this form (I combined two methods here); it is only meant to show how the HasCorrectXmlFormat method is called. You don't have to correct this method, I know it can be improved.

UPDATE 2

Here are the two full methods mentioned at the end of update 1:

private void WriteAllFilesInList()
    {
        allFiles = new DirectoryInfo(folderPath)
                    .EnumerateFiles("*", searchOption)
                    .AsParallel()
                    .ToList();
    }

private void WriteCompatibleFilesInList()
    {
        compatibleFiles = allFiles
                            .Where(file => file.Extension == ".xml" ? HasCorrectXmlFormat(file.FullName) : false)
                            .ToList();
    }

Both methods are only called once in the entire program (if either the allFiles or compatibleFiles List is null).

UPDATE 3

It seems like the WriteAllFilesInList method is the real problem here, as this screenshot showed:

[Screenshot; image not included]

FINAL UPDATE

As it turns out, my method doesn't need any improvement, because the bottleneck is somewhere else.
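If the enumeration step is the bottleneck, as the last update suggests, one thing worth trying is to pass the extension as the search pattern so that non-XML files are never turned into FileInfo objects. The sketch below is only an illustration under that assumption: the class name `XmlFileFinder` and the method name `FindXmlFiles` are hypothetical, and whether this actually helps depends on the folder contents and the storage, so it should be measured.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of the original program.
public static class XmlFileFinder
{
    public static List<FileInfo> FindXmlFiles(string folderPath, SearchOption searchOption)
    {
        return new DirectoryInfo(folderPath)
            // Let the file system enumeration filter by pattern instead of listing everything.
            .EnumerateFiles("*.xml", searchOption)
            // On .NET Framework a three-letter pattern such as "*.xml" can also match
            // longer extensions (for example ".xmlx"), so the extension is checked again.
            .Where(f => string.Equals(f.Extension, ".xml", StringComparison.OrdinalIgnoreCase))
            .ToList();
    }
}
```

The sketch also leaves out the AsParallel() call that sat directly in front of ToList() in WriteAllFilesInList, on the assumption that there is no per-item work at that point for PLINQ to parallelize.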

I would write the code like this using LINQ to XML, which is a little faster than your code. Your code re-enumerates the elements of the document several times, while mine fetches each element only once.

private bool HasCorrectXmlFormat(string filePath)
{
    try
    {

        XDocument doc = XDocument.Load(filePath);
        XElement root = doc.Root;
        if (doc.Root.Name != "DocumentElement")
        {
            return false;
        }
        else
        {
            XElement protocol = root.Elements().First();
            if (protocol.Name != "Protocol")
            {
                return false;
            }
            else
            {
                XElement dateTime = protocol.Elements().First();
                if (dateTime.Name != "DateTime")
                {
                    return false;
                }
                XElement item = protocol.Elements().Skip(1).First();
                if (item.Name != "Item")
                {
                    return false;
                }
                XElement value = protocol.Elements().Skip(2).First();
                if (value.Name != "Value")
                {
                    return false;
                }
 
            }

        }
    }
    catch (Exception)
    {
        return false;
    }
    return true;
}
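For comparison, here is a streaming variant along the lines of Version 1 from the question. It is a minimal sketch, not a measured improvement: it disposes the reader with `using`, indexes the array directly instead of calling `ElementAt`, and stops as soon as the fifth element has been seen. The class name `XmlFormatCheck` and the helpers `Matches` and `MatchesXml` are hypothetical names introduced for illustration; the element names come from the question. Like Version 1, it checks the first five elements in document order, which matches the nested checks above only as long as DateTime and Item have no child elements.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper class; names are illustrative only.
public static class XmlFormatCheck
{
    // Expected names of the first five elements in document order (from the question).
    private static readonly string[] ExpectedNames =
        { "DocumentElement", "Protocol", "DateTime", "Item", "Value" };

    // A fresh settings object per reader keeps the sketch safe under parallel calls.
    private static XmlReaderSettings CreateSettings()
    {
        return new XmlReaderSettings { IgnoreComments = true, IgnoreWhitespace = true };
    }

    // Reads forward only and returns as soon as the answer is known.
    private static bool Matches(XmlReader reader)
    {
        int index = 0;
        while (reader.Read())
        {
            if (reader.NodeType != XmlNodeType.Element) continue;
            if (reader.Name != ExpectedNames[index]) return false;
            if (++index == ExpectedNames.Length) return true;
        }
        return false; // fewer than five elements
    }

    public static bool HasCorrectXmlFormat(string filePath)
    {
        try
        {
            using (XmlReader reader = XmlReader.Create(filePath, CreateSettings()))
            {
                return Matches(reader);
            }
        }
        // Narrower than the question's catch (Exception): malformed XML and
        // file access problems count as "wrong format", anything else propagates.
        catch (XmlException) { return false; }
        catch (IOException) { return false; }
        catch (UnauthorizedAccessException) { return false; }
    }

    // Same check for an in-memory string, handy for trying the logic without files.
    public static bool MatchesXml(string xml)
    {
        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(xml), CreateSettings()))
            {
                return Matches(reader);
            }
        }
        catch (XmlException) { return false; }
    }
}
```

Because the reader stops after the fifth element, the rest of the file is never parsed. With files of only 2-5 KB the difference is likely to be small, which fits the question's final update.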

Here is an example that reads a sample XML file and compares LINQ to XML, XmlReader and XmlDocument.

LINQ is the fastest in this test.

Sample Code

using System;
using System.Diagnostics;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

namespace ReadXMLInCsharp
{
  class Program
  {
    static void Main(string[] args)
    {
     
        //returns url of main directory which contains "/bin/Debug"
        var url=System.IO.Path.GetDirectoryName(
System.Reflection.Assembly.GetExecutingAssembly().GetName().CodeBase);
        
        //correction in path to point it in Root directory
        var mainpath = url.Replace("\\bin\\Debug", "") + "\\books.xml";

        var stopwatch = new Stopwatch();
        stopwatch.Start();

        //create XMLDocument object
        XmlDocument xmlDoc = new XmlDocument();
        //load xml file
        xmlDoc.Load(mainpath);
        //save all nodes in XMLnodelist
        XmlNodeList nodeList = xmlDoc.DocumentElement.SelectNodes("/catalog/book");

        //loop through each node and append its values to NodeStr
        var NodeStr = "";

        foreach (XmlNode node in nodeList)
        {
            NodeStr = NodeStr + "\nAuthor " + node.SelectSingleNode("author").InnerText;
            NodeStr = NodeStr + "\nTitle " + node.SelectSingleNode("title").InnerText;
            NodeStr = NodeStr + "\nGenre " + node.SelectSingleNode("genre").InnerText;
            NodeStr = NodeStr + "\nPrice " + node.SelectSingleNode("price").InnerText;
            NodeStr = NodeStr + "\nDescription -" + node.SelectSingleNode("description").InnerText;


        }
        //print all Authors details
        Console.WriteLine(NodeStr);
        stopwatch.Stop();
        Console.WriteLine();
        Console.WriteLine("Time elapsed using XmlDocument (ms)= " + stopwatch.ElapsedMilliseconds);
        Console.WriteLine();

        stopwatch.Reset();

        stopwatch.Start();
        NodeStr = "";
        //linq method
        //get all elements inside book
        foreach (XElement level1Element in XElement.Load(mainpath).Elements("book"))
        {
            //print each element value
            //you can also print XML attribute value, instead of .Element use .Attribute
            NodeStr = NodeStr + "\nAuthor " + level1Element.Element("author").Value;
            NodeStr = NodeStr + "\nTitle " + level1Element.Element("title").Value;
            NodeStr = NodeStr + "\nGenre " + level1Element.Element("genre").Value;
            NodeStr = NodeStr + "\nPrice " + level1Element.Element("price").Value;
            NodeStr = NodeStr + "\nDescription -" + level1Element.Element("description").Value;
        }

        //print all Authors details
        Console.WriteLine(NodeStr);
        stopwatch.Stop();
        Console.WriteLine();
        Console.WriteLine("Time elapsed using linq(ms)= " + stopwatch.ElapsedMilliseconds);
        Console.WriteLine();

        stopwatch.Reset();
        stopwatch.Start();
        //method 3
        //XMLReader
        XmlReader xReader = XmlReader.Create(mainpath);

        xReader.ReadToFollowing("book");
        NodeStr = "";
        while (xReader.Read())
        {
            switch (xReader.NodeType)
            {
                case XmlNodeType.Element:
                    NodeStr = NodeStr + "\nElement name:" + xReader.Name;
                    break;
                case XmlNodeType.Text:
                    NodeStr = NodeStr + "\nElement value:" + xReader.Value;
                    break;
                case XmlNodeType.None:
                    //do nothing
                    break;

            }
        }

        //print all Authors details
        Console.WriteLine(NodeStr);
        stopwatch.Stop();
        Console.WriteLine();
        Console.WriteLine("Time elapsed using XMLReader (ms)= " + stopwatch.ElapsedMilliseconds);
        Console.WriteLine();
        stopwatch.Reset();


        Console.ReadKey();
    }
  }
}

Output:

-- First Run
Time elapsed using XmlDocument (ms)= 15

Time elapsed using linq(ms)= 7

Time elapsed using XMLReader (ms)= 12

-- Second Run
Time elapsed using XmlDocument (ms)= 18

Time elapsed using linq(ms)= 3

Time elapsed using XMLReader (ms)= 15

I have removed some output to show only comparison data.

Source: Open and Read XML in C# (Examples using Linq, XMLReader, XMLDocument)

Edit: If I comment out 'Console.WriteLine(NodeStr)' in all three methods and print only the time comparison, this is what I get:

Time elapsed using XmlDocument (ms)= 11


Time elapsed using linq(ms)= 0


Time elapsed using XMLReader (ms)= 0

Basically it depends on how you are processing the data and how you are reading the XML. LINQ and XmlReader look more promising in terms of speed.
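Single-run timings of a few milliseconds like the ones above are hard to compare, because the first method measured also pays one-time costs such as JIT compilation and reading the file from disk for the first time. A more even comparison runs each method a few times untimed and then averages many timed runs. The helper below is a minimal sketch of that idea; the class name `MicroBenchmark`, the method name `AverageMilliseconds` and the default run counts are assumptions made for illustration.

```csharp
using System;
using System.Diagnostics;

// Hypothetical timing helper, not part of the original sample.
public static class MicroBenchmark
{
    // Runs the action untimed a few times (warm-up), then returns the
    // average duration in milliseconds over the timed runs.
    public static double AverageMilliseconds(Action action, int warmupRuns = 3, int timedRuns = 100)
    {
        if (action == null) throw new ArgumentNullException(nameof(action));
        if (timedRuns <= 0) throw new ArgumentOutOfRangeException(nameof(timedRuns));

        for (int i = 0; i < warmupRuns; i++)
        {
            action(); // absorbs JIT and cold-cache costs
        }

        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < timedRuns; i++)
        {
            action();
        }
        stopwatch.Stop();

        return stopwatch.Elapsed.TotalMilliseconds / timedRuns;
    }
}
```

It could be called as `MicroBenchmark.AverageMilliseconds(() => XElement.Load(mainpath))`, with one call per method being compared and `mainpath` taken from the sample above.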

