What is the fastest way to go through an XML file in C#?

Question

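I have a program that processes thousands of files and has to check whether they have the correct XML format. The problem is that it takes a very long time to finish, and I think that is because of the type of XML reader I am using.

In the method below I tried three different versions. The first one is the fastest, but only by 5%. (The method does not need to check whether the file is XML.)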

private bool HasCorrectXmlFormat(string filePath)
{
    try
    {
        //-Version 1----------------------------------------------------------------------------------------
        XmlReader reader = XmlReader.Create(filePath, new XmlReaderSettings() { IgnoreComments = true, IgnoreWhitespace = true });

        string[] elementNames = new string[] { "DocumentElement", "Protocol", "DateTime", "Item", "Value" };

        int i = 0;

        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element)
            {
                if (reader.Name != elementNames.ElementAt(i))
                {
                    return false;
                }

                if (i >= 4)
                {
                    return true;
                }

                i++;
            }

        }

        return false;
        //--------------------------------------------------------------------------------------------------


        //-  Version 2  ------------------------------------------------------------------------------------
        IEnumerable<XElement> xmlElements = XDocument.Load(filePath).Descendants();

        string[] elementNames = new string[] { "DocumentElement", "Protocol", "DateTime", "Item", "Value" };

        for (int i = 0; i < 5; i++)
        {
            if (xmlElements.ElementAt(i).Name != elementNames.ElementAt(i))
            {
                return false;
            }
        }

        return true;
        //--------------------------------------------------------------------------------------------------


        //-  Version 3  ------------------------------------------------------------------------------------
        XDocument doc = XDocument.Load(filePath);

        if (doc.Root.Name != "DocumentElement")
        {
            return false;
        }

        if (doc.Root.Elements().First().Name != "Protocol")
        {
            return false;
        }

        if (doc.Root.Elements().First().Elements().ElementAt(0).Name != "DateTime")
        {
            return false;
        }

        if (doc.Root.Elements().First().Elements().ElementAt(1).Name != "Item")
        {
            return false;
        }

        if (doc.Root.Elements().First().Elements().ElementAt(2).Name != "Value")
        {
            return false;
        }

        return true;
        //--------------------------------------------------------------------------------------------------
    }
    catch (Exception)
    {
        return false;
    }
}

What I need is a faster way to do this. Is there a faster way to go through an XML file? I only need to check whether the first 5 elements have the correct names.
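For reference, Version 1 can be written as a standalone method that disposes the reader and stops as soon as the fifth element has been seen. This is only a sketch of the streaming approach described above, not a measured improvement: the class name XmlFormatCheck and the choice to catch only XmlException and IOException are assumptions made for the example (the original catches every Exception).

```csharp
using System;
using System.IO;
using System.Xml;

public static class XmlFormatCheck
{
    // Expected element names in document order (taken from the question).
    private static readonly string[] ExpectedNames =
        { "DocumentElement", "Protocol", "DateTime", "Item", "Value" };

    public static bool HasCorrectXmlFormat(string filePath)
    {
        var settings = new XmlReaderSettings
        {
            IgnoreComments = true,
            IgnoreWhitespace = true
        };

        try
        {
            // 'using' closes the underlying file handle even on early return.
            using (XmlReader reader = XmlReader.Create(filePath, settings))
            {
                int i = 0;
                while (reader.Read())
                {
                    if (reader.NodeType != XmlNodeType.Element)
                    {
                        continue;
                    }
                    if (reader.Name != ExpectedNames[i])
                    {
                        return false;
                    }
                    // Stop reading once all expected names have matched.
                    if (++i == ExpectedNames.Length)
                    {
                        return true;
                    }
                }
            }
            // Fewer than five elements in the file.
            return false;
        }
        catch (XmlException)
        {
            return false; // malformed XML
        }
        catch (IOException)
        {
            return false; // missing or unreadable file
        }
    }
}
```

Because the reader is forward-only, nothing after the fifth element is parsed, which is the main difference from the XDocument.Load versions that build the whole tree first.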

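Update

The XML files are only 2-5 KB in size and rarely larger. The files are on a local server. I am on a laptop with an SSD.

Here are some test results:

I should also add that the files are filtered beforehand, so only XML files are passed to the method. I get the files with the following method: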

public List<FileInfo> GetCompatibleFiles()
    {
        return new DirectoryInfo(folderPath)
                    .EnumerateFiles("*", searchOption)
                    .AsParallel()
                    .Where(file => file.Extension == ".xml" ? HasCorrectXmlFormat(file.FullName) : false)
                    .ToList();
    }

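This method does not look like this in my code (it combines two methods); it is only here to show how HasCorrectXmlFormat is called. You do not need to correct this method; I know it can be improved.

Update 2

Here are the two complete methods mentioned at the end of the first update: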

private void WriteAllFilesInList()
    {
        allFiles = new DirectoryInfo(folderPath)
                    .EnumerateFiles("*", searchOption)
                    .AsParallel()
                    .ToList();
    }

private void WriteCompatibleFilesInList()
    {
        compatibleFiles = allFiles
                            .Where(file => file.Extension == ".xml" ? HasCorrectXmlFormat(file.FullName) : false)
                            .ToList();
    }

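Both methods are called only once in the whole program (when the allFiles or compatibleFiles list is empty).

Update 3

It seems that the WriteAllFilesInList method is the real problem here, as shown below:

Final update

It looks like my method does not need any improvement, because the bottleneck is something else.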
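Since the updates point at the file listing rather than the XML check, one way to confirm that is to time the two stages separately. The following is a minimal sketch, not the asker's code: FileScanTiming, ListXmlFiles, and Measure are hypothetical names. Passing "*.xml" as the search pattern (instead of "*") and comparing the extension case-insensitively are assumptions about what might reduce the listing work; whether they help on a network share would need to be measured.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;

public static class FileScanTiming
{
    // Lists only .xml files, letting the file system enumeration apply the
    // pattern instead of materializing every file and filtering afterwards.
    public static List<FileInfo> ListXmlFiles(string folderPath, SearchOption searchOption)
    {
        return new DirectoryInfo(folderPath)
            .EnumerateFiles("*.xml", searchOption)
            // Extra check: on some runtimes "*.xml" can also match longer
            // extensions that merely start with ".xml".
            .Where(f => string.Equals(f.Extension, ".xml", StringComparison.OrdinalIgnoreCase))
            .ToList();
    }

    // Runs one stage and returns how long it took.
    public static TimeSpan Measure(Action stage)
    {
        Stopwatch sw = Stopwatch.StartNew();
        stage();
        sw.Stop();
        return sw.Elapsed;
    }
}
```

A call such as `FileScanTiming.Measure(() => files = FileScanTiming.ListXmlFiles(folderPath, SearchOption.AllDirectories))`, followed by a second Measure around the format check, shows which stage dominates before any of the XML code is changed.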

Answer 1

I would write the code like this using XML LINQ; it is a little faster than yours. Your code goes through the XML file multiple times, while mine goes through it only once.

private bool HasCorrectXmlFormat(string filePath)
{
    try
    {
        // Load the document once and keep references to the nodes we need.
        XDocument doc = XDocument.Load(filePath);
        XElement root = doc.Root;
        if (root.Name != "DocumentElement")
        {
            return false;
        }

        XElement protocol = root.Elements().First();
        if (protocol.Name != "Protocol")
        {
            return false;
        }

        XElement dateTime = protocol.Elements().First();
        if (dateTime.Name != "DateTime")
        {
            return false;
        }

        XElement item = protocol.Elements().Skip(1).First();
        if (item.Name != "Item")
        {
            return false;
        }

        // Check the element just retrieved instead of walking from the root again.
        XElement value = protocol.Elements().Skip(2).First();
        if (value.Name != "Value")
        {
            return false;
        }
    }
    catch (Exception)
    {
        // Malformed XML or missing elements (First() throws) count as a mismatch.
        return false;
    }
    return true;
}

Answer 2

Here is a sample that reads an example XML file and compares LINQ, XmlReader, and XmlDocument.

LINQ is the fastest.

Sample code

using System;
using System.Diagnostics;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

namespace ReadXMLInCsharp
{
  class Program
  {
    static void Main(string[] args)
    {
     
        //returns url of main directory which contains "/bin/Debug"
        var url=System.IO.Path.GetDirectoryName(
System.Reflection.Assembly.GetExecutingAssembly().GetName().CodeBase);
        
        //correction in path to point it in Root directory
        var mainpath = url.Replace("\\bin\\Debug", "") + "\\books.xml";

        var stopwatch = new Stopwatch();
        stopwatch.Start();

        //create XMLDocument object
        XmlDocument xmlDoc = new XmlDocument();
        //load xml file
        xmlDoc.Load(mainpath);
        //save all nodes in XMLnodelist
        XmlNodeList nodeList = xmlDoc.DocumentElement.SelectNodes("/catalog/book");

        //loop through each node and save it value in NodeStr
        var NodeStr = "";

        foreach (XmlNode node in nodeList)
        {
            NodeStr = NodeStr + "\nAuthor " + node.SelectSingleNode("author").InnerText;
            NodeStr = NodeStr + "\nTitle " + node.SelectSingleNode("title").InnerText;
            NodeStr = NodeStr + "\nGenre " + node.SelectSingleNode("genre").InnerText;
            NodeStr = NodeStr + "\nPrice " + node.SelectSingleNode("price").InnerText;
            NodeStr = NodeStr + "\nDescription -" + node.SelectSingleNode("description").InnerText;


        }
        //print all Authors details
        Console.WriteLine(NodeStr);
        stopwatch.Stop();
        Console.WriteLine();
        Console.WriteLine("Time elapsed using XmlDocument (ms)= " + stopwatch.ElapsedMilliseconds);
        Console.WriteLine();

        stopwatch.Reset();

        stopwatch.Start();
        NodeStr = "";
        //linq method
        //get all elements inside book
        foreach (XElement level1Element in XElement.Load(mainpath).Elements("book"))
        {
            //print each element value
            //you can also print XML attribute value, instead of .Element use .Attribute
            NodeStr = NodeStr + "\nAuthor " + level1Element.Element("author").Value;
            NodeStr = NodeStr + "\nTitle " + level1Element.Element("title").Value;
            NodeStr = NodeStr + "\nGenre " + level1Element.Element("genre").Value;
            NodeStr = NodeStr + "\nPrice " + level1Element.Element("price").Value;
            NodeStr = NodeStr + "\nDescription -" + level1Element.Element("description").Value;
        }

        //print all Authors details
        Console.WriteLine(NodeStr);
        stopwatch.Stop();
        Console.WriteLine();
        Console.WriteLine("Time elapsed using linq(ms)= " + stopwatch.ElapsedMilliseconds);
        Console.WriteLine();

        stopwatch.Reset();
        stopwatch.Start();
        //method 3
        //XMLReader
        XmlReader xReader = XmlReader.Create(mainpath);

        xReader.ReadToFollowing("book");
        NodeStr = "";
        while (xReader.Read())
        {
            switch (xReader.NodeType)
            {
                case XmlNodeType.Element:
                    NodeStr = NodeStr + "\nElement name:" + xReader.Name;
                    break;
                case XmlNodeType.Text:
                    NodeStr = NodeStr + "\nElement value:" + xReader.Value;
                    break;
                case XmlNodeType.None:
                    //do nothing
                    break;

            }
        }

        //print all Authors details
        Console.WriteLine(NodeStr);
        stopwatch.Stop();
        Console.WriteLine();
        Console.WriteLine("Time elapsed using XMLReader (ms)= " + stopwatch.ElapsedMilliseconds);
        Console.WriteLine();
        stopwatch.Reset();


        Console.ReadKey();
    }
  }
}

Output:

-- First Run
Time elapsed using XmlDocument (ms)= 15

Time elapsed using linq(ms)= 7

Time elapsed using XMLReader (ms)= 12

-- Second Run
Time elapsed using XmlDocument (ms)= 18

Time elapsed using linq(ms)= 3

Time elapsed using XMLReader (ms)= 15

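I removed some of the output to show only the comparison data.

Source: Open and read XML in C# (examples using Linq, XMLReader, XMLDocument)

Edit: if I comment out "Console.WriteLine(NodeStr)" in all of the methods and print only the time comparison, this is what I get: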

Time elapsed using XmlDocument (ms)= 11


Time elapsed using linq(ms)= 0


Time elapsed using XMLReader (ms)= 0

Basically, it depends on how you process the data and how you read the XML. LINQ and XmlReader look more promising in terms of speed.
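Single-run timings of a few milliseconds (or 0 ms) are hard to interpret, because whichever method runs first also pays one-time costs such as JIT compilation and the first disk read. A more even comparison might warm each method up and average over many iterations. The sketch below assumes the XML is already in memory as a string; XmlBenchSketch, AverageMs, CountWithXDocument, and CountWithXmlReader are hypothetical names, and the iteration counts are arbitrary.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class XmlBenchSketch
{
    // Runs the action a few times untimed (warm-up), then returns the
    // average time per call in milliseconds over the timed iterations.
    public static double AverageMs(Action action, int warmup = 5, int iterations = 200)
    {
        for (int i = 0; i < warmup; i++)
        {
            action();
        }

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds / iterations;
    }

    // Builds the full in-memory tree, then counts every element.
    public static int CountWithXDocument(string xml)
    {
        return XDocument.Parse(xml).Descendants().Count();
    }

    // Streams through the document and counts element start tags.
    public static int CountWithXmlReader(string xml)
    {
        int count = 0;
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    count++;
                }
            }
        }
        return count;
    }
}
```

Both counting methods do the same amount of work on the same input, so a call like `XmlBenchSketch.AverageMs(() => XmlBenchSketch.CountWithXmlReader(xml))` can be compared directly with the XDocument variant without console output or string concatenation dominating the result.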
