
Best/Fastest way to find values of an element in an XML file

My program searches through XML files and returns the filenames of those that have specific values in an element.

I guess I have to show you my XML first before I can continue:

 <DocumentElement>
   <Protocol>
     <DateTime>10.03.2003</DateTime>
     <Item>Date</Item>
     <Value />
   </Protocol>
   <Protocol>
     <DateTime>05.11.2020</DateTime>
     <Item>Status</Item>
     <Value>Ok</Value>
   </Protocol>
 </DocumentElement>

I have a few thousand XML files which have this exact layout. The user can get a list of all matching files with the following method:

public List<string> GetFiles(string itemValue, string element, string value)
{
    return compatibleFiles.Where(path => XmlHasValue(path, itemValue, element, value)).ToList();
}

And this method returns whether the XML has the wanted value or not:

private bool XmlHasValue(string filePath, string itemValue, string element, string value)
{
    try
    {
        string foundValue = XDocument.Load(filePath)
            .Descendants()
            .Where(el => el.Name == "Item" && el.Value == itemValue)
            .First()
            .Parent
            .Descendants()
            .Where(des => des.Name == element && des.Value == value)
            .First()
            .Value;
        return foundValue == value;
    }
    catch (Exception)
    {
        return false;
    }
}

compatibleFiles is a list with all the paths to XML files that have the correct layout/format (XML code above). The user passes the following to the GetFiles method:

  • itemValue -> the value the 'Item' element should have, "Status" for example
  • element -> the name of the element they want to check (in the same 'Protocol' element), e.g. "Value" or "Date"
  • value -> the value the element named by element should have, "Ok" in our example

The problem is that these methods take a long time to complete, and I'm almost certain there's a better and faster way to do what I want. I don't know if GetFiles can get any faster, but XmlHasValue sure can. Here are some test results:

[image: test results]

Do you guys know any faster way to do this? It would be really helpful.

UPDATE

Turns out that it was all just because of the IO thread. If you have the same problem and think your code is bad, you should first check whether it's just a thread using all the CPU power.

As @Sinatr mentions, profiling should always be the first step when investigating performance.

A reasonable guess about what takes time would be:

  1. IO
  2. Parsing
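
To check that guess before changing anything, the two costs can be timed separately. The following is only a rough sketch: the class name LoadTiming and the method Measure are made up for illustration, and a real profiler will give a more reliable picture. It reads the raw bytes first (IO only) and then parses them from memory (parsing only).

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Xml.Linq;

// Hypothetical helper, not part of the original code.
public static class LoadTiming
{
    // Returns the time spent reading the file from disk and the time
    // spent parsing the already loaded bytes into an XDocument.
    public static (TimeSpan Io, TimeSpan Parse) Measure(string filePath)
    {
        var stopwatch = Stopwatch.StartNew();
        byte[] bytes = File.ReadAllBytes(filePath);   // IO only
        TimeSpan io = stopwatch.Elapsed;

        stopwatch.Restart();
        using (var stream = new MemoryStream(bytes))
        {
            XDocument.Load(stream);                   // parsing only, from memory
        }
        TimeSpan parse = stopwatch.Elapsed;

        return (io, parse);
    }
}
```

Summing both values over all files in compatibleFiles gives a first impression of which part dominates. Note that the operating system caches recently read files, so a second run over the same files will usually show much lower IO times than the first.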

IO could be improved by getting a faster disk, or by caching results in RAM. The latter may greatly improve performance if multiple searches are done, but introduces issues like cache invalidation.
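
A minimal sketch of such a cache, assuming that the file's last write time is a good enough signal for invalidation (the class name XmlDocumentCache is hypothetical). It keeps every parsed document in memory, so with a few thousand files the memory use is worth checking.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Xml.Linq;

// Hypothetical cache, not part of the original code.
public sealed class XmlDocumentCache
{
    // Maps a file path to the parsed document and the write time it was loaded at.
    private readonly ConcurrentDictionary<string, (DateTime Stamp, XDocument Doc)> _cache
        = new ConcurrentDictionary<string, (DateTime Stamp, XDocument Doc)>();

    public XDocument Get(string filePath)
    {
        // Reading the timestamp still touches the file system,
        // but it is far cheaper than reading and parsing the whole file.
        DateTime stamp = File.GetLastWriteTimeUtc(filePath);

        if (_cache.TryGetValue(filePath, out var entry) && entry.Stamp == stamp)
        {
            return entry.Doc;   // cached and still up to date
        }

        // Not cached yet, or the file changed on disk: load it again.
        XDocument doc = XDocument.Load(filePath);
        _cache[filePath] = (stamp, doc);
        return doc;
    }
}
```

XmlHasValue could then call Get(filePath) on a shared cache instance instead of XDocument.Load(filePath), so only the first search pays for IO and parsing.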

According to "What is the best way to parse (big) XML in C# Code", XmlReader is the fastest way to parse XML. This blog suggests XmlReader is about 2.5 times faster.
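
As an illustration, here is how XmlHasValue from the question might look with XmlReader. This is a sketch under assumptions, not a drop-in replacement: the class name XmlReaderSearch is made up, the children of each Protocol element are assumed to contain only text (as in the layout shown in the question), and malformed XML throws instead of returning false. Like the original, the first Protocol whose Item matches decides the result.

```csharp
using System.Xml;

// Hypothetical class name; the method mirrors XmlHasValue from the question.
public static class XmlReaderSearch
{
    public static bool XmlHasValue(string filePath, string itemValue, string element, string value)
    {
        using (XmlReader reader = XmlReader.Create(filePath))
        {
            while (reader.ReadToFollowing("Protocol"))
            {
                bool itemMatches = false;
                bool elementMatches = false;

                using (XmlReader protocol = reader.ReadSubtree())
                {
                    protocol.Read();   // position on <Protocol>
                    protocol.Read();   // move to the first node inside it
                    while (!protocol.EOF)
                    {
                        if (protocol.NodeType == XmlNodeType.Element)
                        {
                            string name = protocol.Name;
                            // Reads the text and moves past the end tag, so no
                            // extra Read() is needed here. Empty elements such
                            // as <Value /> yield an empty string.
                            string text = protocol.ReadElementContentAsString();
                            if (name == "Item" && text == itemValue) itemMatches = true;
                            if (name == element && text == value) elementMatches = true;
                        }
                        else
                        {
                            protocol.Read();   // skip whitespace and end tags
                        }
                    }
                }

                // The first Protocol with a matching Item decides the result.
                if (itemMatches) return elementMatches;
            }
        }
        return false;
    }
}
```

Because the reader streams through the file, it can stop as soon as the deciding Protocol element has been read, and it never builds the full document tree in memory.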

If you have multiple files, you could also try to process them in parallel. Keep in mind that IO is mostly serial, so you might not gain anything unless you have an SSD that can deliver data faster than the files can be processed.
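
One way to try this is PLINQ. The sketch below is a hypothetical variant of GetFiles (the class name ParallelSearch and the parameters fileMatches and maxParallelism are assumptions for illustration) that takes the per-file check as a delegate and caps the number of files processed at once. Whether it helps at all depends on the disk, so it is worth measuring with different values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical parallel variant of GetFiles from the question.
public static class ParallelSearch
{
    public static List<string> GetFiles(
        IEnumerable<string> compatibleFiles,
        Func<string, bool> fileMatches,
        int maxParallelism)
    {
        return compatibleFiles
            .AsParallel()
            .AsOrdered()                               // keep the input order in the result
            .WithDegreeOfParallelism(maxParallelism)   // cap concurrent file checks
            .Where(fileMatches)
            .ToList();
    }
}
```

With the method from the question, the delegate would be something like path => XmlHasValue(path, itemValue, element, value). The check must be safe to call from several threads at once, which holds as long as it only reads files and touches no shared mutable state.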

