
How to read two XML files simultaneously in an efficient way

I have a complicated case: I have three XML files that I need to read simultaneously, producing results based on matches between them. Below is a working (but made-up) example that is close to what I am doing.

For instance, I have two XML files with the same tags and attributes but different contents (languages). I read both languages at the same time with the following C# code:

using System.Linq;
using System.Xml.Linq;

XElement x1 = XElement.Load(@"abc.xml");
XElement x2 = XElement.Load(@"xyz.xml");

// from ... from ... where: the second "from" re-scans all of x2
// for every var1 that passes the first filter (a nested loop).
var ch = from var1 in x1.Elements("language1")
         where var1.Attribute("index").Value == "1"
         from var2 in x2.Elements("language2")
         where var2.Attribute("index").Value == var1.Attribute("index").Value
         select new
         {
             sentenceNumber = var1.Attribute("index").Value,
             SentenceInLanguage1 = var1.Attribute("text").Value,
             SentenceInLanguage2 = var2.Attribute("text").Value
         };

ListBox.DataContext = ch;

The problem is that x1 contains 1000 sentences, and so does x2. The logic above works like a nested loop, which slows the processing down a lot. It works like this:

x1.1 -> x2.1:1000
x1.2 -> x2.1:1000

or

for i in x1
  for j in x2

Is there a better, more efficient way to select the sentences from x1 and x2 where the sentence id in x1 equals the sentence id in x2?

If I understand what you want correctly, you can use join to do that.

Here is a good example: LINQ to XML : Join Xml Data (Wriju's BLOG)

...or something along these lines:

// join builds a lookup over x2's "index" values, then probes it once per language1 element.
var root = from var1 in x1.Elements("language1")
           join var2 in x2.Elements("language2")
               on (string)var1.Attribute("index") equals (string)var2.Attribute("index")
           select new
           {
               SentenceNumber = (string)var1.Attribute("index"),
               SentenceInLanguage1 = (string)var1.Attribute("text"),
               SentenceInLanguage2 = (string)var2.Attribute("text")
           };
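A self-contained version of the join above, as a sketch: it assumes each file's root element directly contains elements like `<language1 index="1" text="..."/>` (the shape the question's code implies), and the class and method names (`SentenceJoin`, `Pair`) are hypothetical. It materializes the result with `ToList()` so the join runs only once.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: pairs sentences from two language files by their "index" attribute.
// Assumes <language1 index=".." text=".."/> and <language2 index=".." text=".."/> children.
public static class SentenceJoin
{
    public static List<(string Number, string Lang1, string Lang2)> Pair(XElement x1, XElement x2)
    {
        var query =
            from var1 in x1.Elements("language1")
            join var2 in x2.Elements("language2")
                on (string)var1.Attribute("index") equals (string)var2.Attribute("index")
            select (Number: (string)var1.Attribute("index"),
                    Lang1: (string)var1.Attribute("text"),
                    Lang2: (string)var2.Attribute("text"));

        // Materialize once so callers (e.g. data binding) don't re-run the join.
        return query.ToList();
    }
}
```

Because `Enumerable.Join` yields results in the order of the outer sequence, the pairs come back in x1's order even when x2 lists its sentences differently.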

In LINQ, the following statements are equivalent and produce the same results:

from i1 in items1
from i2 in items2
where i1 == i2
select new { i1, i2 }

and

from i1 in items1
join i2 in items2 on i1 equals i2
select new { i1, i2 }

They are even translated to the same SQL by LINQ to SQL: for MS SQL, the generated SQL contains a join clause in both cases (which is why there is no need to use the less flexible join syntax when you query a database).

However, LINQ to Objects and LINQ to XML execute the two forms differently: the first results in nested loops, while the second doesn't, because `Enumerable.Join` builds a hash-based lookup from the inner sequence's keys and then probes it once per outer element.
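To make the difference concrete, here is a small sketch that counts the work each shape does in LINQ to Objects. Integer arrays stand in for the `index` attributes, and the class `JoinVsNestedLoop` is hypothetical. The `where` predicate in the nested form runs once per pair (n * m times), while `join` calls each key selector once per element (n + m times in total).

```csharp
using System;
using System.Linq;

// Hypothetical demo: instrument both query shapes with counters.
public static class JoinVsNestedLoop
{
    // from ... from ... where: the predicate is evaluated for every (a, b) pair.
    public static (int Matches, int Comparisons) NestedWhere(int[] left, int[] right)
    {
        int comparisons = 0;
        Func<int, int, bool> equal = (a, b) => { comparisons++; return a == b; };
        int matches = (from a in left
                       from b in right
                       where equal(a, b)
                       select a).Count();
        return (matches, comparisons);
    }

    // join: keys are extracted once per element, then matched through a lookup.
    public static (int Matches, int KeyLookups) Join(int[] left, int[] right)
    {
        int keyCalls = 0;
        Func<int, int> key = x => { keyCalls++; return x; };
        int matches = left.Join(right, key, key, (a, b) => a).Count();
        return (matches, keyCalls);
    }
}
```

With 1000 sentences in each file, that is 1,000,000 predicate calls for the nested form versus 2,000 key-selector calls for the join, which is the gap the question ran into.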

So you just need to change your implementation to use join, as @NSGaga suggested.

Another optimization would be adding .ToList():

ListBox.DataContext = ch.ToList();

I'm not sure about data binding, but because of LINQ's deferred execution, your query might be re-evaluated more than once; calling ToList() runs it once and stores the results.
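A minimal sketch of the deferred-execution point, with a counter inside the projection (`DeferredDemo` is a hypothetical name). It does not model WPF data binding itself; whether a particular binding enumerates its source more than once is an assumption here, not something this example shows.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo: count how often a projection runs when a query is enumerated repeatedly.
public static class DeferredDemo
{
    public static (int DeferredRuns, int MaterializedRuns) CountEvaluations(int enumerations)
    {
        // Deferred: the Select lambda runs again on every enumeration.
        int deferredRuns = 0;
        IEnumerable<int> deferred = new[] { 1, 2, 3 }.Select(x => { deferredRuns++; return x * 10; });
        for (int i = 0; i < enumerations; i++)
        {
            foreach (int value in deferred) { }
        }

        // Materialized: ToList() runs the lambda once per element, up front.
        int materializedRuns = 0;
        List<int> materialized = new[] { 1, 2, 3 }.Select(x => { materializedRuns++; return x * 10; }).ToList();
        for (int i = 0; i < enumerations; i++)
        {
            foreach (int value in materialized) { }
        }

        return (deferredRuns, materializedRuns);
    }
}
```

Enumerating three times gives 9 projection calls for the deferred query and 3 for the list.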

Easy! Just go through each file in sequence. On the first pass, create a dictionary that maps sentenceNumber to SentenceInLanguage1.

On the second pass, create your enumerable as in the code you showed, filling in SentenceInLanguage1 by looking up each sentence number in the dictionary from the first pass.
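The two-pass approach could look like this sketch, assuming the same element and attribute names as the question and that every element has an `index` that appears at most once in the first file (`ToDictionary` throws on duplicate or null keys). `TwoPassLookup` is a hypothetical name. Sentences in the second file with no counterpart in the first are skipped, which matches the inner-join behavior of the join answer.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: two sequential passes with a dictionary in between.
public static class TwoPassLookup
{
    public static List<(string Number, string Lang1, string Lang2)> Pair(XElement x1, XElement x2)
    {
        // First pass: sentenceNumber -> SentenceInLanguage1 (assumes unique, non-null "index").
        Dictionary<string, string> lang1ByIndex = x1.Elements("language1")
            .ToDictionary(e => (string)e.Attribute("index"), e => (string)e.Attribute("text"));

        // Second pass: walk language2 once; each lookup is O(1) on average.
        var result = new List<(string Number, string Lang1, string Lang2)>();
        foreach (XElement var2 in x2.Elements("language2"))
        {
            string index = (string)var2.Attribute("index");
            if (lang1ByIndex.TryGetValue(index, out string lang1))
            {
                result.Add((index, lang1, (string)var2.Attribute("text")));
            }
        }
        return result;
    }
}
```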

If you would prefer to go through both files together, get an enumerator for each (GetEnumerator) and walk them in a plain old while loop, advancing both enumerators to the next XElement at the end of the loop body. This only pairs the right sentences if both files list them in the same order.
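A sketch of the lockstep variant, under the explicit assumption that both files list the same sentences in the same order. Rather than silently mispairing sentences when that assumption fails, this version compares the `index` attributes and throws; `LockstepPairing` is a hypothetical name, and advancing both enumerators in the loop condition is equivalent to advancing them at the end of the loop body.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical helper: walk both files in a single pass, one element from each per iteration.
public static class LockstepPairing
{
    public static List<(string Number, string Lang1, string Lang2)> Pair(XElement x1, XElement x2)
    {
        var result = new List<(string Number, string Lang1, string Lang2)>();
        using IEnumerator<XElement> e1 = x1.Elements("language1").GetEnumerator();
        using IEnumerator<XElement> e2 = x2.Elements("language2").GetEnumerator();

        // Stops as soon as either file runs out of elements.
        while (e1.MoveNext() && e2.MoveNext())
        {
            string index = (string)e1.Current.Attribute("index");
            if (index != (string)e2.Current.Attribute("index"))
            {
                // The same-order assumption does not hold for this input.
                throw new InvalidOperationException($"Files are out of step at index {index}.");
            }
            result.Add((index, (string)e1.Current.Attribute("text"), (string)e2.Current.Attribute("text")));
        }
        return result;
    }
}
```

Extra trailing elements in the longer file are ignored, because the loop ends as soon as either enumerator is exhausted.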
