
Slow LINQ query for .ToArray()

I am using the following query:

foreach (var callDetailsForNode_ReArrange in callDetailsForNodes_ReArrange)
{
    // Compute the number once per outer iteration instead of once per row
    string caller = callDetailsForNode_ReArrange.caller2.ToString();

    // Building the query only describes the work; nothing is enumerated yet
    var test = from r1 in dtRowForNode.AsEnumerable()
               join r2 in dtFileRowForNode.AsEnumerable()
                   on r1.Field<int>("Lng_Upload_Id") equals r2.Field<int>("Lng_Upload_Id")
               where r1.Field<string>("Txt_Called_Number") == caller
                  || r1.Field<string>("Txt_Calling_Number") == caller
               select r2.Field<string>("Txt_File_Name");

    var d = test.Distinct();
}

Up to here, this query runs in no time. But as soon as I added:

string[] str = d.ToArray();
strFileName = string.Join(",", str);

It takes almost 4-5 seconds to run. What makes adding .ToArray() so slow?

"Up to here, this query runs in no time."

Up to here, it hasn't actually done anything except build a deferred-execution model that represents the pending query. It doesn't start iterating until something calls MoveNext() on the iterator, i.e. via foreach, or in your case via .ToArray().

So: it takes time because that is when it is actually doing the work.
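One way to watch this happen is to count how often a predicate runs and to drive the enumerator by hand, which is what foreach and ToArray() do internally. This is an illustrative sketch only: the numbers array and the predicateCalls counter are made up, and the counts assume LINQ to Objects' usual one-element-at-a-time Where.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int predicateCalls = 0;                 // counts how often the Where predicate runs
int[] numbers = { 1, 2, 3, 4, 5, 6 };

// Building the query only records what to do; the predicate has not run yet
IEnumerable<int> evens = numbers.Where(n => { predicateCalls++; return n % 2 == 0; });
int callsAfterBuild = predicateCalls;

// Drive the enumerator by hand, as foreach / ToArray() would do internally
IEnumerator<int> it = evens.GetEnumerator();
bool hasFirst = it.MoveNext();          // checks 1 (no match), then 2 (match) and stops
int firstValue = it.Current;
int callsAfterFirstMoveNext = predicateCalls;
it.Dispose();                           // foreach would dispose the enumerator for us

// ToArray() starts a fresh enumeration and runs the predicate for every element
int[] all = evens.ToArray();
int callsAfterToArray = predicateCalls;

Console.WriteLine($"after build: {callsAfterBuild}");
Console.WriteLine($"after first MoveNext: {callsAfterFirstMoveNext} (first = {firstValue})");
Console.WriteLine($"after ToArray: {callsAfterToArray}");
```

The point is not the exact numbers but where they change: defining the query costs nothing, and every bit of filtering work happens inside MoveNext().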

Consider:

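using System;
using System.Collections.Generic;

class Program
{
    // An iterator method: none of this body runs until MoveNext() is called
    static IEnumerable<int> GetData()
    {
        Console.WriteLine("a");
        yield return 0;
        Console.WriteLine("b");
        yield return 1;
        Console.WriteLine("c");
        yield return 2;
        Console.WriteLine("d");
    }

    static void Main()
    {
        Console.WriteLine("start");
        var data = GetData();           // creates the iterator, prints nothing
        Console.WriteLine("got data");
        foreach (var item in data)      // each MoveNext() runs GetData up to its next yield
            Console.WriteLine(item);
        Console.WriteLine("end");
    }
}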

This outputs:

start
got data
a
0
b
1
c
2
d
end

Note how the work doesn't all happen at once: it is both deferred ("a" is printed after "got data") and spooled one item at a time (we don't get a, ..., d followed by 0, ..., 2).


Related: this is roughly how Distinct() works (from the comments):

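using System.Collections.Generic;

public static class DistinctSketch { // class name is arbitrary
    // Simplified: the real Enumerable.Distinct also has an IEqualityComparer<T> overload
    public static IEnumerable<T> Distinct<T>(this IEnumerable<T> source) {
        var seen = new HashSet<T>();
        foreach(var item in source) {
            if(seen.Add(item)) yield return item; // Add returns false for duplicates
        }
    }
}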

...

and a new Join operation:

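using System.Collections.Generic;
using System.Text;

public static class StringSequenceExtensions { // class name is arbitrary
    // Streams the sequence into a StringBuilder, so no intermediate array is needed
    public static string Join(this IEnumerable<string> source, string separator) {
        using(var iter = source.GetEnumerator()) {
            if(!iter.MoveNext()) return ""; // empty sequence
            var sb = new StringBuilder(iter.Current);
            while(iter.MoveNext())
                sb.Append(separator).Append(iter.Current);
            return sb.ToString();
        }
    }
}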

and use:

string s = d.Join(",");
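As an aside, the built-in string.Join has had an overload taking IEnumerable<string> since .NET Framework 4, so the intermediate array from the question is not strictly needed either. A minimal sketch, with made-up file names standing in for the Txt_File_Name values:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the Txt_File_Name values produced by the join
string[] fileNames = { "a.wav", "b.wav", "a.wav", "c.wav", "b.wav" };

// Distinct() is lazy; the documentation describes its result as unordered,
// although in practice it yields first occurrences in source order
IEnumerable<string> d = fileNames.Distinct();

// string.Join(string, IEnumerable<string>) enumerates the query directly,
// so the caller never allocates an intermediate string[]
string strFileName = string.Join(",", d);

Console.WriteLine(strFileName);
```

This only saves the array allocation; the query itself still does all of its work at the moment string.Join enumerates it.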

Because the query DOES NOTHING until you iterate over it, which .ToArray() does.

One thing to note is that the right-hand side of a join (in your example, r2 in dtFileRowForNode.AsEnumerable()) will be fully enumerated AS SOON AS the query begins to be iterated, even if only the first element of the result is being accessed, but not until then.

So if you did:

d.First()

the r2 in dtFileRowForNode.AsEnumerable() sequence would be fully iterated (and buffered in memory), but only as many elements of r1 in dtRowForNode.AsEnumerable() as are needed to produce the first result would be evaluated (just the first one, if it matches).

For this reason, if one of your sequences in the join is much larger than the other, it's more efficient (memory-wise) to put the big sequence on the left of the join. The entire sequence on the right of the join will be buffered in memory.
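The buffering behaviour can be observed with two logging iterators standing in for the two tables. This is a sketch under assumptions: Outer() and Inner() are hypothetical, and the relative order of the log entries differs between .NET Framework and .NET (Core), so only which elements get pulled is meaningful.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Records every element pulled from either side of the join
var log = new List<string>();

// Hypothetical stand-ins for dtRowForNode (left) and dtFileRowForNode (right)
IEnumerable<int> Outer()
{
    foreach (var id in new[] { 1, 2, 3 }) { log.Add($"outer {id}"); yield return id; }
}
IEnumerable<int> Inner()
{
    foreach (var id in new[] { 1, 2, 3, 4 }) { log.Add($"inner {id}"); yield return id; }
}

var query = from r1 in Outer()
            join r2 in Inner() on r1 equals r2
            select r2;

// Ask for just one result
int first = query.First();

// All four inner elements appear (buffered into a lookup),
// but outer 2 and outer 3 are never pulled
Console.WriteLine(string.Join(", ", log));
```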

(I should point out that this only applies to LINQ to Objects. LINQ to SQL will run those queries in the database, so the database handles the buffering.)
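Applied to the question's code, this matters more than usual because the query is built inside a foreach: every ToArray() call rebuilds the join, re-buffering the whole right-hand table and re-scanning the left-hand one for each caller. One possible restructuring (a sketch, not the only fix) is to build hash lookups once, outside the loop. The tuple lists, sample values and callers array below are hypothetical stand-ins for dtRowForNode, dtFileRowForNode and the caller2 values; with real DataTables the same ToLookup calls would read the columns via Field<T>(...) instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the two DataTables in the question
var callRows = new List<(int UploadId, string Called, string Calling)>
{
    (1, "555", "111"),
    (2, "222", "555"),
    (3, "333", "444"),
};
var fileRows = new List<(int UploadId, string FileName)>
{
    (1, "one.wav"),
    (2, "two.wav"),
    (3, "three.wav"),
};
string[] callers = { "555", "444", "999" }; // stand-ins for caller2.ToString()

// Build the expensive structures once, outside the loop:
// upload id -> file names (what the join buffered on every iteration)
ILookup<int, string> filesByUpload = fileRows.ToLookup(f => f.UploadId, f => f.FileName);

// phone number -> upload ids, indexing both the called and the calling column
ILookup<string, int> uploadsByNumber = callRows
    .SelectMany(r => new[] { (Number: r.Called, r.UploadId), (Number: r.Calling, r.UploadId) })
    .ToLookup(x => x.Number, x => x.UploadId);

var results = new Dictionary<string, string>();
foreach (string caller in callers)
{
    // Each iteration is now a few hash lookups instead of a full join;
    // a missing key simply yields an empty sequence
    IEnumerable<string> names = uploadsByNumber[caller]
        .SelectMany(id => filesByUpload[id])
        .Distinct();
    results[caller] = string.Join(",", names);
}

foreach (var kv in results) Console.WriteLine($"{kv.Key}: {kv.Value}");
```

Whether this is worth it depends on how many callers the loop processes; with a single caller the original join is already a reasonable plan.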

You need to read up on deferred evaluation of LINQ statements. The query is not evaluated until you explicitly ask for results, e.g. by iterating in foreach or by calling ToArray, ToList, Sum, First or one of the other methods that evaluate the query.

So it is your query that takes so much time to complete, not the ToArray call itself.
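A related consequence is that every one of those evaluating calls runs the query again, so a query evaluated several times does its work several times. A small illustrative sketch (the source array and the evaluations counter are made up) counting how often a Select projection runs:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int evaluations = 0;                // counts how often the projection runs
int[] source = { 1, 2, 3, 4 };

// Deferred: defining the query does not run the projection
IEnumerable<int> squares = source.Select(x => { evaluations++; return x * x; });

// Each of these enumerates the whole query again
int sum = squares.Sum();
List<int> list = squares.ToList();
int[] array = squares.ToArray();
int repeatedRuns = evaluations;     // three full passes over four elements

// Materializing once and reusing the array avoids the repeated work
evaluations = 0;
int[] cached = squares.ToArray();
int cachedSum = cached.Sum();
int cachedMax = cached.Max();
int cachedRuns = evaluations;       // a single pass

Console.WriteLine($"sum = {sum}, max = {cachedMax}");
Console.WriteLine($"separate calls: {repeatedRuns} projections; materialized once: {cachedRuns}");
```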
