
Slow LINQ query for .ToArray()

I am using the following query:

foreach (var callDetailsForNode_ReArrange in callDetailsForNodes_ReArrange)
{
    var test = from r1 in dtRowForNode.AsEnumerable()
               join r2 in dtFileRowForNode.AsEnumerable()
               on r1.Field<int>("Lng_Upload_Id") equals r2.Field<int>("Lng_Upload_Id")
               where ((r1.Field<string>("Txt_Called_Number") == callDetailsForNode_ReArrange.caller2.ToString()) || r1.Field<string>("Txt_Calling_Number") == callDetailsForNode_ReArrange.caller2.ToString())
               select r2.Field<string>("Txt_File_Name");

    var d = test.Distinct();
}

Up to here, this query runs in no time. But when I add

string[] str =d.ToArray();
strFileName = string.Join(",", str);

It takes almost 4-5 seconds to run. What makes adding .ToArray() so slow?

"Up to here this query runs in no time."

Up to here, it hasn't actually done anything except build a deferred-execution model that represents the pending query. It doesn't start iterating until something calls MoveNext() on the iterator, i.e. via foreach, or in your case via .ToArray().

So: it takes time because it is doing work.
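A minimal sketch of the same point with a LINQ operator. `MoveNextDemo` and `Numbers` are hypothetical names, and the `produced` counter only stands in for the real cost of reading rows. Building the query and even creating the enumerator produce nothing; the first `MoveNext()` pulls just enough items to find the first match, and `ToArray()` runs a fresh, complete pass over the source.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MoveNextDemo
{
    // Hypothetical source: yields 1..count and calls onPull just before each item.
    static IEnumerable<int> Numbers(int count, Action onPull)
    {
        for (int i = 1; i <= count; i++)
        {
            onPull();
            yield return i;
        }
    }

    // Records how many source items have been produced after each step.
    public static int[] Steps()
    {
        int produced = 0;
        var steps = new List<int>();

        var query = Numbers(10, () => produced++).Where(n => n % 2 == 0);
        steps.Add(produced);                 // query built: nothing produced yet

        using (IEnumerator<int> e = query.GetEnumerator())
        {
            steps.Add(produced);             // enumerator created: still nothing
            e.MoveNext();                    // pulls 1 (rejected) and 2 (first match)
            steps.Add(produced);
        }

        query.ToArray();                     // a new, full pass over all 10 items
        steps.Add(produced);
        return steps.ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Steps())); // 0, 0, 2, 12
    }
}
```

In the question, the expensive part is the join and the string comparisons over the DataRows, and that work only happens at the point corresponding to `ToArray()` here.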

Consider:

static IEnumerable<int> GetData()
{
    Console.WriteLine("a");
    yield return 0;
    Console.WriteLine("b");
    yield return 1;
    Console.WriteLine("c");
    yield return 2;
    Console.WriteLine("d");
}
static void Main()
{
    Console.WriteLine("start");
    var data = GetData();
    Console.WriteLine("got data");
    foreach (var item in data)
        Console.WriteLine(item);
    Console.WriteLine("end");
}

This outputs:

start
got data
a
0
b
1
c
2
d
end

Note that the work doesn't all happen at once: it is both deferred (a comes after got data) and spooling (we don't get a, ..., d followed by 0, ..., 2).
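For contrast, here is a sketch that runs the same kind of iterator twice, logging to a list instead of the console: once streamed through `foreach`, and once materialized with `ToArray()` first. `BufferingDemo`, `Streamed` and `Buffered` are hypothetical names. With `ToArray()` the whole iterator (`a` to `d`) runs before the loop body sees a single item, which is exactly where the time goes in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BufferingDemo
{
    // Same shape as GetData() above, but logs to a list instead of the console.
    static IEnumerable<int> GetData(List<string> log)
    {
        log.Add("a");
        yield return 0;
        log.Add("b");
        yield return 1;
        log.Add("c");
        yield return 2;
        log.Add("d");
    }

    // Streaming: producer and consumer entries interleave.
    public static List<string> Streamed()
    {
        var log = new List<string>();
        foreach (var item in GetData(log))
            log.Add(item.ToString());
        return log;
    }

    // Buffering: ToArray() drains the whole iterator before the loop starts.
    public static List<string> Buffered()
    {
        var log = new List<string>();
        foreach (var item in GetData(log).ToArray())
            log.Add(item.ToString());
        return log;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" ", Streamed())); // a 0 b 1 c 2 d
        Console.WriteLine(string.Join(" ", Buffered())); // a b c d 0 1 2
    }
}
```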


Related: this is roughly how Distinct() works, from comments:

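// Must be declared inside a static class to be usable as an extension method.
public static IEnumerable<T> Distinct<T>(this IEnumerable<T> source) {
    var seen = new HashSet<T>();   // needs using System.Collections.Generic;
    foreach(var item in source) {
        // HashSet<T>.Add returns false if the item is already present,
        // so each value is yielded only the first time it appears.
        if(seen.Add(item)) yield return item;
    }
}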

...

and a custom Join extension method that builds the string directly from the sequence, without an intermediate array:

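// Needs using System.Collections.Generic; and using System.Text;
// and, like Distinct above, must live in a static class.
public static string Join(this IEnumerable<string> source, string separator) {
    using(var iter = source.GetEnumerator()) {
        if(!iter.MoveNext()) return "";            // empty sequence -> empty string
        var sb = new StringBuilder(iter.Current);  // first item, no leading separator
        while(iter.MoveNext())
            sb.Append(separator).Append(iter.Current);
        return sb.ToString();
    }
}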

and use:

string s = d.Join(",");
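A self-contained sketch of the two helpers above in use. The Distinct sketch is renamed `DistinctSketch` here (a hypothetical name) so it cannot clash with `System.Linq`'s `Enumerable.Distinct`. For the joining step it uses the built-in `string.Join(string, IEnumerable<string>)` overload, available since .NET Framework 4, which also enumerates the sequence directly without a `string[]`. Note that this only saves the intermediate array: the query itself still executes at that point.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistinctJoinDemo
{
    // The Distinct() sketch from above, renamed to avoid clashing with Enumerable.Distinct.
    public static IEnumerable<T> DistinctSketch<T>(this IEnumerable<T> source)
    {
        var seen = new HashSet<T>();
        foreach (var item in source)
        {
            if (seen.Add(item)) yield return item; // Add returns false for duplicates
        }
    }

    // Joins a lazy query without ToArray(); the query still runs inside string.Join.
    public static string FileList(IEnumerable<string> fileNames)
    {
        return string.Join(",", fileNames.DistinctSketch());
    }

    // Stops after two distinct values and reports how many source items were read,
    // showing that the sketch streams rather than buffering its whole input.
    public static int ReadForFirstTwo(int[] source)
    {
        int read = 0, found = 0;
        foreach (var value in source.Select(x => { read++; return x; }).DistinctSketch())
        {
            if (++found == 2) break;
        }
        return read;
    }

    public static void Main()
    {
        string[] files = { "b.txt", "a.txt", "b.txt", "c.txt", "a.txt" };
        Console.WriteLine(FileList(files));                          // b.txt,a.txt,c.txt
        Console.WriteLine(ReadForFirstTwo(new[] { 1, 1, 2, 3, 4 })); // 3
    }
}
```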

Because the query DOES NOTHING until you iterate over it, which .ToArray() does.

One thing to note is that the right-hand side of a join (in your example, r2 in dtFileRowForNode.AsEnumerable()) will be fully enumerated AS SOON as the query begins to be iterated, even if only the first element of the result is being accessed, but not until then.

So if you did:

d.First()

the r2 in dtFileRowForNode.AsEnumerable() sequence would be fully iterated (and buffered in memory), but only as many elements of r1 in dtRowForNode.AsEnumerable() as are needed to produce the first result would be evaluated.

For this reason, if one of your sequences in the join is much larger than the other, it's more efficient (memory-wise) to put the big sequence on the left of the join, because the entire sequence on the right of the join is buffered in memory.
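To see the buffering, here is a sketch that replaces the two DataTables with counting sequences (`Counted` and `JoinBufferingDemo` are hypothetical names). It assumes LINQ to Objects' `Enumerable.Join`, which, in the .NET Framework and .NET implementations I know of, reads the inner (right-hand) sequence into a lookup on the first `MoveNext()`. Calling `First()` therefore reads a single item from the left side but all 1000 from the right.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class JoinBufferingDemo
{
    // Hypothetical source: yields 0..count-1 and calls onPull before each item.
    static IEnumerable<int> Counted(int count, Action onPull)
    {
        for (int i = 0; i < count; i++)
        {
            onPull();
            yield return i;
        }
    }

    public static (int outerPulled, int innerPulled) FirstOfJoin()
    {
        int outerPulled = 0, innerPulled = 0;
        var outer = Counted(1000, () => outerPulled++); // left side of the join
        var inner = Counted(1000, () => innerPulled++); // right side of the join

        var query = from o in outer
                    join i in inner on o equals i
                    select o;

        // Nothing has been read yet; First() starts the iteration.
        query.First();
        return (outerPulled, innerPulled);
    }

    public static void Main()
    {
        var (outerPulled, innerPulled) = FirstOfJoin();
        Console.WriteLine($"left side read:  {outerPulled}"); // 1
        Console.WriteLine($"right side read: {innerPulled}"); // 1000
    }
}
```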

(I should point out that this only applies to LINQ to Objects, which is what you get after DataTable.AsEnumerable(). LINQ to SQL runs such queries in the database, so the database handles the buffering.)

You need to read up on deferred evaluation of LINQ statements. A query is not executed until you explicitly ask for its results: iterating it in foreach, or calling ToArray, ToList, Sum, First or one of the other methods that evaluate the query.

So it is your query that takes so much time to complete, not the ToArray call.
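A small sketch of how much work each kind of call triggers, using a hypothetical counting source (`EvaluationDemo` and `Numbers` are illustrative names). `Where` alone does nothing, `First()` stops at the first match, and `Sum()` or `ToList()` must read every item. The counts are what matter; the work each item represents is whatever your query does per row.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EvaluationDemo
{
    // Hypothetical source: yields 1..count, calling onPull before each item.
    static IEnumerable<int> Numbers(int count, Action onPull)
    {
        for (int i = 1; i <= count; i++)
        {
            onPull();
            yield return i;
        }
    }

    // Applies one operator to a fresh 100-item source and returns how many items it read.
    public static int Pulled(Action<IEnumerable<int>> apply)
    {
        int pulled = 0;
        apply(Numbers(100, () => pulled++));
        return pulled;
    }

    public static void Main()
    {
        Console.WriteLine(Pulled(q => q.Where(x => x > 50)));         // 0: deferred
        Console.WriteLine(Pulled(q => q.Where(x => x > 50).First())); // 51: stops at first match
        Console.WriteLine(Pulled(q => q.Sum()));                      // 100: needs every item
        Console.WriteLine(Pulled(q => q.ToList()));                   // 100
    }
}
```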
