
Performance issue with System.Linq when subdividing a list into multiple lists

I wrote a method to subdivide a list of items into multiple lists using System.Linq. When I run this method on 50,000 simple integers, it takes about 59.862 seconds.

Stopwatch watchresult0 = new Stopwatch();
watchresult0.Start();
var result0 = SubDivideListLinq(Enumerable.Range(0, 50000), 100).ToList();
watchresult0.Stop();
long elapsedresult0 = watchresult0.ElapsedMilliseconds;

So I tried to speed it up and rewrote it as a simple loop that iterates over each item in the list. That version needs only 4 milliseconds:

Stopwatch watchresult1 = new Stopwatch();
watchresult1.Start();
var result1 = SubDivideList(Enumerable.Range(0, 50000), 100).ToList();
watchresult1.Stop();
long elapsedresult1 = watchresult1.ElapsedMilliseconds;

This is my Subdivide-method using Linq:

private static IEnumerable<List<T>> SubDivideListLinq<T>(IEnumerable<T> enumerable, int count)
{
    while (enumerable.Any())
    {
        yield return enumerable.Take(count).ToList();
        enumerable = enumerable.Skip(count);
    }
}

And this is my Subdivide-method with the foreach loop over each item:

private static IEnumerable<List<T>> SubDivideList<T>(IEnumerable<T> enumerable, int count)
{
    List<T> allItems = enumerable.ToList();

    List<T> items = new List<T>(count);
    foreach (T item in allItems)
    {
        items.Add(item);

        if (items.Count != count) continue;
        yield return items;
        items = new List<T>(count);
    }

    if (items.Any())
        yield return items;
}

Do you have any idea why my own implementation is so much faster than dividing with LINQ? Or am I doing something wrong?

Also: as you can see, I already know how to split lists, so this is not a duplicate of the related question. I want to know about the performance difference between LINQ and my implementation, not how to split lists.

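For anyone who comes here with the same question:

After some more research I found that multiple enumeration with System.Linq is the cause of the poor performance. Because the sequence is lazy, each Skip(count) only wraps the previous sequence, so every Any() and Take(count) call typically has to walk past all previously skipped items again.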
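The repeated enumeration can be made visible with a small sketch. It assumes a hypothetical counting source (CountingRange and the Pulled counter are names invented here for illustration, not part of the original post) and runs the same Any/Take/Skip loop over it:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerationCounter
{
    // Number of elements the source has handed out so far (illustrative only).
    public static long Pulled;

    // A lazily evaluated source that counts every element it yields.
    public static IEnumerable<int> CountingRange(int n)
    {
        for (int i = 0; i < n; i++)
        {
            Pulled++;
            yield return i;
        }
    }

    // Same shape as SubDivideListLinq from the question.
    public static IEnumerable<List<T>> SubDivideListLinq<T>(IEnumerable<T> enumerable, int count)
    {
        while (enumerable.Any())
        {
            yield return enumerable.Take(count).ToList();
            enumerable = enumerable.Skip(count);
        }
    }

    // Returns how many elements were pulled from the source while splitting n items.
    public static long Measure(int n, int count)
    {
        Pulled = 0;
        foreach (List<int> chunk in SubDivideListLinq(CountingRange(n), count))
        {
            // Only the enumeration itself matters here; the chunks are discarded.
        }
        return Pulled;
    }
}
```

Calling EnumerationCounter.Measure(1000, 100) should report many more than 1,000 pulled elements, because each later chunk re-reads everything before it. The total grows roughly with the square of the number of chunks, which is why the effect is so pronounced at 50,000 items.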

When I materialize the sequence into an array first, to avoid the multiple enumeration, the performance gets much better (14 ms for 50k items):

T[] allItems = enumerable as T[] ?? enumerable.ToArray();
while (allItems.Any())
{
    yield return allItems.Take(count);
    allItems = allItems.Skip(count).ToArray();
}

Still, I won't use the LINQ approach, since it is slower. Instead, I wrote an extension method to subdivide my lists; it takes 3 ms for 50k items:

public static class EnumerableExtensions
{
    public static IEnumerable<List<T>> Subdivide<T>(this IEnumerable<T> enumerable, int count)
    {

        List<T> items = new List<T>(count);
        int index = 0;
        foreach (T item in enumerable)
        {
            items.Add(item);
            index++;
            if (index != count) continue;
            yield return items;
            items = new List<T>(count);
            index = 0;
        }
        if (index != 0 && items.Any())
            yield return items;
    }
}
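A short usage sketch of the single-pass approach, showing how the trailing partial chunk comes out. The class name SubdivideSketch and the helper ChunkSizes are made up for this example, and the method body is repeated (slightly condensed) only so the snippet compiles on its own:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SubdivideSketch
{
    // Condensed copy of the single-pass extension method above.
    public static IEnumerable<List<T>> Subdivide<T>(this IEnumerable<T> enumerable, int count)
    {
        List<T> items = new List<T>(count);
        foreach (T item in enumerable)
        {
            items.Add(item);
            if (items.Count != count) continue;
            yield return items;
            items = new List<T>(count);
        }
        // Emit whatever is left over as a final, shorter chunk.
        if (items.Count > 0)
            yield return items;
    }

    // Chunk sizes for 10 items split by 3: three full chunks and one remainder.
    public static List<int> ChunkSizes()
    {
        return Enumerable.Range(0, 10).Subdivide(3).Select(chunk => chunk.Count).ToList();
    }
}
```

SubdivideSketch.ChunkSizes() yields the sizes 3, 3, 3, 1. The source is enumerated exactly once, which is what keeps this version fast.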

As @AndreasNiedermair already wrote, this is also available in the MoreLinq library, where it is called Batch. (But I won't add the library now just for this one method.)
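If adding a library is not desired and the project happens to target .NET 6 or later (an assumption; the post does not say which framework is in use), the built-in Enumerable.Chunk method might be an option. A minimal sketch, with ChunkSketch and Split being illustrative names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChunkSketch
{
    // Assumes .NET 6 or later: Enumerable.Chunk yields arrays of at most `size` elements.
    public static List<int[]> Split(IEnumerable<int> source, int size)
    {
        return source.Chunk(size).ToList();
    }
}
```

Note that Chunk returns arrays rather than List<T>, so callers that need lists would have to convert each chunk.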

If you are after both readability and performance, you may want to use this algorithm instead. In terms of speed it is really close to your non-LINQ version, and at the same time it is much more readable.

private static IEnumerable<List<T>> SubDivideListLinq<T>(IEnumerable<T> enumerable, int count)
{
    int index = 0;
    return enumerable.GroupBy(l => index++/count).Select(l => l.ToList());
}

And its alternative:

private static IEnumerable<List<T>> SubDivideListLinq<T>(IEnumerable<T> enumerable, int count)
{
    int index = 0;
    return from l in enumerable
        group l by index++/count
        into l select l.ToList();
}

Another alternative:

private static IEnumerable<List<T>> SubDivideListLinq<T>(IEnumerable<T> enumerable, int count)
{
    int index = 0;
    return enumerable.GroupBy(l => index++/count, 
                             item => item, 
                             (key,result) => result.ToList());
}
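One thing worth checking with the closure-based versions: the captured index variable is never reset, and GroupBy is deferred, so enumerating the returned sequence a second time continues counting where the first pass stopped. The sketch below contrasts that with a variant built on the indexed Select overload. The names GroupBySketch, WithCapturedCounter, WithIndexedSelect, and Sizes are invented for this illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupBySketch
{
    // Closure-based version: `index` lives outside the query and keeps growing.
    public static IEnumerable<List<T>> WithCapturedCounter<T>(IEnumerable<T> enumerable, int count)
    {
        int index = 0;
        return enumerable.GroupBy(l => index++ / count).Select(l => l.ToList());
    }

    // Variant using the indexed Select overload, so positions restart on every enumeration.
    public static IEnumerable<List<T>> WithIndexedSelect<T>(IEnumerable<T> enumerable, int count)
    {
        return enumerable
            .Select((item, i) => new { item, i })
            .GroupBy(x => x.i / count, x => x.item)
            .Select(g => g.ToList());
    }

    // Formats the chunk sizes as a comma-separated string.
    public static string Sizes<T>(IEnumerable<List<T>> chunks)
    {
        return string.Join(",", chunks.Select(c => c.Count));
    }
}
```

With five items and a count of 2, enumerating the WithCapturedCounter result twice gives sizes 2,2,1 and then 1,2,2, while WithIndexedSelect gives 2,2,1 both times. Calling ToList() once on the result, as the question's benchmark does, avoids the problem either way.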

On my computer I get 0.006 s for LINQ versus 0.002 s for non-LINQ, which makes LINQ completely fair and acceptable to use.

As a piece of advice, don't torture yourself with micro-optimizing code. No one is going to feel a difference of a few milliseconds, so write code that you and others can easily understand later.
