简体   繁体   中英

Why does using IEnumerable<T> inside a recursive method make it much slower than using a List<T>?

I am using a recursive method to go through a tree of items and add its children to a flat collection:

public class Thing
{
    public int Id { get; set; }
    public string Name { get; set; }
    public int? ParentId { get; set; }
}

void Main()
{
    var sampleData = new List<Thing>
    {
        new Thing { Id = 1,  Name = "root1", ParentId = null },
        new Thing { Id = 2,  Name = "2", ParentId = 1 },
        new Thing { Id = 3,  Name = "3", ParentId = 1 },
        new Thing { Id = 4,  Name = "4", ParentId = 2 },
        new Thing { Id = 5,  Name = "5", ParentId = 2 },
        new Thing { Id = 6,  Name = "6", ParentId = 2 },
        new Thing { Id = 7,  Name = "7", ParentId = 6 },
        new Thing { Id = 8,  Name = "8", ParentId = 7 },
        new Thing { Id = 9,  Name = "9", ParentId = 8 },
        new Thing { Id = 10,  Name = "10", ParentId = 9 },
        new Thing { Id = 11,  Name = "11", ParentId = 10 },
        new Thing { Id = 12,  Name = "12", ParentId = 11 },
        new Thing { Id = 13,  Name = "13", ParentId = 12 },
        new Thing { Id = 14,  Name = "14", ParentId = 13 },
        new Thing { Id = 15,  Name = "root15", ParentId = null }
    };

    var subThings = new HashSet<Thing>();
    var stopWatch = new Stopwatch();
    stopWatch.Start();
    //AddSubThings(subThings, sampleData, new List<int> { 1 });
    AddSubThingsUsingList(subThings, sampleData, new List<int> { 1 });
    stopWatch.Elapsed.Dump();
    subThings.Dump();
}

private void AddSubThings(HashSet<Thing> resultThings, IEnumerable<Thing> sourceThings, IEnumerable<int> parentIds)
{
    if (!sourceThings.Any() || !parentIds.Any())
    {
        return;
    }
    var subThings = sourceThings.Where(st => st.ParentId.HasValue && parentIds.Contains(st.ParentId.Value));
    resultThings.UnionWith(subThings);
    AddSubThings(resultThings, sourceThings.Except(subThings), subThings.Select(st => st.Id));
}

private void AddSubThingsUsingList(HashSet<Thing> resultThings, List<Thing> sourceThings, List<int> parentIds)
{
    if (!sourceThings.Any() || !parentIds.Any())
    {
        return;
    }
    var subThings = sourceThings.Where(st => st.ParentId.HasValue && parentIds.Contains(st.ParentId.Value));
    resultThings.UnionWith(subThings);
    AddSubThingsUsingList(resultThings, sourceThings.Except(subThings).ToList(), subThings.Select(st => st.Id).ToList());
}

When I use the AddSubThings method it takes around 90 seconds to process. However if I use the AddSubThingsUsingList method it does not even take a second. Why is this?

The problem is because your create subThings from sourceThings like this

var subThings = sourceThings.Where(
    st => st.ParentId.HasValue && parentIds.Contains(st.ParentId.Value));

Then you pass the following as sourceThings to the recursive call.

sourceThings.Except(subThings)

Which is equivalent to

sourceThings.Except(
    sourceThings.Where(
        st => st.ParentId.HasValue && parentIds.Contains(st.ParentId.Value)))

That query when iterated with have to iterate over the original list twice. With each recursive call the query will build up and need to iterate the original list 2^n times where n is the recursion level. And your query is being iterated by the Any and the HashSet.UnionWith calls meaning it's more like 2^(n+1).

The other one immediately iterates the query before passing them and thus avoids this doubling problem.

You could pass the following to your recursive call for sourceThings instead to make it faster as it doesn't double up the required iterating of the original list on each recursive call.

sourceThings.Where(st => !st.ParentId.HasValue || !parentIds.Contains(st.ParentId.Value))

Ok. This is a bit complex.

Operations on IEnumerable are lazy, ie they are not executed until you need the result. Now when you pass sourceThins and subThings to AddSubThings , you've not sent a materialized collection of things, all you've done is you've defined how these collections are calculated from the original List s.

Now when the method calls itself recursively, it adds more filtering and selection to the data it has received.

All these layers of selection and filtering will be called when you call Any() .

On the other hand, in the case of List s, things are materialized after calls to Where , Except and Select , because you call ToList .

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM