
IEnumerable takes too long to process when filtering on it

I think I know the reason for this behavior, but I don't know the best way to resolve it.

I have built a LinqToSQL query:

public IEnumerable<AllConditionByCountry> GenerateConditions(int paramCountryId)
{
    var AllConditionsByCountry =
            (from cd in db.tblConditionDescriptions...
             join...
             join...
             select new AllConditionByCountry
             {
                 CountryID = cd.CountryID,
                 ConditionDescription = cd.ConditionDescription,
                 ConditionID = cd.ConditionID,
             ...
             ...
            }).OrderBy(x => x.CountryID).AsEnumerable<AllConditionByCountry>();

    return AllConditionsByCountry;
}

This query returns about 9500+ rows of data.

I'm calling this from my Controller like so:

svcGenerateConditions generateConditions = new svcGenerateConditions(db);
IEnumerable<AllConditionByCountry> AllConditionsByCountry;
AllConditionsByCountry = generateConditions.GenerateConditions(1);

I then loop through the results:

foreach (var record in AllConditionsByCountry)
{
    ...
    ...
    ...

This is where I think the issue is:

var rList = AllConditionsByCountry
           .Where(x => x.ConditionID == conditionID)
           .Select(x => x)
           .AsEnumerable();

I'm doing a nested loop based on the data I gather from the query above (using the original data I get from AllConditionByCountry). I think this is where my issue lies: when it filters the data, it slows down greatly.

Basically, this process writes out a bunch of files (.json, .html). I first tested it using plain ADO.NET, and running through all of these records took about 4 seconds. Using EF (stored procedure or LinqToSql), it takes minutes.

Should I change the collection types I'm using, or is this just the price of using LinqToSql?

I've tried returning List<AllConditionByCountry>, IQueryable, and IEnumerable from my GenerateConditions method. List took a very long time (similar to what I'm seeing now). With IQueryable I got errors when I tried to apply the second filter (Query results cannot be enumerated more than once).

I have run this same Linq statement in LinqPad and it returns in less than a second.

I'm happy to add any additional information.

Please let me know.

Edit:

foreach (var record in AllConditionsByCountry)
{
    ...
    ...
    ...
    var rList = AllConditionsByCountry
               .Where(x => x.ConditionID == conditionID)
               .Select(x => x)
               .AsEnumerable();                        
    conditionDescriptionTypeID = item.ConditionDescriptionTypeId;
    id = conditionDescriptionTypeID + "_" + count.ToString();              
    ...
    ...
}

TL;DR: You're making 9895 queries against the database instead of one. You need to rewrite your code so that only one is executed. Reading up on how IEnumerable defers execution will give you some hints on doing this.

Ah, yeah, that foreach loop is your problem.

foreach (var record in AllConditionsByCountry)
{
  ...
  ...
  ...
  var rList = AllConditionsByCountry.Where(x => x.ConditionID == conditionID).Select(x => x).AsEnumerable();                        
  conditionDescriptionTypeID = item.ConditionDescriptionTypeId;
  id = conditionDescriptionTypeID + "_" + count.ToString();              
  ...
  ...
}

Linq-to-SQL works similarly to Linq in that it (loosely speaking) appends functions to a chain to be executed when the enumerable is iterated - for example,

var query = new[] { 1 }.Select<int, int>(x => throw new Exception());

This doesn't actually cause your code to crash because the enumerable is never iterated. Linq-to-SQL operates on a similar principle. So, when you define this:

var AllConditionsByCountry =
        (from cd in db.tblConditionDescriptions...
         join...
         join...
         select new AllConditionByCountry
         {
             CountryID = cd.CountryID,
             ConditionDescription = cd.ConditionDescription,
             ConditionID = cd.ConditionID,
         ...
         ...
        }).OrderBy(x => x.CountryID).AsEnumerable<AllConditionByCountry>();

You're not executing anything against a database, you're just instructing C# to build a query that does this when it is iterated. That's why just declaring this query is fast.
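To make the deferred part concrete, here is a minimal in-memory sketch. It uses plain LINQ to Objects rather than Linq-to-SQL, and the class name and counter are made up for illustration, but the principle should carry over: building the query runs nothing, and the work only happens once something iterates it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Counts how many times the projection inside Select actually runs.
    public static int ProjectionCalls;

    public static IEnumerable<int> BuildQuery()
    {
        ProjectionCalls = 0;
        // Declaring the query only records the steps; the lambda has not run yet.
        return new[] { 1, 2, 3 }.Select(x =>
        {
            ProjectionCalls++;
            return x * 10;
        });
    }
}
```

After calling BuildQuery() the counter is still 0. It only reaches 3 once the result is iterated, for example with ToList() or a foreach.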

Your problem comes when you get to your foreach loop. When you hit the loop, you signal that you want to start iterating AllConditionsByCountry. This causes .NET to go off and execute the initial query, which takes time.

When you call AllConditionsByCountry.Where(x => x.ConditionID == conditionID) inside the loop, you construct another iterator that doesn't do anything by itself. Presumably you actually use rList within the loop, though, and every time it is enumerated the underlying query is executed again. You are essentially constructing N queries to be executed against the database (where N is the size of AllConditionsByCountry).

This leads to a scenario where you are effectively executing approximately 9501 queries against the database - 1 for your initial query and then one query for each element within the original query. The drastic slowdown compared to ADO.NET is because you're probably making 9500 more queries than you were originally.
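The effect can be reproduced without a database. In the sketch below, FakeQuery is a hypothetical stand-in for the deferred query, and SourceRuns stands in for "queries sent to the database". Both names are invented for this illustration and are not part of the original code.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class RepeatedEnumerationDemo
{
    // Stands in for the number of times the query is executed.
    public static int SourceRuns;

    // Hypothetical deferred source: every enumeration starts over from the top.
    public static IEnumerable<int> FakeQuery(int rows)
    {
        SourceRuns++;
        for (var i = 0; i < rows; i++)
        {
            yield return i % 5; // pretend this value is a ConditionID
        }
    }

    public static int CountRunsWithNestedFilter(int rows)
    {
        SourceRuns = 0;
        var all = FakeQuery(rows);
        foreach (var conditionId in all)
        {
            // Enumerating the filter walks the deferred source again.
            var matches = all.Where(x => x == conditionId).ToList();
        }
        return SourceRuns;
    }

    public static int CountRunsWithList(int rows)
    {
        SourceRuns = 0;
        var all = FakeQuery(rows).ToList(); // the source runs exactly once here
        foreach (var conditionId in all)
        {
            // This filter only touches the in-memory list.
            var matches = all.Where(x => x == conditionId).ToList();
        }
        return SourceRuns;
    }
}
```

With 10 rows, the nested-filter version runs the source 11 times (once for the outer loop plus once per row), while the list version runs it once.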

You ideally should change the code so that there is one and only one query executed against the database. You've a couple of options:

  • Rewrite the Linq-to-SQL query such that all of the legwork is done by the SQL database
  • Rewrite the Linq-to-SQL query so it looks like this

    var conditions = AllConditionsByCountry.ToList();
    foreach (var record in conditions)
    {
        var rList = conditions.Where(....);
    }

Note that in that example I am searching conditions rather than AllConditionsByCountry - .ToList() will return a list that has already been iterated so you do not create any more database queries. This will still be slow (since you're doing O(N^2) over 9500 records), but it will still be faster than creating 9500 queries since it will all be done in memory.

  • Just rewrite the query in ADO.NET if you're more comfortable with raw SQL than Linq-to-SQL. There's nothing wrong with this.
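If the O(N^2) in-memory scan mentioned above still turns out to be too slow, one possible refinement (not part of the options listed, just a sketch) is to group the rows by ConditionID once with ToLookup and then fetch each group by key. ConditionRow below is a hypothetical, trimmed-down stand-in for AllConditionByCountry.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed-down version of the row type from the question.
public sealed class ConditionRow
{
    public int ConditionID { get; set; }
    public string ConditionDescription { get; set; } = "";
}

public static class LookupDemo
{
    public static ILookup<int, ConditionRow> IndexByCondition(IEnumerable<ConditionRow> rows)
    {
        // ToLookup enumerates the source once and groups the rows by key.
        return rows.ToLookup(r => r.ConditionID);
    }
}
```

Inside the loop, lookup[conditionID] then replaces the Where(...) call. A key that is not present yields an empty sequence rather than throwing.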

I think I should point out what methods cause an IEnumerable to be iterated and what ones don't.

Any method named As* (such as AsEnumerable<T>()) does not cause the enumerable to be iterated. It's essentially a way of casting from one type to another.

Any method named To* (such as ToList<T>()) will cause the enumerable to be iterated. In the case of Linq-to-SQL this will also execute the database query. Any method that results in you getting a value out of the enumerable will also cause iteration. You can use this to your advantage by creating a query, forcing iteration with ToList(), and then searching that list. The comparisons are then done in memory, which is what I demonstrated above.
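The difference between As* and To* can be checked on an ordinary list. This is a small sketch with invented names: AsEnumerable() hands back the very same object under a different compile-time type, while ToList() iterates immediately and produces an independent copy.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class AsVersusToDemo
{
    public static bool AsEnumerableReturnsSameObject(List<int> source)
    {
        // AsEnumerable only changes the compile-time type: no copy, no iteration.
        IEnumerable<int> view = source.AsEnumerable();
        return object.ReferenceEquals(view, source);
    }

    public static bool ToListReturnsSnapshot(List<int> source)
    {
        // ToList walks the source right away and copies the items.
        List<int> copy = source.ToList();
        source.Add(-1); // a later change to the source does not reach the copy
        return !object.ReferenceEquals(copy, source) && copy.Count == source.Count - 1;
    }
}
```

Both methods return true for any list you pass in.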

// First: return List<> rather than IEnumerable<>, because you filter the result again later.
public List<AllConditionByCountry> GenerateConditions(int paramCountryId)
{
    var AllConditionsByCountry =
            (from cd in db.tblConditionDescriptions...
             join...
             join...
             select new AllConditionByCountry
             {
                 CountryID = cd.CountryID,
                 ConditionDescription = cd.ConditionDescription,
                 ConditionID = cd.ConditionID,
             ...
             ...
            })
            .OrderBy(x => x.CountryID)
            .ToList(); // materialize once, so only one query is executed
            // .AsEnumerable<AllConditionByCountry>() is no longer needed.

    return AllConditionsByCountry;
}

About this part:

foreach (var record in AllConditionsByCountry) // or AllConditionsByCountry.ForEach(record => {...});
{
  ...
  // AllConditionsByCountry does not query the db again, because it is now a list, no longer a query
  var rList = AllConditionsByCountry.Where(x => x.ConditionID == conditionID); // .Select(x => x).AsEnumerable() is unnecessary; only use AsXXX when the compiler requires it
  ...
}

BTW,

  1. You should page your results; no page needs 100+ results. Returning 10K rows is itself the issue.

    GenerateConditions(int paramCountryId, int page = 0, int pagesize = 50)

  2. It's odd that you need a sub-query. Usually that means GenerateConditions does not return the data structure you need; change it to return the right data so that no sub-query is needed.

  3. Use compiled queries for a further improvement: https://docs.microsoft.com/en-us/dotnet/framework/data/adonet/ef/language-reference/compiled-queries-linq-to-entities
  4. We can't see your full query, but that is usually exactly the part to improve, especially when you have many conditions to filter, join, and group on. A small change can make all the difference.
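For point 1, paging could look roughly like the sketch below. GetPage is a hypothetical helper, not something from the original code. It assumes the query is still an ordered IQueryable when it is called, so that a provider such as Linq-to-SQL can translate Skip and Take into SQL instead of paging in memory.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class PagingDemo
{
    // page is zero-based, matching the default in the signature above.
    public static List<T> GetPage<T>(IOrderedQueryable<T> ordered, int page = 0, int pageSize = 50)
    {
        // Skip and Take stay part of the query until ToList forces execution.
        return ordered.Skip(page * pageSize).Take(pageSize).ToList();
    }
}
```

Paging needs a stable ordering to be meaningful, which is why the helper takes an IOrderedQueryable<T> and not a plain IQueryable<T>.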
