IEnumerable takes too long to process when filtering on it

I have a feeling I know what the reason for this behavior is, but I don't know the best way to resolve it.

I have built a LinqToSQL query:

public IEnumerable<AllConditionByCountry> GenerateConditions(int paramCountryId)
{
    var AllConditionsByCountry =
            (from cd in db.tblConditionDescriptions...
             join...
             join...
             select new AllConditionByCountry
             {
                 CountryID = cd.CountryID,
                 ConditionDescription = cd.ConditionDescription,
                 ConditionID = cd.ConditionID,
             ...
             ...
            }).OrderBy(x => x.CountryID).AsEnumerable<AllConditionByCountry>();

    return AllConditionsByCountry;
}

This query returns 9500+ rows of data.

I'm calling this from my controller like so:

svcGenerateConditions generateConditions = new svcGenerateConditions(db);
IEnumerable<AllConditionByCountry> AllConditionsByCountry;
AllConditionsByCountry = generateConditions.GenerateConditions(1);

I then loop through the results:

foreach (var record in AllConditionsByCountry)
{
    ...
    ...
    ...

This is where I think the issue is:

var rList = AllConditionsByCountry
           .Where(x => x.ConditionID == conditionID)
           .Select(x => x)
           .AsEnumerable();

I'm doing a nested loop based on the data that I'm gathering from the above query (using the original data I'm getting from AllConditionByCountry). I think this is where my issue lies: when it filters the data, it slows down greatly.

Basically, this process writes out a bunch of files (.json, .html). I first tested this using just ADO.NET, and running through all of these records took about 4 seconds. Using EF (stored procedure or LinqToSql), it takes minutes.

Is there anything I should do with the types of lists that I'm using, or is that just the price of using LinqToSql?

I've tried returning List<AllConditionByCountry>, IQueryable, and IEnumerable from my GenerateConditions method. List took a very long time (similar to what I'm seeing now). With IQueryable, I got errors when I tried to do the second filter (Query results cannot be enumerated more than once).

I have run this same Linq statement in LinqPad and it returns in less than a second.

I'm happy to add any additional information.

Please let me know.

Edit:

foreach (var record in AllConditionsByCountry)
{
    ...
    ...
    ...
    var rList = AllConditionsByCountry
               .Where(x => x.ConditionID == conditionID)
               .Select(x => x)
               .AsEnumerable();                        
    conditionDescriptionTypeID = item.ConditionDescriptionTypeId;
    id = conditionDescriptionTypeID + "_" + count.ToString();              
    ...
    ...
}

TL;DR: You're making 9895 queries against the database instead of one. You need to rewrite your query such that only one is executed. Look into how IEnumerable works for some hints on how to do this.

Ah, yeah, that foreach loop is your problem.

foreach (var record in AllConditionsByCountry)
{
  ...
  ...
  ...
  var rList = AllConditionsByCountry.Where(x => x.ConditionID == conditionID).Select(x => x).AsEnumerable();                        
  conditionDescriptionTypeID = item.ConditionDescriptionTypeId;
  id = conditionDescriptionTypeID + "_" + count.ToString();              
  ...
  ...
}

Linq-to-SQL works similarly to Linq in that it (loosely speaking) appends functions to a chain to be executed when the enumerable is iterated. For example:

new[] { 1 }.Select<int, int>(x => throw new Exception());

This doesn't actually cause your code to crash because the enumerable is never iterated. Linq-to-SQL operates on a similar principle. So, when you define this:

var AllConditionsByCountry =
        (from cd in db.tblConditionDescriptions...
         join...
         join...
         select new AllConditionByCountry
         {
             CountryID = cd.CountryID,
             ConditionDescription = cd.ConditionDescription,
             ConditionID = cd.ConditionID,
         ...
         ...
        }).OrderBy(x => x.CountryID).AsEnumerable<AllConditionByCountry>();

You're not executing anything against the database; you're just instructing C# to build a query that runs when it is iterated. That's why just declaring this query is fast.

Your problem comes when you get to your foreach loop. When you hit the loop, you signal that you want to start iterating the AllConditionsByCountry iterator. This causes .NET to go off and execute the initial query, which takes time.

When you call AllConditionsByCountry.Where(x => x.ConditionID == conditionID) in the foreach loop, you're constructing another iterator that doesn't do anything by itself. Presumably you actually use the result of rList within that loop, however, so you're essentially constructing N queries to be executed against the database (where N is the size of AllConditionsByCountry).

This leads to a scenario where you are effectively executing approximately 9501 queries against the database: 1 for your initial query and then one query for each element returned by the original query. The drastic slowdown compared to ADO.NET is because you're probably making about 9500 more queries than you were originally.
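
To see the effect without a database, here is a minimal sketch. It uses a hypothetical FakeQuery iterator as a stand-in for the Linq-to-SQL query and counts how often it is executed. The names ReEnumerationDemo, FakeQuery, and Executions are made up for illustration; a real Linq-to-SQL query would send SQL to the server on each execution instead of incrementing a counter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReEnumerationDemo
{
    // Hypothetical stand-in for the database: counts how many times
    // the "query" is executed (how many times an enumeration starts).
    public static int Executions;

    // An iterator method is lazy: its body runs again from the top
    // every time the returned sequence is enumerated.
    public static IEnumerable<int> FakeQuery(int rows)
    {
        Executions++;
        for (int i = 0; i < rows; i++)
            yield return i % 10; // pretend this is a ConditionID
    }

    // Mirrors the shape of the nested loop in the question and
    // returns how many times the fake query was executed.
    public static int CountExecutions(int rows)
    {
        Executions = 0;
        IEnumerable<int> all = FakeQuery(rows);
        int totalMatches = 0;

        foreach (int conditionId in all) // first execution
        {
            // Where is lazy as well; Count() forces the enumeration,
            // which executes the fake query once more per outer row.
            totalMatches += all.Where(x => x == conditionId).Count();
        }

        return Executions;
    }

    public static void Main()
    {
        Console.WriteLine(CountExecutions(100)); // prints 101
    }
}
```

With 100 rows the counter ends at 101: one execution for the outer loop plus one per row for the inner filter, which is the same 1 + N pattern described above.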

Ideally, you should change the code so that one and only one query is executed against the database. You have a few options:

  • Rewrite the Linq-to-SQL query such that all of the legwork is done by the SQL database.
  • Rewrite the code that consumes the Linq-to-SQL query so it looks like this:

    var conditions = AllConditionsByCountry.ToList();
    foreach (var record in conditions)
    {
        var rList = conditions.Where(....);
    }

    Note that in that example I am searching conditions rather than AllConditionsByCountry. .ToList() returns a list that has already been iterated, so you do not create any more database queries. This will still be slow (since you're doing O(N^2) work over 9500 records), but it will be faster than creating 9500 queries since it is all done in memory.

  • Just rewrite the query in ADO.NET if you're more comfortable with raw SQL than Linq-to-SQL. There's nothing wrong with this.
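
If the O(N^2) in-memory scan is still too slow, one possible refinement (not part of the options above, just a sketch) is to group the materialized rows once with ToLookup and then fetch each group by key. ConditionRow is a hypothetical, trimmed-down version of AllConditionByCountry, and using record.ConditionID as the key is an assumption, since the question does not show where conditionID comes from.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed-down version of the question's result type.
public sealed class ConditionRow
{
    public int ConditionID { get; set; }
    public string ConditionDescription { get; set; } = "";
}

public static class LookupDemo
{
    // Groups the already-materialized rows by ConditionID in a single pass.
    public static ILookup<int, ConditionRow> IndexByConditionId(IEnumerable<ConditionRow> rows)
    {
        return rows.ToLookup(r => r.ConditionID);
    }

    public static void Main()
    {
        // Stand-in for the list produced by calling ToList() on the query.
        var conditions = new List<ConditionRow>
        {
            new ConditionRow { ConditionID = 1, ConditionDescription = "A" },
            new ConditionRow { ConditionID = 1, ConditionDescription = "B" },
            new ConditionRow { ConditionID = 2, ConditionDescription = "C" },
        };

        ILookup<int, ConditionRow> byId = IndexByConditionId(conditions);

        foreach (ConditionRow record in conditions)
        {
            // Keyed fetch instead of scanning the whole list with Where.
            IEnumerable<ConditionRow> rList = byId[record.ConditionID];
            Console.WriteLine($"{record.ConditionDescription}: {rList.Count()} row(s) share its ConditionID");
        }
    }
}
```

The lookup is built in one pass over the list, and indexing it with a key that is not present returns an empty sequence rather than throwing.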

I think I should point out which methods cause an IEnumerable to be iterated and which don't.

Any method named As* (such as AsEnumerable<T>()) does not cause the enumerable to be iterated. It's essentially a way of casting from one type to another.

Any method named To* (such as ToList<T>()) will cause the enumerable to be iterated. In the case of Linq-to-SQL, this will also execute the database query. Any method that results in you getting a value out of the enumerable will also cause iteration. You can use this to your advantage by creating a query, forcing iteration using ToList(), and then searching that list. This causes the comparisons to be done in memory, which is what I demo above.
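
The difference between As* and To* can be checked with plain in-memory Linq. This is only a sketch: MaterializeDemo, Source, and Evaluations are hypothetical names, and the counter stands in for the work a database would do each time the query is executed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MaterializeDemo
{
    // Counts how many elements the lazy source has produced so far.
    public static int Evaluations;

    // A lazy three-element sequence; the selector runs only during iteration.
    private static IEnumerable<int> Source()
    {
        return Enumerable.Range(1, 3).Select(x => { Evaluations++; return x; });
    }

    // AsEnumerable only changes the static type; nothing is iterated.
    public static int AfterAsEnumerable()
    {
        Evaluations = 0;
        IEnumerable<int> seq = Source().AsEnumerable();
        return Evaluations;
    }

    // ToList iterates once; later filters run over the list in memory.
    public static int AfterToList()
    {
        Evaluations = 0;
        List<int> list = Source().ToList();
        list.Count(x => x > 1);
        list.Count(x => x > 2);
        return Evaluations;
    }

    // Without ToList, every filter re-runs the lazy source.
    public static int WithoutToList()
    {
        Evaluations = 0;
        IEnumerable<int> seq = Source().AsEnumerable();
        seq.Count(x => x > 1);
        seq.Count(x => x > 2);
        return Evaluations;
    }

    public static void Main()
    {
        Console.WriteLine(AfterAsEnumerable()); // prints 0
        Console.WriteLine(AfterToList());       // prints 3
        Console.WriteLine(WithoutToList());     // prints 6
    }
}
```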

// First: the return type IEnumerable<> should be List<>, because you need to work with the result later
public List<AllConditionByCountry> GenerateConditions(int paramCountryId)
{
    var AllConditionsByCountry =
            (from cd in db.tblConditionDescriptions...
             join...
             join...
             select new AllConditionByCountry
             {
                 CountryID = cd.CountryID,
                 ConditionDescription = cd.ConditionDescription,
                 ConditionID = cd.ConditionID,
             ...
             ...
            })
            .OrderBy(x => x.CountryID)
            .ToList(); // return a list, so only 1 query is executed
            // .AsEnumerable<AllConditionByCountry>() was redundant here anyway.

    return AllConditionsByCountry;
}

About this part:

foreach (var record in AllConditionsByCountry) // you can use AllConditionsByCountry.ForEach(record=>{...});
{
  ...
  // AllConditionsByCountry will not query the db again, because it's a list, no longer a query
  var rList = AllConditionsByCountry.Where(x => x.ConditionID == conditionID); // .Select(x => x).AsEnumerable() is not necessary; only use AsXXX if compilation requires it
  ...
}

BTW,

  1. You should page your results; no page needs 100+ results. Returning 10K rows is the issue itself.

    GenerateConditions(int paramCountryId, int page = 0, int pagesize = 50)

  2. It's odd that you have to use a subquery. Usually that means GenerateConditions does not return the data structure you need; change it to return the right data so that no subquery is needed.

  3. Use compiled queries for further improvement: https://docs.microsoft.com/en-us/dotnet/framework/data/adonet/ef/language-reference/compiled-queries-linq-to-entities
  4. We don't see your full query, but that is usually the part you should improve, especially when you have many conditions to filter, join, and group on. A small change can make all the difference.
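
As a rough illustration of item 1, paging is usually expressed with Skip and Take on an ordered query. The sketch below runs against an in-memory IQueryable so it is self-contained; PagingDemo and GetPage are hypothetical names, and how the paging would be wired into GenerateConditions is an assumption. Against a Linq-to-SQL query, the same Skip and Take calls would be translated into SQL, so only one page of rows is fetched.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PagingDemo
{
    // Returns one zero-based page of an already-ordered query.
    public static List<T> GetPage<T>(IQueryable<T> orderedQuery, int page = 0, int pageSize = 50)
    {
        return orderedQuery
            .Skip(page * pageSize) // skip the rows of all earlier pages
            .Take(pageSize)        // keep at most one page of rows
            .ToList();             // execute the query once
    }

    public static void Main()
    {
        // In-memory stand-in for an ordered database query with 120 rows.
        IQueryable<int> query = Enumerable.Range(1, 120).AsQueryable().OrderBy(x => x);

        List<int> lastPage = GetPage(query, page: 2, pageSize: 50);
        Console.WriteLine($"{lastPage.Count} rows, starting at {lastPage[0]}"); // prints "20 rows, starting at 101"
    }
}
```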
