
Conversion of an IEnumerable to a dictionary for performance?

I have recently seen a new trend in my firm where we convert an IEnumerable to a dictionary with a simple LINQ transformation, as follows:

enumerable.ToDictionary(x=>x);

We mostly end up doing this when the operation on the collection is a Contains or a keyed access, and obviously a dictionary performs better in such cases.

But I realise that converting the enumerable to a dictionary has its own cost, and I am wondering at what point it starts to break even (if it does), i.e. the point where the performance of IEnumerable Contains/access equals that of ToDictionary plus access/Contains.

I should add that there is no database access involved: the enumerable might be created from a database query, but that is all, and the enumerable may also be edited after that.

It would also be interesting to know how the data type of the key affects the performance.

There are generally 2-5 lookups, but sometimes there may be only one. I have seen things like this for an enumerable:

 var element = enumerable.SingleOrDefault(x => x.Id == id); // id: the key being searched for
 //do something if element is null or return

And for a dictionary:

 if(dictionary.ContainsKey(x))
 //do something if false else  return

This has been bugging me for quite some time now.

Performance of Dictionary Compared to IEnumerable

A Dictionary, when used correctly, is faster to read from, except where the data set is very small (e.g. 10 items). There can be overhead when creating it.

Given n as the number of items and m as the number of lookups performed against the same object (these are approximate):

  • Performance of an IEnumerable (created from a clean list): O(mn)
    • This is because you need to look at all the items each time (essentially m * O(n)).
  • Performance of a Dictionary: O(n) + O(m), or O(m + n)
    • This is because you need to insert the items first (O(n)), after which each of the m lookups is O(1).

In general, the Dictionary wins when m > 1, and the IEnumerable wins when m = 1 or m = 0.
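To make the O(mn) versus O(m + n) comparison concrete, here is a rough sketch that counts units of work instead of measuring time. The class and method names are made up for illustration, and the dictionary side simply assumes that every insert and every lookup costs one unit, which is an idealisation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LookupCost
{
    // Counts how many elements a linear scan inspects across all lookups.
    public static long LinearScanInspections(IReadOnlyList<int> items, IEnumerable<int> keys)
    {
        long inspected = 0;
        foreach (int key in keys)
        {
            foreach (int item in items)
            {
                inspected++;
                if (item == key) break; // found: stop scanning
            }
        }
        return inspected;
    }

    // Counts one unit per inserted element plus one unit per lookup,
    // treating each hash operation as O(1) (an assumption, not a measurement).
    public static long DictionaryOperations(IReadOnlyList<int> items, IEnumerable<int> keys)
    {
        long operations = 0;
        var lookup = new Dictionary<int, int>(items.Count);
        foreach (int item in items) { lookup[item] = item; operations++; }
        foreach (int key in keys) { lookup.ContainsKey(key); operations++; }
        return operations;
    }

    public static void Main()
    {
        List<int> items = Enumerable.Range(0, 1000).ToList();
        int[] keys = { 999, 999, 999 }; // worst case for the scan: last element, m = 3

        Console.WriteLine(LinearScanInspections(items, keys)); // 3000
        Console.WriteLine(DictionaryOperations(items, keys));  // 1003
    }
}
```

With a single worst-case lookup the counts are 1000 against 1001, which is the m = 1 case where the scan still comes out ahead.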

In general you should:

  • Use a Dictionary when doing the lookup more than once against the same dataset.
  • Use an IEnumerable when doing the lookup only once.
  • Use an IEnumerable when the dataset could be too large to fit into memory.
    • Keep in mind that a SQL table can be used like a Dictionary, so you could use that to offset the memory pressure.
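As a sketch of the first two rules, the snippet below scans directly for a one-off lookup and builds the dictionary once for repeated lookups. The Customer type and both method names are hypothetical, and note that ToDictionary throws if two elements share a key, so this assumes the Ids are unique.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class Customer // hypothetical type, for illustration only
{
    public int Id { get; }
    public string Name { get; }
    public Customer(int id, string name) { Id = id; Name = name; }
}

public static class CustomerLookup
{
    // Single lookup: scan the sequence directly, no dictionary needed.
    public static Customer FindOnce(IEnumerable<Customer> customers, int id) =>
        customers.FirstOrDefault(c => c.Id == id);

    // Repeated lookups: pay the O(n) build once, then probe by key.
    public static List<string> FindMany(IEnumerable<Customer> customers, IEnumerable<int> ids)
    {
        Dictionary<int, Customer> byId = customers.ToDictionary(c => c.Id);
        var names = new List<string>();
        foreach (int id in ids)
        {
            // TryGetValue does the existence check and the fetch in one probe.
            if (byId.TryGetValue(id, out Customer match))
                names.Add(match.Name);
        }
        return names;
    }

    public static void Main()
    {
        var customers = new List<Customer>
        {
            new Customer(1, "Alice"), new Customer(2, "Bob"), new Customer(3, "Carol")
        };
        Console.WriteLine(FindOnce(customers, 2).Name);                              // Bob
        Console.WriteLine(string.Join(", ", FindMany(customers, new[] { 3, 7, 1 }))); // Carol, Alice
    }
}
```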

Further Considerations

Dictionaries use GetHashCode() to organise their internal state. The performance of a Dictionary is strongly related to the hash code in two ways.

  • A poorly performing GetHashCode() results in overhead every time an item is added, looked up, or deleted.
  • Low-quality hash codes (many collisions) result in the dictionary not having O(1) lookup performance.

Most built-in .NET types (especially the value types) have very good hashing algorithms. However, with list-like types (e.g. string), GetHashCode() has O(n) performance, because it needs to iterate over the whole string. Thus your dictionary's lookup performance can really be seen as O(1) + M, where M is the big-O of an efficient GetHashCode().
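The effect of low-quality hash codes can be seen by counting how many Equals calls a lookup needs. The sketch below uses a made-up CountingComparer with a pluggable hash function. It relies on Dictionary comparing stored hash codes before calling Equals, which is how the current .NET implementations behave as far as I know, so treat the exact counts as illustrative.

```csharp
using System;
using System.Collections.Generic;

// Comparer that counts Equals calls and uses a caller-supplied hash function.
public sealed class CountingComparer : IEqualityComparer<int>
{
    private readonly Func<int, int> _hash;
    public int EqualsCalls { get; private set; }

    public CountingComparer(Func<int, int> hash) { _hash = hash; }

    public bool Equals(int x, int y) { EqualsCalls++; return x == y; }
    public int GetHashCode(int value) => _hash(value);
    public void Reset() => EqualsCalls = 0;
}

public static class HashQualityDemo
{
    // Returns the number of Equals calls made while looking up an absent key.
    public static int EqualsCallsForMiss(Func<int, int> hash, int n)
    {
        var comparer = new CountingComparer(hash);
        var dictionary = new Dictionary<int, bool>(comparer);
        for (int i = 0; i < n; i++) dictionary[i] = true;

        comparer.Reset();           // ignore comparisons made while inserting
        dictionary.ContainsKey(-1); // -1 was never inserted
        return comparer.EqualsCalls;
    }

    public static void Main()
    {
        // Distinct hash per key: the miss is rejected on hash codes alone.
        Console.WriteLine(EqualsCallsForMiss(value => value, 1000));
        // Constant hash: every entry collides, so the lookup degrades to a scan.
        Console.WriteLine(EqualsCallsForMiss(value => 0, 1000));
    }
}
```

With the constant hash the lookup has to compare against every stored key, which is the O(n) behaviour the second bullet above warns about.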

It depends...

How long is the IEnumerable?

Does accessing the IEnumerable cause database access?

How often is it accessed?

The best thing to do would be to experiment and profile.

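If you frequently search the collection for elements by some key, a dictionary will definitely be faster, because it is a hash-based collection and its search time is much lower. Otherwise, if you do not search the collection much, the conversion is not necessary, because the time spent on the conversion may be greater than your one or two searches in the collection.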

IMHO: you need to measure this in your environment with representative data. In such cases I just write a quick console app that measures the execution time of the code. To get a better measurement, I guess you need to execute the same code multiple times.
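A quick console app along those lines might look like the sketch below. The collection size, the keys, and the repetition count are placeholder assumptions to be replaced with representative data, and Stopwatch timings like these are only a rough indication, not a rigorous benchmark.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class QuickBenchmark
{
    // Runs the action `repetitions` times and returns the total elapsed time.
    public static TimeSpan Measure(Action action, int repetitions)
    {
        action(); // warm-up run so JIT compilation is not timed
        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < repetitions; i++) action();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    public static void Main()
    {
        List<int> source = Enumerable.Range(0, 10000).ToList(); // assumed data size
        int[] keys = { 9999, 5000, 42 };                         // assumed workload, m = 3

        TimeSpan scan = Measure(() =>
        {
            foreach (int key in keys) source.Contains(key);
        }, 1000);

        TimeSpan dictionary = Measure(() =>
        {
            Dictionary<int, int> lookup = source.ToDictionary(x => x); // conversion cost included
            foreach (int key in keys) lookup.ContainsKey(key);
        }, 1000);

        Console.WriteLine($"Linear scan:            {scan.TotalMilliseconds:F2} ms");
        Console.WriteLine($"ToDictionary + lookups: {dictionary.TotalMilliseconds:F2} ms");
    }
}
```

Varying the number of keys and the size of source should show roughly where the break-even point lies on a given machine.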

ADD:

It also depends on the application you develop. Usually you gain more by spending that time and effort on optimizing other places (avoiding network round trips, caching, etc.).

I'll add that you haven't told us what happens every time you "rewind" your IEnumerable<>. Is it directly backed by a data collection (for example a List<>), or is it calculated "on the fly"? If it's the first, then for small collections, enumerating them to find the wanted element is faster (a Dictionary for 3 or 4 elements is useless; if you want, I can build some benchmark to find the breaking point). If it's the second, then you have to consider whether "caching" the IEnumerable<> in a collection is a good idea. If it is, then you can choose between a List<> and a Dictionary<>, and we return to point 1: is the IEnumerable small or big? And there is a third problem: if the collection isn't backed by a data collection and is too big for memory, then clearly you can't put it in a Dictionary<>. Then perhaps it's time to make the SQL work for you :-)
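To illustrate the "rewind" point, here is a small sketch of an IEnumerable<> that is calculated on the fly. The names and the counter are made up for the example. Each enumeration re-runs the iterator body, while caching it in a List<> (or a Dictionary<>) pays that cost only once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RewindDemo
{
    public static int ItemsProduced; // counts how much work the source does

    // Computed "on the fly": the body runs again on every enumeration.
    public static IEnumerable<int> LazySource(int count)
    {
        for (int i = 0; i < count; i++)
        {
            ItemsProduced++;
            yield return i;
        }
    }

    // Two lookups directly against the lazy sequence.
    public static int WorkWithoutCache()
    {
        ItemsProduced = 0;
        IEnumerable<int> lazy = LazySource(100);
        lazy.Contains(99); // enumerates all 100 items
        lazy.Contains(99); // "rewinds" and enumerates all 100 again
        return ItemsProduced;
    }

    // Two lookups against a cached copy of the sequence.
    public static int WorkWithCache()
    {
        ItemsProduced = 0;
        List<int> cached = LazySource(100).ToList(); // enumerated once
        cached.Contains(99);
        cached.Contains(99);
        return ItemsProduced;
    }

    public static void Main()
    {
        Console.WriteLine(WorkWithoutCache()); // 200
        Console.WriteLine(WorkWithCache());    // 100
    }
}
```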

I'll add that "failures" have their cost: in a List<>, if you try to find an element that doesn't exist, the cost is O(n), while in a Dictionary<> the cost is still O(1).
