System.Collections.Generic.Dictionary = Ultimate performance?

I'm writing a C# target for Haxe, and I've been studying performance differences in Haxe's std library so that we can provide the best possible performance through its cross-platform code.

One very good example is the hash table code. I was a little reluctant to use .NET's Dictionary, as it seems bulky (structs for key/value pairs can take up a huge amount of memory because of memory alignment issues, on top of the unnecessary information it holds). And since the std library has no such thing as an object hash, I really thought I could squeeze out a little performance by not having to call GetHashCode and inlining everything instead.

It's also clear that the Dictionary implementation uses a linked list to deal with collisions, which is far from ideal.

So we started to implement our own solution, starting with IntHash (Dictionary). We first implemented Hopscotch hashing, but it didn't turn out very well. In hindsight it was fairly obvious that it wouldn't cope well with huge hash tables, since H is normally a machine word, and performance gets poorer as H / Length increases.

We then moved on to a khash-inspired algorithm. This one had a lot of potential, as its benchmarks are impressive and it handles collisions within the same array. It also had some other nice properties, such as resizing without needing twice as much memory as we otherwise would.

The benchmarks were disappointing. Needless to say, memory usage was much lower with our implementation than with Dictionary. I was hoping for a nice performance boost as well, but unfortunately that was not the case. It wasn't too far behind (less than an order of magnitude), but for both sets and gets, .NET's implementation still performed better.
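
The benchmark code itself is not shown here. As a rough sketch only, a harness for timing sets and gets could look like the following. The names `HashBench`, `Measure` and `SetThenGet` are made up for illustration, and a serious comparison would need larger inputs, varied key distributions and a proper benchmarking tool.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical micro-benchmark helper; not the code used for the results above.
public static class HashBench
{
    // Runs the action once to warm up the JIT, then returns the fastest of several timed runs.
    public static TimeSpan Measure(Action action, int runs = 5)
    {
        action(); // warm-up run, not timed
        TimeSpan best = TimeSpan.MaxValue;
        for (int r = 0; r < runs; r++)
        {
            var sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            if (sw.Elapsed < best) best = sw.Elapsed;
        }
        return best;
    }

    // Workload: n sets followed by n gets on a Dictionary<int, int>.
    // The sum is returned so the reads cannot be optimized away.
    public static long SetThenGet(int n)
    {
        var map = new Dictionary<int, int>();
        for (int i = 0; i < n; i++) map[i] = i;
        long sum = 0;
        for (int i = 0; i < n; i++) sum += map[i];
        return sum;
    }
}
```

Calling `HashBench.Measure(() => HashBench.SetThenGet(1000000))` gives a time for the Dictionary workload; the same `Measure` call can wrap an equivalent workload for a custom table so both are timed the same way.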

So my question is: is that the best we have for C#? I tried looking for custom solutions, and there seem to be almost none. There is the C5 generic collection library, but the code is so cluttered that I didn't even test it. I found no benchmarks for it either.

So... is that it? Should I just wrap Dictionary<>?

I've found that the .NET Dictionary performs well, if not exceptionally well, in most situations. It's a good general-purpose implementation. The problem I most often run into is the 2-gigabyte limit. On a 64-bit system, you can't add more than about 89.5 million items to a dictionary (when the key is an integer or a reference, and the value is a reference). Dictionary overhead appears to be 24 bytes per item.

That limit makes itself known in a very odd way. The Dictionary seems to grow by doubling: when it gets full, it increases capacity to the next prime number that's at least double the current size. Because of that, the dictionary will grow to about 47 million items and then throw an exception, because when it tries to double (to 94 million), the memory allocation fails due to the 2-gigabyte limit. I get around the problem by pre-allocating the Dictionary (i.e. calling the constructor that lets you specify the capacity). That also speeds up populating the dictionary because it never has to grow; growing entails allocating a new array and re-hashing everything.
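
As a small sketch of the pre-allocation described above: the `Dictionary<TKey, TValue>(int capacity)` constructor is part of the BCL, while the class name, method names and the two constants below are illustrative. The constants simply restate the estimates from this answer (a 2-gigabyte limit and roughly 24 bytes per item); they are not documented values.

```csharp
using System;
using System.Collections.Generic;

public static class PreSizedDictionaryDemo
{
    // Estimates taken from the text above, not documented constants.
    private const long ArrayByteLimit = 2L * 1024 * 1024 * 1024; // 2 GB
    private const long BytesPerEntry = 24;                       // observed overhead per item

    // Rough upper bound on the number of items that fit under the limit:
    // 2,147,483,648 / 24 = 89,478,485 (about 89.5 million).
    public static long EstimatedMaxEntries()
    {
        return ArrayByteLimit / BytesPerEntry;
    }

    // Passing the expected count to the constructor allocates the storage once,
    // so the dictionary never has to grow and re-hash while it is being filled.
    public static Dictionary<int, string> BuildPreSized(int expectedCount)
    {
        var map = new Dictionary<int, string>(expectedCount);
        for (int i = 0; i < expectedCount; i++)
        {
            map[i] = "item" + i;
        }
        return map;
    }
}
```

The arithmetic also shows why the failure appears near 47 million items: doubling from there asks for about 94 million entries, which is above the estimated 89.5 million ceiling.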

What makes you say that Dictionary uses a linked list for collision resolution? I'm pretty sure it uses open addressing, but I don't know how it does the probes. I guess if it does linear probing, then the effect is similar to what you'd get with a linked list.

We wrote our own BigDictionary class to get past the 2-gigabyte limit and found that a straightforward open addressing scheme with linear probing gives reasonably good performance. It's not as fast as Dictionary, but it can handle hundreds of millions of items (billions if I had the memory).
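
BigDictionary itself is not shown here. To illustrate the general technique only, below is a minimal sketch of an int-to-int map that uses open addressing with linear probing. The class name `LinearProbingIntMap`, the hash mixing constant and the 0.5 load factor are assumptions chosen for the example. It does not support removal, and it makes no attempt to get past the 2-gigabyte limit.

```csharp
using System;

// Hypothetical sketch: int -> int map using open addressing with linear probing.
public sealed class LinearProbingIntMap
{
    private int[] keys = new int[16];     // table size is always a power of two
    private int[] values = new int[16];
    private bool[] used = new bool[16];   // marks occupied slots
    private int count;

    public int Count { get { return count; } }

    // Mixes the key so that sequential keys do not all land in neighbouring slots.
    private static int Slot(int key, int mask)
    {
        unchecked
        {
            uint h = (uint)key * 2654435769u;
            h ^= h >> 16;
            return (int)(h & (uint)mask);
        }
    }

    public void Set(int key, int value)
    {
        // Keep the load factor at or below 0.5 so a probe always finds an empty slot.
        if ((count + 1) * 2 > keys.Length) Grow();
        int mask = keys.Length - 1;
        int i = Slot(key, mask);
        while (used[i])
        {
            if (keys[i] == key) { values[i] = value; return; } // overwrite existing key
            i = (i + 1) & mask;                                // linear probe: try the next slot
        }
        used[i] = true;
        keys[i] = key;
        values[i] = value;
        count++;
    }

    public bool TryGet(int key, out int value)
    {
        int mask = keys.Length - 1;
        int i = Slot(key, mask);
        while (used[i])
        {
            if (keys[i] == key) { value = values[i]; return true; }
            i = (i + 1) & mask;
        }
        value = 0;
        return false; // reached an empty slot, so the key is absent
    }

    // Doubles the table and re-inserts every entry (the re-hashing cost mentioned earlier).
    private void Grow()
    {
        int[] oldKeys = keys, oldValues = values;
        bool[] oldUsed = used;
        keys = new int[oldKeys.Length * 2];
        values = new int[oldKeys.Length * 2];
        used = new bool[oldKeys.Length * 2];
        count = 0;
        for (int i = 0; i < oldKeys.Length; i++)
        {
            if (oldUsed[i]) Set(oldKeys[i], oldValues[i]);
        }
    }
}
```

All collisions are resolved inside the same arrays, so there are no per-entry link fields. The trade-off is that probe sequences get longer as the table fills, which is why the sketch keeps the table at most half full.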

That said, you should be able to write a faster task-specific hash table that outperforms the .NET Dictionary in some situations. But for a general-purpose hash table, I think you'll be hard pressed to do better than what the BCL provides.

There are many things to consider in designing a "better" hash table. One reason the custom approaches you tried were slower than, or no better than, the .NET Dictionary is that the performance of a hash table very often depends heavily on:

  • The data being hashed
  • The performance of the hash function
  • The load factor of the table
  • The number of collisions vs non-collisions
  • The algorithm for collision resolution
  • The amount of data in the table and how it's stored (by pointer/reference or directly within the buckets)
  • The access patterns to the data
  • The number of insertions/deletions vs retrievals
  • The need for resizing in a closed hashing/open addressing implementation
  • and many other factors...

With so many things to tweak and tune, it is difficult to come up with a general high-performance (time and space) hash table without a significant amount of effort. That is why, if you are going to create a custom hash table instead of using the one built into a standard library (such as .NET), you should be ready to spend countless hours on it, and be aware that your finely tuned implementation may be tuned only for the specific type and amount of data you are hashing.

Therefore, no, the .NET Dictionary is not the ultimate hash table for any specific purpose. But given how frequently dictionaries are used, I am sure that the Microsoft BCL (Base Class Library) team did a huge amount of profiling before settling on the approach they chose for the general case.
