
Why does my program slow down on increasing the size of my hashtable

I am using murmur hash to store 150,000 words in a hashtable, and I am using linear probing to resolve collisions. I thought that if the hashtable is large, there will be a lot of free slots and I won't have to probe for long. But something strange happens: I got the fastest running time when the size of the hashtable was 250,000. Beyond that, the running time increases. Why does this happen?

While Robert covers the general issue (locality), the specific issue here is probably spatial locality.

When you have a smaller hash table, it fits into cache. When you have a very large hash table, each lookup runs a high risk of a cache miss and, if the table does not fit in physical memory, of a page fault. On a cache miss, the CPU stalls until the block has been copied from slower memory into the caches closer to the CPU. On a page fault, the operating system has to pause execution until the page has been brought into memory.
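To get a feel for when a table stops fitting into cache, here is a rough back-of-the-envelope sketch. The slot size (one 8-byte pointer per slot) and the cache sizes are assumptions chosen for illustration; they are not taken from the question, and real values depend on the implementation and the CPU.

```python
# Rough working-set estimate for an open-addressing table of pointers.
# SLOT_BYTES and the cache sizes below are assumptions, not measured values.
SLOT_BYTES = 8  # one 64-bit pointer per slot (assumed)
CACHE_BYTES = {
    "L2 (assumed 1 MiB)": 1 << 20,
    "L3 (assumed 8 MiB)": 8 << 20,
}

def table_bytes(num_slots, slot_bytes=SLOT_BYTES):
    """Bytes occupied by the slot array alone (the stored strings are extra)."""
    return num_slots * slot_bytes

def fits(num_slots):
    """For each assumed cache level, report whether the slot array fits in it."""
    size = table_bytes(num_slots)
    return {name: size <= cap for name, cap in CACHE_BYTES.items()}

for slots in (250_000, 1_000_000, 10_000_000):
    print(slots, table_bytes(slots), fits(slots))
```

Under these assumptions, the slot array grows linearly with the table size while the caches stay fixed, so past some size most random lookups have to go to main memory.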

In extreme cases, the slower memory might even be an on-disk resource (swap space) provided by the operating system.

"Hash tables in general exhibit poor locality of reference - that is, the data to be accessed is distributed seemingly at random in memory. Because hash tables cause access patterns that jump around, this can trigger microprocessor cache misses that cause long delays. Compact data structures such as arrays searched with linear search may be faster, if the table is relatively small and keys are compact. The optimal performance point varies from system to system." - https://en.wikipedia.org/wiki/Hash_table
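The asker's intuition about probing is sound on its own, which is why the slowdown has to come from somewhere else. A minimal sketch that counts extra probes for different table sizes is below. It uses `zlib.crc32` as a stand-in for murmur hash and synthetic keys of the form `word0`, `word1`, and so on; both are assumptions made only to keep the example self-contained and deterministic.

```python
import zlib

def insert_all(keys, num_slots):
    """Insert keys using linear probing; return the total number of extra probes.

    zlib.crc32 is a stand-in hash here (assumption), not murmur hash.
    """
    table = [None] * num_slots
    probes = 0
    for key in keys:
        i = zlib.crc32(key.encode()) % num_slots
        # Step to the next slot until we find a free one or the key itself.
        while table[i] is not None and table[i] != key:
            i = (i + 1) % num_slots
            probes += 1
        table[i] = key
    return probes

# Synthetic keys (assumption); the question's actual word list is not available.
words = [f"word{n}" for n in range(150_000)]
for size in (200_000, 250_000, 1_000_000):
    print(size, insert_all(words, size))
```

The probe count should keep falling as the table grows, so if the measured running time rises anyway, the extra cost is coming from memory access rather than from probing.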
