
How is quicksort related to cache?

I have seen many places say quicksort is good because it works well with caches, for example the Wikipedia article:

Additionally, quicksort's sequential and localized memory references work well with a cache

http://en.wikipedia.org/wiki/Quicksort

Could anyone give me some insight into this claim? How is quicksort related to the cache? What does "cache" mean in this statement? Why is quicksort better for a cache?

Thanks

Quicksort changes the array in place - it works inside the array it was given [unlike merge sort, for instance, which allocates a separate array]. Thus, it benefits from the principle of locality of reference.

A cache benefits from multiple accesses to the same place in memory, since only the first access actually has to go to main memory - the rest are served from the cache, which is much faster than accessing memory.
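To make this concrete, here is a minimal sketch in C (my own illustration, not code from the answer) of a Lomuto-style partition. Both indices move left to right over the same array, so consecutive accesses tend to land on the same or neighbouring cache lines, and no extra buffer is touched:

    #include <stddef.h>

    static void swap_int(int *a, int *b) { int t = *a; *a = *b; *b = t; }

    /* Partition arr[lo..hi] around arr[hi]; returns the pivot's final index. */
    static size_t partition(int *arr, size_t lo, size_t hi) {
        int pivot = arr[hi];
        size_t i = lo;
        for (size_t j = lo; j < hi; ++j) {      /* one sequential scan */
            if (arr[j] < pivot)
                swap_int(&arr[i++], &arr[j]);   /* in-place swap, no extra array */
        }
        swap_int(&arr[i], &arr[hi]);
        return i;
    }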

Merge sort, for instance, needs many more memory [RAM] accesses, since every auxiliary array it creates means going to RAM again.
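For contrast, here is a minimal sketch of merge sort's merge step (again an illustrative assumption, not the answer's code; error handling omitted). The tmp buffer is the auxiliary array the answer refers to: it has to be written and then copied back, which is extra memory traffic:

    #include <stdlib.h>
    #include <string.h>

    /* Merge the sorted halves arr[lo..mid] and arr[mid+1..hi]. */
    static void merge(int *arr, size_t lo, size_t mid, size_t hi) {
        size_t n = hi - lo + 1;
        int *tmp = malloc(n * sizeof *tmp);          /* auxiliary array */
        size_t i = lo, j = mid + 1, k = 0;
        while (i <= mid && j <= hi)
            tmp[k++] = (arr[i] <= arr[j]) ? arr[i++] : arr[j++];
        while (i <= mid) tmp[k++] = arr[i++];
        while (j <= hi)  tmp[k++] = arr[j++];
        memcpy(arr + lo, tmp, n * sizeof *tmp);      /* copy back: more memory traffic */
        free(tmp);
    }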

Trees are even worse, since two consecutive accesses in a tree are unlikely to be close to each other in memory. [The cache is filled in blocks, so for sequential accesses only the first byte in a block is a "miss" and the rest are "hits".]
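The block effect can be illustrated with a rough sketch (the 64-byte line size and the struct layout are assumptions for the example, not from the answer). A sequential scan of 4-byte ints misses at most about once per 16 elements, while chasing pointers through scattered nodes can miss on almost every step:

    #include <stddef.h>

    struct node { int value; struct node *next; };

    /* Sequential scan: roughly one cache miss per 16 ints (64-byte lines). */
    long sum_array(const int *a, size_t n) {
        long s = 0;
        for (size_t i = 0; i < n; ++i) s += a[i];
        return s;
    }

    /* Pointer chasing: potentially one cache miss per node. */
    long sum_list(const struct node *p) {
        long s = 0;
        for (; p != NULL; p = p->next) s += p->value;
        return s;
    }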

What goes into the cache is determined by algorithms that pretty much guess what you are going to use soon based on what you are currently requesting. This usually means blocks of memory that are close to each other, such as arrays.

After a few levels of partitioning, quicksort will be working with blocks that fit completely into the cache, and this substantially increases performance. (Compare with, say, selection sort, which may be accessing memory locations that are far apart with most operations.)
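As a rough sense of scale (my numbers, not the answer's): a typical 32 KB L1 data cache holds 8192 four-byte ints, so once quicksort's segments shrink below roughly that size, each segment is sorted entirely from cache. Selection sort never gets that benefit, because every pass rescans the whole unsorted tail; a minimal sketch, assuming "selection sort" is what is meant:

    #include <stddef.h>

    void selection_sort(int *arr, size_t n) {
        for (size_t i = 0; i + 1 < n; ++i) {
            size_t min = i;
            for (size_t j = i + 1; j < n; ++j)  /* full scan of the remaining tail */
                if (arr[j] < arr[min]) min = j;
            int t = arr[i]; arr[i] = arr[min]; arr[min] = t;  /* one swap per pass */
        }
    }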

Quicksort is an in-place sorting algorithm. It moves elements to the left and right of the pivot using swaps. Each time a swap occurs, the relevant cache line is loaded, and subsequent swaps are likely to hit the same cache line.

From an algorithmic point of view, the time complexity usually just counts the number of memory accesses. From this point of view, quicksort is good, O(n log n) on average, but not the best algorithm: its worst-case complexity is O(n²), and that case does occur in real life (typically for reverse-sorted or constant inputs), whereas some other algorithms are O(n log n) even in the worst case.

What makes quicksort a good algorithm is that on real computers, not all memory accesses take the same time.

Main memory (SDRAM) has a latency that looks very long from the CPU's point of view (typically hundreds of CPU cycles). Fortunately, chunks of memory can be requested and kept in a smaller, quicker memory: the cache (CPUs often have multiple levels of cache, but I'll refer to all of them as "the cache"). For this reason a computer runs faster when it works on data that is already (or still) in the cache. When the CPU touches a new memory region, it has to wait for main memory to answer, which is a notable performance hit, called a cache miss.

So an important factor for sorting large arrays quickly is the memory access pattern. Here quicksort is really good and generates very few cache misses.

Why? Because it scans the array sequentially (and then repeats on smaller and smaller segments), as sketched below. Most of its memory accesses are served by the cache.
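A minimal sketch of the overall recursion (assuming a partition() routine like the Lomuto sketch earlier on this page): each call makes one sequential pass over its segment, and the segments handed to the recursive calls keep shrinking until they fit in the cache:

    #include <stddef.h>

    size_t partition(int *arr, size_t lo, size_t hi);    /* e.g. the Lomuto sketch above */

    void quicksort(int *arr, size_t lo, size_t hi) {
        if (lo >= hi) return;
        size_t p = partition(arr, lo, hi);   /* one sequential pass over arr[lo..hi] */
        if (p > lo) quicksort(arr, lo, p - 1);           /* smaller left segment */
        quicksort(arr, p + 1, hi);                       /* smaller right segment */
    }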

Heap sort, on the other hand, needs to maintain a tree-like heap structure and has a much more "random" memory access pattern.
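For comparison, a minimal sketch of heap sort's sift-down step (an illustration, not the answer's code): from index i it jumps to indices 2*i+1 and 2*i+2, so in a large heap consecutive accesses land far apart, on different cache lines:

    #include <stddef.h>

    /* Restore the max-heap property for a heap of n ints, starting at index i. */
    static void sift_down(int *arr, size_t i, size_t n) {
        for (;;) {
            size_t largest = i, l = 2 * i + 1, r = 2 * i + 2;   /* long jumps */
            if (l < n && arr[l] > arr[largest]) largest = l;
            if (r < n && arr[r] > arr[largest]) largest = r;
            if (largest == i) break;
            int t = arr[i]; arr[i] = arr[largest]; arr[largest] = t;
            i = largest;
        }
    }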

Even merge sort, which is quite cache-efficient, still needs additional memory accesses, because it cannot work in place: either it recurses and keeps temporary data in auxiliary arrays and on the stack, or it uses linked lists (which means pointers, indirect memory accesses, and therefore cache misses).
