C - Is sorting an array of pointers to structs slower than sorting the structs directly (qsort)?

I am sorting millions of structs organized in an array with the qsort function of the standard C library. I tried to optimize the performance by creating an array of pointers to the structs, with the same length, and sorting that instead. Contrary to my expectations, the second variant is slower:

qsort on an array of structs: 199 s
qsort on an array of pointers to structs: 204 s

I expected that swapping pointers in memory would be faster than moving whole structs (size 576 bytes). Do I have a performance leak somewhere, or is this known behaviour?

There are other issues here.

By creating the array of pointers you are fragmenting the memory. Algorithms in the standard libraries are designed to optimise the sorting of contiguous arrays, so by doing this you are probably missing the cache far more often than if you just sorted the bigger array.

Quicksort in particular has good locality of reference: each partitioning step roughly halves the range being sorted, so eventually you are sorting subsets of the original array in chunks that fit completely into your cache.

As a general rule, cache misses are an order of magnitude slower than hits. As a result, this delay could be significant enough to cancel out the speed-up you get by not copying all the bytes.

The way quicksort works, it gradually reorganizes the array by moving elements that end up as neighbors in the sorted order closer together. This allows the data cache to work more efficiently the closer the algorithm gets to the final result.

If you convert to an array of pointers, then the data accesses will likely slow down, since the structures keep their "unsorted" ordering while only their pointers are getting sorted. Comparing the structures still requires following the pointers to their "unsorted" instances, which might cause data cache misses.

To achieve something like what you desire, you can create an indexing structure for your data. The indexing structure would hold the sorting key (or a copy of it).

struct index_type {
    key_type key;
    data_type *data;
};

And now you would sort an array of index_type instead of an array of pointers to data_type. Since the key is stored in the array itself, you avoid having to follow pointers to your "unsorted" structures during comparisons.
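As a rough illustration of the index approach, here is a minimal sketch. It assumes the key is a plain int and the records look like the 576-byte struct discussed in this thread; the names struct record, struct index_entry, compare_index and build_sorted_index are made up for this example and are not part of the original answer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed record layout (hypothetical): an int key plus a large payload. */
struct record {
    int  value;       /* sort key */
    char data[572];   /* payload that is never touched while sorting */
};

/* Index entry: a copy of the key plus a pointer back to the record. */
struct index_entry {
    int            key;
    struct record *data;
};

/* The comparator reads only the key stored inside the index array,
   so it never dereferences the pointer to the large record. */
static int compare_index(const void *a, const void *b)
{
    const struct index_entry *x = a;
    const struct index_entry *y = b;
    return (x->key > y->key) - (x->key < y->key);
}

/* Builds a sorted index over records[0..n-1]. The records themselves
   are not moved. Returns NULL if allocation fails; the caller frees
   the returned array. */
struct index_entry *build_sorted_index(struct record *records, size_t n)
{
    assert(records != NULL && n > 0);

    struct index_entry *idx = malloc(n * sizeof *idx);
    if (idx == NULL)
        return NULL;

    for (size_t i = 0; i < n; i++) {
        idx[i].key  = records[i].value;  /* copy the key into the index */
        idx[i].data = &records[i];       /* remember where the record lives */
    }

    qsort(idx, n, sizeof *idx, compare_index);
    return idx;
}
```

After the call, idx[0].data points at the record with the smallest key, idx[1].data at the next one, and so on. The trade-off is extra memory for the key copies in exchange for comparisons that stay inside one compact, contiguous array.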

I did a quick sanity check using this structure (which has size 576 when int is 32-bit)

struct test
{
    int value;
    char data[572];
};

I initialized a dynamically allocated array of 1 million structs with this code

for ( int i = 0; i < count; i++ )
{
    array[i].value = rand();
    for ( int j = 0; j < 572; j++ )
        array[i].data[j] = rand();
}

And I sorted the array with this code

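int compare( const void *ptr1, const void *ptr2 )
{
    const struct test *tptr1 = ptr1;
    const struct test *tptr2 = ptr2;
    /* Compare instead of subtracting: a subtraction can overflow for
       arbitrary int keys (it is safe here only because rand() >= 0). */
    return (tptr1->value > tptr2->value) - (tptr1->value < tptr2->value);
}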

int main( void )
{
    int count = 1000000;
    ...
    qsort( array, count, sizeof(struct test), compare );
    ...
}

The time to initialize the array was 4.3 seconds, and the time to sort the array was 0.9 seconds.

I then modified the code to create an array of pointers to the structures, and sorted the pointer array. The initialization time was still 4.3 seconds (most of the initialization time is due to calling rand() more than 500 million times). Sorting the pointer array took 0.4 seconds, which is more than twice as fast as sorting the structure array directly.
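The answer does not show its pointer-array version, so the following is only a sketch of what such a variant might look like. The function names compare_ptr and sort_by_pointer are hypothetical. The detail worth noting is that qsort hands the comparator pointers to the array elements, and since the elements are themselves pointers, each argument has to be dereferenced once more than in the struct version.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct test {
    int  value;
    char data[572];
};

/* qsort passes pointers to array elements. The elements here are
   pointers, so each argument is really a (struct test **). */
static int compare_ptr(const void *p1, const void *p2)
{
    const struct test *a = *(const struct test *const *)p1;
    const struct test *b = *(const struct test *const *)p2;
    return (a->value > b->value) - (a->value < b->value);
}

/* Builds an array of pointers into array[0..count-1] and sorts the
   pointers by the value they point at. The structs are not moved.
   Returns NULL if allocation fails; the caller frees the result. */
struct test **sort_by_pointer(struct test *array, size_t count)
{
    assert(array != NULL && count > 0);

    struct test **ptrs = malloc(count * sizeof *ptrs);
    if (ptrs == NULL)
        return NULL;

    for (size_t i = 0; i < count; i++)
        ptrs[i] = &array[i];

    /* Element size is the size of one pointer, not of one struct. */
    qsort(ptrs, count, sizeof *ptrs, compare_ptr);
    return ptrs;
}
```

Passing sizeof(struct test) as the element size, or reusing the struct comparator unchanged, are easy mistakes to make in this variant and would give wrong results or undefined behaviour.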

So my conclusion is that your code has some massive inefficiencies that have nothing to do with qsort.

Which is faster will depend, in general, on the size of the structure. For structures that are the same size as a pointer, it should be obvious that sorting the structures will be faster than sorting pointers to the structures. As the structure size increases, a point will be reached where the reverse is true (imagine sorting an array of 1 MB structures: you'd spend most of your time in memcpy()). Where, exactly, that point lies will depend on things outside the control of the code (cache structure, cache size, etc.). If this is important to you, then you'd best experiment and measure.
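One way to run such an experiment is a small timing helper like the sketch below. It is an assumption-laden illustration, not a rigorous benchmark: the name time_direct_sort is hypothetical, the key is assumed to be an int stored in the first bytes of each element, and clock() reports CPU time rather than wall-clock time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Comparator for elements whose first sizeof(int) bytes hold the key.
   memcpy avoids alignment assumptions about the raw byte buffer. */
static int compare_leading_int(const void *a, const void *b)
{
    int x, y;
    memcpy(&x, a, sizeof x);
    memcpy(&y, b, sizeof y);
    return (x > y) - (x < y);
}

/* Times qsort on `count` elements of `elem_size` bytes each, moving
   the elements themselves. Returns CPU seconds spent in qsort, or
   -1.0 if allocation fails. */
double time_direct_sort(size_t count, size_t elem_size)
{
    assert(count > 0 && elem_size >= sizeof(int));

    unsigned char *buf = calloc(count, elem_size);
    if (buf == NULL)
        return -1.0;

    /* Give every element a pseudo-random key; the payload stays zero. */
    for (size_t i = 0; i < count; i++) {
        int key = rand();
        memcpy(buf + i * elem_size, &key, sizeof key);
    }

    clock_t start = clock();
    qsort(buf, count, elem_size, compare_leading_int);
    clock_t end = clock();

    free(buf);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling it with the same count and a range of element sizes (for example 8, 64, 576 and 4096 bytes), and comparing against a pointer-sorting variant on the same data, should show roughly where the crossover lies on a given machine. Repeating each measurement several times helps to smooth out noise.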
