
Complexity of the memset function in C

Posted 2012-07-26 06:02:11 · tags: c, complexity-theory, big-o, memset

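Some friends and I were discussing a piece of code that uses the memset function in C. If we use it to initialize an array of size N, what is its order in Big-O notation?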

4 answers

On a system where you have direct access to the page tables and they are stored hierarchically, memset could be implemented in O(log n). It would do this by replacing the whole virtual address mapping with copy-on-write references to a single page filled with the given byte value. Note, however, that if you are going to modify the object later, the normal O(n) cost of memset is merely deferred. It is paid in the page faults that create separate copies of the pages as they are modified.

You asked about complexity, but you probably intended to ask about performance.

Complexity, written with notation such as O(n), describes how the number of operations an algorithm must perform grows as the problem size grows. O(n) means that some number of steps proportional to the input size must be performed. It does not say what that proportion is. memset is O(n). O(n²) means that some number of steps proportional to n² must be performed. memset does not need that much work: in general, setting 2n bytes takes only about twice as much work as setting n bytes, not four times as much.

You are likely more interested in the performance of memset, because the library version of memset performs much more quickly than a C version you might write.

The library version is faster because it uses specialized instructions. Most common modern processors have instructions that write 16 bytes to memory at once, for example the SSE2 stores on x86. Library implementors write critical functions like memset in assembly language or something close to it, so they have access to all of these instructions.

When you write in C, it is difficult for the compiler to take advantage of these instructions. For example, the pointer to the memory you are setting might not be aligned to a multiple of 16 bytes. The memset authors write code that tests the pointer and branches to different code for each case. The goal is to set a few bytes individually until the pointer is aligned, so that the fast instructions that store 16 bytes at a time can then be used. This is only one of many complications that library implementors deal with when writing routines like memset.

Because of these complications, the compiler cannot easily take your C implementation of memset and turn it into the fast code that experts write. When the compiler sees, in C code, a loop that writes one byte at a time, it typically generates assembly language that writes one byte at a time. Optimizers are getting smarter, but these complications limit how much they are allowed to do. They also limit how much they can do without generating a lot of code to handle cases that might rarely occur.

The complexity is O(n). This is basic stuff.

Some C libraries provide vectorised versions of memset(). Unless your compiler does automatic vectorisation and loop unrolling, your for loop will be much slower than a vectorised memset(). Vectorised or not, memset() is limited by memory bandwidth. Its minimum time is proportional to the array size divided by the memory bandwidth. Since the memory bandwidth is constant, it is an O(n) operation.
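As a worked example of the bandwidth bound, let n be the number of bytes and B the sustained memory bandwidth. The 10 GB/s figure is an assumed value chosen for round numbers, not a measurement.

$$
t_{\min} \approx \frac{n}{B}, \qquad
n = 2^{30}\,\text{B},\; B = 10\,\text{GB/s}
\;\Rightarrow\;
t_{\min} \approx \frac{1.07 \times 10^{9}\,\text{B}}{10^{10}\,\text{B/s}} \approx 0.11\,\text{s}
$$

Doubling n doubles t_min, which is the linear growth that O(n) describes.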

On NUMA machines, memsetting very large arrays could be threaded to achieve a speedup on the order of the number of NUMA nodes. See this answer for some benchmarks.