
Complexity of the memset function in C

Some friends and I were discussing a piece of code, and the question of using the memset function in C came up: if we initialize an array of size N with it, what is the order of this function in Big-O notation?

On a system where you have direct access to the page tables and they're stored in a hierarchical way, memset could be implemented in O(log n) by replacing the whole virtual address mapping with copy-on-write references to a single page filled with the given byte value. Note however that if you're going to do any future modifications to the object, the normal O(n) cost of memset would just be deferred to the page fault to instantiate separate copies of the pages when they're modified.

You asked about complexity, but you probably intended to ask about performance.

Complexity, referred to with the notation O(n), is a concept relating to how the number of operations in an algorithm is compelled to grow as the problem size grows. O(n) means that some number of steps proportional to the input size must be performed. It does not say what that proportion is. memset is O(n). O(n²) means some number of steps proportional to n² must be performed. memset is not O(n²) because setting 2n bytes takes only twice as much work as n bytes, not four times as much work, in general.

You are likely more interested in the performance of memset, because the library version of memset performs much more quickly than a C version you might write.

The library version performs much more quickly because it uses specialized instructions. Most common modern processors have instructions that allow them to write 16 bytes to memory in one instruction. The library implementors write critical functions like memset in assembly language or something close to it, so they have access to all these instructions.

When you write in C, it is difficult for the compiler to take advantage of these instructions. For example, the pointer to the memory you are setting might not be aligned to a multiple of 16 bytes. The memset authors will write code that tests the pointer and branches to different code for each case, with the goal of setting some bytes individually and then having a pointer that is aligned, so they can use the fast instructions that store 16 bytes at a time. This is only one of a number of complications that the library implementors deal with when writing routines like memset.

Because of those complications, the compiler cannot easily take your C implementation of memset and turn it into the fast code that experts write. When the compiler sees, in C code, a loop that writes one byte at a time, it typically generates assembly language that writes one byte at a time. Optimizers are getting smarter, but the complications limit how much they are allowed to do and how much they can do without generating a lot of code to handle cases that might rarely occur.

The complexity is O(n). This is basic stuff.

Some C libraries provide vectorised versions of memset(). Unless your compiler does automatic vectorisation and loop unrolling, your for loop will be way slower than a vectorised memset(). Vectorised or not, memset() is limited by memory bandwidth: the minimum time is proportional to the array size divided by the memory bandwidth, i.e. it is an O(n) operation since memory bandwidth is constant.

On NUMA machines memsetting very large arrays could be threaded to achieve speedup of the order of the number of NUMA nodes. See this answer for some benchmarks.
