Why is memcmp so much faster than a for loop check?

Question

Why is memcmp(a, b, size) so much faster than:

for (i = 0; i < nelements; i++) {
    if (a[i] != b[i]) return 0;
}
return 1;

Is memcmp a CPU instruction or something? It must be pretty deep because I got a massive speedup using memcmp over the loop.

Answer 1

memcmp is often implemented in assembly to take advantage of a number of architecture-specific features, which can make it much faster than a simple loop in C.

As a "builtin"

GCC supports memcmp (as well as a ton of other functions) as builtins. In some versions / configurations of GCC, a call to memcmp will be recognized as __builtin_memcmp. Instead of emitting a call to the memcmp library function, GCC will emit a handful of instructions to act as an optimized inline version of the function.

On x86, this leverages the cmpsb instruction, which compares a string of bytes at one memory location to another. This is coupled with the repe prefix, so the strings are compared until they are no longer equal, or a count is exhausted. (Exactly what memcmp does.)

Given the following code:

int test(const void* s1, const void* s2, int count)
{
    return memcmp(s1, s2, count) == 0;
}

gcc version 3.4.4 on Cygwin generates the following assembly:

; (prologue)
mov     esi, [ebp+arg_0]    ; Move first pointer to esi
mov     edi, [ebp+arg_4]    ; Move second pointer to edi
mov     ecx, [ebp+arg_8]    ; Move length to ecx

cld                         ; Clear DF, the direction flag, so comparisons happen
                            ; at increasing addresses
cmp     ecx, ecx            ; Special case: set ZF up front, so that if the
                            ; length parameter to memcmp is zero, no bytes are
                            ; compared and the result is still "equal".
repe cmpsb                  ; Compare bytes at DS:ESI and ES:EDI, setting flags
                            ; Repeat while ZF is set (bytes equal) and ECX != 0
setz    al                  ; Set al (return value) to 1 if ZF is still set
                            ; (all bytes were equal).
; (epilogue)
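
For readers less familiar with x86 string instructions, the three instructions at the heart of that listing can be modeled in plain C. This is only a sketch of their semantics, not what GCC emits: the function name equal_like_repe_cmpsb and the variable zf are made up for illustration, and a real repe cmpsb runs in hardware rather than in a C loop.

```c
#include <stddef.h>

/* Illustrative model of `cmp ecx, ecx; repe cmpsb; setz al`.
 * The function and variable names are hypothetical. */
int equal_like_repe_cmpsb(const void *s1, const void *s2, size_t count)
{
    const unsigned char *p = s1;  /* plays the role of ESI */
    const unsigned char *q = s2;  /* plays the role of EDI */
    int zf = 1;                   /* cmp ecx, ecx: ZF starts out set */

    while (count != 0) {          /* rep: stop once the count reaches zero */
        zf = (*p++ == *q++);      /* cmpsb: compare one byte, advance both */
        count--;
        if (!zf)
            break;                /* repe: stop at the first mismatch */
    }
    return zf;                    /* setz al: 1 if ZF is still set */
}
```

With a count of zero the loop body never runs, so the initial zf of 1 is returned, which is the case the cmp ecx, ecx instruction exists to handle.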

Reference:

cmpsb instruction

As a library function

Highly-optimized versions of memcmp exist in many C standard libraries. These will usually take advantage of architecture-specific instructions to work with lots of data in parallel.

In Glibc, there are versions of memcmp for x86_64 that can take advantage of the following instruction set extensions:

SSE2 - sysdeps/x86_64/memcmp.S
SSE4 - sysdeps/x86_64/multiarch/memcmp-sse4.S
SSSE3 - sysdeps/x86_64/multiarch/memcmp-ssse3.S
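
The idea of handling many bytes per step can be sketched portably in C by comparing eight bytes at a time instead of one. This is not how glibc does it (the files above are hand-written assembly using 16-byte SSE registers); the name equal_wordwise is made up, and the function only reports equality rather than the ordering that memcmp returns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the two buffers hold the same n bytes. */
int equal_wordwise(const void *s1, const void *s2, size_t n)
{
    const unsigned char *p = s1;
    const unsigned char *q = s2;

    assert(n == 0 || (p != NULL && q != NULL));

    /* Main loop: one 64-bit comparison covers eight bytes. The small
     * fixed-size memcpy calls avoid unaligned-access and aliasing problems;
     * compilers typically turn them into plain loads. */
    while (n >= sizeof(uint64_t)) {
        uint64_t x, y;
        memcpy(&x, p, sizeof x);
        memcpy(&y, q, sizeof y);
        if (x != y)
            return 0;
        p += sizeof x;
        q += sizeof y;
        n -= sizeof x;
    }

    /* Tail: the remaining 0..7 bytes, one at a time. */
    for (; n != 0; n--) {
        if (*p++ != *q++)
            return 0;
    }
    return 1;
}
```

A vectorized library routine follows the same shape, with a wider main loop and more careful handling of alignment and of the tail.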

The cool part is that glibc will detect (at run-time) the newest instruction set your CPU has, and execute the version optimized for it. See this snippet from sysdeps/x86_64/multiarch/memcmp.S:

ENTRY(memcmp)
    .type   memcmp, @gnu_indirect_function
    LOAD_RTLD_GLOBAL_RO_RDX
    HAS_CPU_FEATURE (SSSE3)
    jnz 2f
    leaq    __memcmp_sse2(%rip), %rax
    ret 

2:  HAS_CPU_FEATURE (SSE4_1)
    jz  3f  
    leaq    __memcmp_sse4_1(%rip), %rax
    ret 

3:  leaq    __memcmp_ssse3(%rip), %rax
    ret 

END(memcmp)
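
A rough C analogue of that selection logic is sketched below. It is an assumption-laden illustration, not glibc's mechanism: glibc uses a gnu_indirect_function resolved by the dynamic loader, whereas this sketch picks a function pointer lazily on the first call. The names memeq, memeq_generic, memeq_sse4_1 and resolve_memeq are hypothetical, and both "variants" simply defer to memcmp. The feature test uses GCC's __builtin_cpu_supports, which is only available on x86 targets, hence the preprocessor guard.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*memeq_fn)(const void *, const void *, size_t);

/* Hypothetical baseline variant. */
static int memeq_generic(const void *a, const void *b, size_t n)
{
    return memcmp(a, b, n) == 0;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Hypothetical stand-in for a variant tuned for SSE4.1. */
static int memeq_sse4_1(const void *a, const void *b, size_t n)
{
    return memcmp(a, b, n) == 0;
}
#endif

/* Pick the best variant for the CPU the program is running on. */
static memeq_fn resolve_memeq(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse4.1"))
        return memeq_sse4_1;
#endif
    return memeq_generic;
}

/* Public entry point. The lazy initialization is not thread-safe;
 * it is kept simple because this is only a sketch. */
int memeq(const void *a, const void *b, size_t n)
{
    static memeq_fn impl;

    assert(n == 0 || (a != NULL && b != NULL));
    if (impl == NULL)
        impl = resolve_memeq();
    return impl(a, b, n);
}
```

The indirect-function approach in the snippet above avoids even the pointer check on each call, since the loader binds the symbol to the chosen variant once.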

In the Linux kernel

Linux does not seem to have an optimized version of memcmp for x86_64, but it does for memcpy, in arch/x86/lib/memcpy_64.S. Note that it uses the alternatives infrastructure (arch/x86/kernel/alternative.c) not only to decide at runtime which version to use, but to actually patch itself so that this decision is made only once, at boot-up.

Answer 2

Is memcmp a CPU instruction or something?

It is at least a very highly optimized compiler-provided intrinsic function. Possibly a single machine instruction, or two, depending on the platform, which you haven't specified.

Answer 3

It's usually a compiler intrinsic that is translated into fast assembly with specialized instructions for comparing blocks of memory.

intrinsic memcmp

Answer 4

Yes, on Intel hardware, there's a single assembly instruction for such a loop. The runtime will use that. (I don't remember exactly; it was something like rep cmps[b|w], depending also on the data size.)

Answer 1: score 46, accepted, 2014-01-14 05:43:45
Answer 2: score 0, 2014-01-14 05:44:24
Answer 3: score 0, 2014-01-14 05:45:49
Answer 4: score -2, 2014-01-14 05:47:37
