What are real significant cases when memcpy() is faster than memmove()?
The key difference between memcpy() and memmove() is that memmove() works correctly when the source and destination overlap. When the buffers are known not to overlap, memcpy() is preferable since it is potentially faster.
What bothers me is this "potentially". Is it a micro-optimization, or are there real, significant examples where memcpy() is faster, so that we really need to use memcpy() and not stick to memmove() everywhere?
memmove() involves at least an implicit branch to copy either forwards or backwards, unless the compiler can deduce that overlap is impossible. This means that, without the ability to optimize in favor of memcpy(), memmove() is slower by at least one branch, plus any additional space occupied by inlined instructions to handle each case (if inlining is possible).
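
To make the extra branch concrete, here is a minimal byte-wise sketch. The names sketch_memmove and sketch_memcpy are hypothetical and are not taken from eglibc or any other library; real implementations copy in larger units. The sketch also assumes uintptr_t is available, which C99 makes optional.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte-wise memmove: the direction test is the extra branch. */
void *sketch_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n == 0 || (dst != NULL && src != NULL));

    /* Compare addresses as integers; relational comparison of pointers
       into different objects is undefined in standard C. */
    if ((uintptr_t)d <= (uintptr_t)s) {
        /* Destination starts at or before the source: copy forwards. */
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    } else {
        /* Destination starts after the source: copy backwards. */
        for (size_t i = n; i > 0; i--)
            d[i - 1] = s[i - 1];
    }
    return dst;
}

/* Hypothetical byte-wise memcpy: no direction test, always forwards. */
void *sketch_memcpy(void *restrict dst, const void *restrict src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n == 0 || (dst != NULL && src != NULL));

    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return dst;
}
```

The only structural difference between the two functions is the address comparison and the second loop, which is the overhead this answer describes.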
Reading the eglibc-2.11.1 code for both memcpy() and memmove() confirms this. Furthermore, there is no possibility of page copying during a backward copy, a significant speedup that is only available if there is no chance of overlap.
In summary: if you can guarantee the regions do not overlap, then selecting memcpy() over memmove() avoids a branch. If the source and destination contain corresponding page-aligned, page-sized regions and do not overlap, some architectures can employ hardware-accelerated copies for those regions, regardless of whether you called memmove() or memcpy().
There is actually one more difference beyond the assumptions and observations I've listed above. As of C99, the following prototypes exist for the two functions:
void *memcpy(void * restrict s1, const void * restrict s2, size_t n);
void *memmove(void * s1, const void * s2, size_t n);
Because memcpy may assume that the two pointers s1 and s2 do not point at overlapping memory, straightforward C implementations of memcpy can leverage this to generate more efficient code without resorting to assembler. I'm sure that memmove can do this too; however, additional checks would be required beyond those I saw in eglibc, meaning the performance cost may be slightly more than a single branch for C implementations of these functions.
At best, calling memcpy rather than memmove will save a pointer comparison and a conditional branch. For a large copy, this is completely insignificant. If you are doing many small copies, then it might be worth measuring the difference; that is the only way you can tell whether it is significant or not.
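
One way to take such a measurement is sketched below. The helper time_copies and the copy_fn typedef are hypothetical names; the sketch uses clock(), which has coarse resolution, and calls the copy routine through a function pointer, which usually prevents the compiler from inlining it. It therefore measures the library call, not whatever the compiler would emit inline.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

typedef void *(*copy_fn)(void *, const void *, size_t);

/* Returns the CPU seconds spent on `reps` copies of `size` bytes,
   or -1.0 if size is zero or allocation fails. */
double time_copies(copy_fn copy, size_t size, size_t reps)
{
    assert(copy != NULL);
    if (size == 0)
        return -1.0;

    unsigned char *src = malloc(size);
    unsigned char *dst = malloc(size);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0xA5, size);

    volatile unsigned char sink = 0;   /* keeps the copies observable */
    clock_t start = clock();
    for (size_t i = 0; i < reps; i++) {
        copy(dst, src, size);
        sink ^= dst[i % size];
    }
    clock_t end = clock();
    (void)sink;

    free(src);
    free(dst);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_copies(memcpy, 16, 10000000) and time_copies(memmove, 16, 10000000) several times and comparing the results gives a first impression; the numbers depend heavily on the platform and C library.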
It is definitely a micro-optimisation, but that doesn't mean you shouldn't use memcpy when you can easily prove that it is safe. Premature pessimisation is the root of much evil.
Well, memmove has to copy backwards when the source and destination overlap and the source is before the destination. So, some implementations of memmove simply copy backwards whenever the source is before the destination, without regard for whether the two regions overlap.
A quality implementation of memmove can detect whether the regions overlap, and do a forward copy when they don't. In such a case, the only extra overhead compared to memcpy is the overlap check.
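
The overlap check described above can be sketched as follows. The names regions_overlap and checked_move are hypothetical, and the sketch assumes uintptr_t exists and that converting pointers to it yields comparable flat addresses, which holds on common platforms but is not guaranteed by the C standard.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: true if [a, a+n) and [b, b+n) share at least one byte. */
bool regions_overlap(const void *a, const void *b, size_t n)
{
    uintptr_t pa = (uintptr_t)a;
    uintptr_t pb = (uintptr_t)b;
    uintptr_t gap = pa > pb ? pa - pb : pb - pa;
    return gap < n;   /* n == 0 never overlaps */
}

/* Hypothetical dispatcher: take the memcpy path when it is provably safe,
   and fall back to memmove otherwise. */
void *checked_move(void *dst, const void *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    if (!regions_overlap(dst, src, n))
        return memcpy(dst, src, n);
    return memmove(dst, src, n);
}
```

The check costs one subtraction and one comparison per call, which matches the "only extra overhead" mentioned in this answer.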
Simplistically, memmove needs to test for overlap and then do the appropriate thing; with memcpy, the caller asserts that there is no overlap, so no additional tests are needed.
Having said that, I have seen platforms that have exactly the same code for memcpy and memmove.
It's certainly possible that memcpy is merely a call to memmove, in which case there would be no benefit to using memcpy. At the other extreme, it's possible that an implementor assumed memmove would rarely be used and implemented it with the simplest possible byte-at-a-time loop in C, in which case it could be ten times slower than an optimized memcpy. As others have said, the likeliest case is that memmove uses memcpy when it detects that a forward copy is possible, but some implementations may simply compare the source and destination addresses without looking for overlap.
With that said, I would recommend never using memmove unless you're shifting data within a single buffer. It might not be slower, but then again, it might be, so why risk it when you know there's no need for memmove?
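
Shifting data within a single buffer is the case where memmove is required. A small sketch, using a hypothetical helper named insert_at and assuming the caller has reserved room for one extra element:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: inserts `value` at index `pos` of `arr`, which currently
   holds `len` elements and has capacity for at least len + 1. */
void insert_at(int *arr, size_t len, size_t pos, int value)
{
    assert(arr != NULL && pos <= len);

    /* The tail moves up by one slot. Source and destination overlap as
       soon as more than one element is moved, so memcpy would be
       undefined behavior here. */
    memmove(&arr[pos + 1], &arr[pos], (len - pos) * sizeof arr[0]);
    arr[pos] = value;
}
```

For example, with int a[5] = {1, 2, 4, 5, 0}, calling insert_at(a, 4, 2, 3) leaves the array holding 1, 2, 3, 4, 5.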
Just simplify and always use memmove. A function that's right all the time is better than a function that's only right half the time.
It is entirely possible that in most implementations, the cost of a memmove() function call will not be significantly greater than that of memcpy() in any scenario in which the behavior of both is defined. There are two points not yet mentioned, though:
    mov esi,[src]
    mov edi,[dest]
    mov ecx,1234/4 ; Compiler could notice it's a constant
    cld
    rep movsl
This would take the same amount of in-line code, but run much faster than:
    push [src]
    push [dest]
    push dword 1234
    call _memcpy

    ...

_memcpy:
    push ebp
    mov ebp,esp
    mov ecx,[ebp+numbytes]
    test ecx,3 ; See if it's a multiple of four
    jz multiple_of_four

multiple_of_four:
    push esi ; Can't know if caller needs this value preserved
    push edi ; Can't know if caller needs this value preserved
    mov esi,[ebp+src]
    mov edi,[ebp+dest]
    rep movsl
    pop edi
    pop esi
    ret
Quite a number of compilers will perform such optimizations with memcpy(). I don't know of any that will do it with memmove, although in some cases an optimized version of memcpy may offer the same semantics as memmove. For example, if numbytes was 20:
    ; Assuming values in eax, ebx, ecx, edx, esi, and edi are not needed
    mov esi,[src]
    mov eax,[esi]
    mov ebx,[esi+4]
    mov ecx,[esi+8]
    mov edx,[esi+12]
    mov edi,[esi+16]
    mov esi,[dest]
    mov [esi],eax
    mov [esi+4],ebx
    mov [esi+8],ecx
    mov [esi+12],edx
    mov [esi+16],edi
This will work correctly even if the address ranges overlap, since it effectively makes a copy (in registers) of the entire region to be moved before any of it is written. In theory, a compiler could process memmove() by seeing whether treating it as memcpy() would yield an implementation that is safe even if the address ranges overlap, and calling _memmove in those cases where substituting the memcpy() implementation would not be safe. I don't know of any that do such optimization, though.
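
The same load-everything-then-store idea can be written in C. The function copy20 below is a hypothetical sketch: it stages the 20 bytes in a local array, so each memcpy call has non-overlapping arguments even when src and dst overlap each other. An optimizing compiler may keep the staging array in registers, much like the assembly above, but that is not guaranteed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: copies exactly 20 bytes and tolerates overlapping
   regions, because every byte is loaded before any byte is stored. */
void copy20(void *dst, const void *src)
{
    uint32_t tmp[5];                 /* 5 * 4 = 20 bytes of staging space */

    assert(dst != NULL && src != NULL);
    memcpy(tmp, src, sizeof tmp);    /* load phase: tmp never overlaps src */
    memcpy(dst, tmp, sizeof tmp);    /* store phase: tmp never overlaps dst */
}
```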