When is memcpy faster than simple repeated assignment?

Assume that one wants to make a copy of an array declared as

DATA_TYPE src[N];

Is memcpy always as fast as or faster than the following code snippet, regardless of what DATA_TYPE is and how many elements the array has?

DATA_TYPE dest[N];

for (int i=0; i<N; i++)
    dest[i] = src[i];

For a small type like char and a large N, we can be sure that memcpy is faster (unless the compiler replaces the loop with a call to memcpy). But what if the type is larger, like double, and/or the number of array elements is small?

This question came to my mind when copying many arrays of doubles, each with 3 elements.
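For concreteness, the two alternatives for the 3-element double case might look like the sketch below. The function names copy_loop and copy_memcpy, and fixing DATA_TYPE to double and N to 3, are illustrative choices rather than anything from the original code. Because the byte count is a compile-time constant, optimizing compilers such as GCC and Clang commonly expand a small memcpy inline instead of emitting a library call, so both functions may well compile to the same instructions. Inspecting the generated assembly is the way to check for a given toolchain.

```c
#include <stddef.h>
#include <string.h>

#define N 3                 /* illustrative: the 3-element case */
typedef double DATA_TYPE;   /* illustrative: the "larger type" case */

/* Element-by-element assignment, as in the question. */
void copy_loop(DATA_TYPE dest[N], const DATA_TYPE src[N])
{
    for (size_t i = 0; i < N; i++)
        dest[i] = src[i];
}

/* The same copy through the library; the size is a compile-time constant. */
void copy_memcpy(DATA_TYPE dest[N], const DATA_TYPE src[N])
{
    memcpy(dest, src, N * sizeof(DATA_TYPE));
}
```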

I didn't find an answer to my question in the answers to the other question mentioned by wohlstad in the comments. The accepted answer there essentially says "leave it for the compiler to decide." That's not the sort of answer I'm looking for. The fact that a compiler can optimize memory copying by choosing one alternative is not an answer. Why and when is one alternative faster? Maybe compilers know the answer, but developers, including compiler developers, don't know!

Since memcpy is a library function, how efficient it actually is depends entirely on the library implementation, so no definitive answer is possible.

That said, any provided standard library is likely to be highly optimised and may even use hardware-specific features such as DMA transfer. Your loop's performance, by contrast, will vary with the optimisation settings and is likely to be much worse in unoptimised debug builds.

Another consideration is that the performance of memcpy() is independent of the data type and generally deterministic, whereas the performance of your loop is likely to vary with DATA_TYPE, or even with the value of N.
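One way to see why the data type does not matter to memcpy() is that the function receives only a destination, a source and a byte count. A minimal sketch follows, in which the Particle struct and the copy_particles helper are hypothetical names introduced purely for illustration:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical element type, larger than a double. */
typedef struct {
    double x, y, z;
    int id;
} Particle;

/* memcpy only sees two pointers and a byte count;
   the element type enters solely through sizeof. */
void copy_particles(Particle *dest, const Particle *src, size_t count)
{
    memcpy(dest, src, count * sizeof *src);
}
```

Note that this copies the raw bytes of each element, including any padding, which is fine for plain data types like this one.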

Generally, I would expect memcpy() to be as fast as or faster than an assignment loop, and certainly more consistent and deterministic, since it is independent of specific compiler settings and even of the compiler used.

In the end, the only way to tell is to measure it for your specific platform, toolchain, library and build options, and also for various data types. Since you would have to measure every usage combination to know whether it is faster, I suggest that doing so is generally a waste of time and of academic interest only. Use the library, not only for performance and consistency, but also for clarity and maintainability.
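If you do want to measure, a rough timing harness could look like the sketch below. The function time_copies, its parameters, and the fixed choice of double with N = 3 are assumptions made for illustration. It uses the standard clock() function, which reports CPU time at coarse resolution, and a volatile sink so that the copies are not optimised away entirely. Treat it as a starting point, not a rigorous benchmark: an optimising compiler may still transform either variant.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

#define N 3                 /* illustrative array length */
typedef double DATA_TYPE;   /* illustrative element type */

/* Time `reps` copies of a small array and return elapsed CPU seconds.
   use_memcpy selects which variant is measured. */
double time_copies(int use_memcpy, long reps)
{
    volatile DATA_TYPE sink = 0;   /* keeps the copied values observable */
    DATA_TYPE src[N] = {1.0, 2.0, 3.0};
    DATA_TYPE dest[N];

    clock_t start = clock();
    for (long r = 0; r < reps; r++) {
        src[0] = (DATA_TYPE)r;     /* vary the input on each round */
        if (use_memcpy) {
            memcpy(dest, src, sizeof dest);
        } else {
            for (size_t i = 0; i < N; i++)
                dest[i] = src[i];
        }
        sink = dest[0] + dest[N - 1];
    }
    clock_t end = clock();
    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_copies(0, reps) and time_copies(1, reps) with a large reps in an optimised build gives two numbers to compare. They apply only to that platform, compiler, library and set of build options.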
