What is the fastest portable way to copy an array in C++?

Question

This question has been bothering me for some time. The possibilities I am considering are:

memcpy
std::copy
cblas_dcopy

Does anyone have any clue what the pros and cons of these three are? Other suggestions are also welcome.

Answer 1

In C++ you should use std::copy by default unless you have good reasons to do otherwise. The reason is that C++ classes define their own copy semantics via the copy constructor and copy assignment operator, and of the operations listed, only std::copy respects those conventions.

memcpy() performs a raw, byte-wise copy of data (though it is likely heavily optimized for cache line size, etc.) and ignores C++ copy semantics (it's a C function, after all...).

cblas_dcopy() is a specialized function for use in linear algebra routines that work on double-precision floating-point values. It likely excels at that, but shouldn't be considered general purpose.

If your data is "simple" POD-type struct data or raw fundamental-type data, memcpy will likely be as fast as you can get. Just as likely, std::copy will be optimized to use memcpy in these situations, so you'll never know the difference.
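To make the distinction concrete, here is a minimal sketch, not taken from the answer itself. The helper names (copy_doubles, copy_doubles_memcpy, copy_strings) are made up for illustration. It shows that both approaches are valid for a trivially copyable type such as double, while only the element-wise std::copy is valid for std::string.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical helper: copy n doubles with std::copy. For a trivially copyable
// type like double, the library is free to lower this to a block move.
inline void copy_doubles(const double* src, double* dst, std::size_t n) {
    static_assert(std::is_trivially_copyable<double>::value,
                  "double is trivially copyable");
    std::copy(src, src + n, dst);
}

// Hypothetical helper: the same copy via memcpy. This is only valid because
// double is trivially copyable and the two ranges do not overlap.
inline void copy_doubles_memcpy(const double* src, double* dst, std::size_t n) {
    assert(src != nullptr && dst != nullptr);
    std::memcpy(dst, src, n * sizeof(double));
}

// Hypothetical helper: std::string owns heap memory, so each element must be
// copy-assigned; a byte-wise copy would duplicate internal pointers.
inline void copy_strings(const std::string* src, std::string* dst, std::size_t n) {
    static_assert(!std::is_trivially_copyable<std::string>::value,
                  "std::string must not be copied with memcpy");
    std::copy(src, src + n, dst);
}
```

Calling copy_strings on two arrays of std::string leaves the destination with independent copies of each string, which is exactly what memcpy cannot provide.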

In short, use std::copy().

Answer 2

Use std::copy unless profiling shows you a needed benefit in doing otherwise. It honours C++ object encapsulation, invoking copy constructors and assignment operators, and the implementation could include other inline optimisations. That's more maintainable if the types being copied are later changed from something trivially copyable to something that is not.
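If profiling does push you towards memcpy(), one way to keep the maintainability benefit is to guard the call so that a later change of element type fails to compile instead of silently misbehaving. The following is only a sketch under that assumption; copy_trivial and Point are hypothetical names, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical helper: a memcpy wrapper that refuses to compile for element
// types whose copy semantics a byte-wise copy would bypass.
template <class T>
void copy_trivial(const T* src, T* dst, std::size_t n) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "T is not trivially copyable; use std::copy instead");
    if (n == 0) return;  // nothing to copy; avoids handing null pointers to memcpy
    assert(src != nullptr && dst != nullptr);
    std::memcpy(dst, src, n * sizeof(T));
}

// Example element type (an assumption for illustration): trivially copyable,
// so copy_trivial accepts it.
struct Point {
    double x;
    double y;
};
```

If Point later gained, say, a std::string member, every call to copy_trivial on it would stop compiling, pointing the maintainer back to std::copy.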

As PeterCordes comments below, modern compilers such as GCC and clang analyse memcpy() requests internally and typically avoid an out-of-line function call, and even before that some systems had memcpy() macros that inlined copies below a certain size threshold.

FWIW, on the old Linux box I have handy (in 2010), GCC doesn't do any spectacular optimisations, but bits/type_traits.h does allow the program to easily specify whether std::copy should fall through to memcpy() (see code below), so there's no reason to avoid std::copy() in favour of calling memcpy() directly.

 * Copyright (c) 1997
 * Silicon Graphics Computer Systems, Inc.
 *
 * Permission to use, copy, modify, distribute and sell this software
 * and its documentation for any purpose is hereby granted without fee,
 * provided that the above copyright notice appear in all copies and            
 * that both that copyright notice and this permission notice appear            
 * in supporting documentation.  Silicon Graphics makes no                      
 * representations about the suitability of this software for any               
 * purpose.  It is provided "as is" without express or implied warranty.        
 ...                                                                            
                                                                            
/*                                                                              
This header file provides a framework for allowing compile time dispatch        
based on type attributes. This is useful when writing template code.            
For example, when making a copy of an array of an unknown type, it helps        
to know if the type has a trivial copy constructor or not, to help decide       
if a memcpy can be used.

The class template __type_traits provides a series of typedefs each of
which is either __true_type or __false_type. The argument to
__type_traits can be any type. The typedefs within this template will
attain their correct values by one of these means:
    1. The general instantiation contain conservative values which work
       for all types.
    2. Specializations may be declared to make distinctions between types.
    3. Some compilers (such as the Silicon Graphics N32 and N64 compilers)
       will automatically provide the appropriate specializations for all
       types.

EXAMPLE:

//Copy an array of elements which have non-trivial copy constructors
template <class _Tp> void
  copy(_Tp* __source,_Tp* __destination,int __n,__false_type);
//Copy an array of elements which have trivial copy constructors. Use memcpy.
template <class _Tp> void
  copy(_Tp* __source,_Tp* __destination,int __n,__true_type);

//Copy an array of any type by using the most efficient copy mechanism
template <class _Tp> inline void copy(_Tp* __source,_Tp* __destination,int __n) {
   copy(__source,__destination,__n,
        typename __type_traits<_Tp>::has_trivial_copy_constructor());
}
*/
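For comparison, the same dispatch idea can be sketched with the standard <type_traits> header instead of the SGI-specific __type_traits. This is an illustrative reconstruction, not the library's actual implementation; array_copy and array_copy_impl are hypothetical names, and std::is_trivially_copyable stands in for has_trivial_copy_constructor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

// Non-trivial element type: assign element by element so that the type's
// own copy assignment operator runs.
template <class T>
void array_copy_impl(const T* src, T* dst, std::size_t n, std::false_type) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = src[i];
    }
}

// Trivially copyable element type: a raw byte copy is allowed.
template <class T>
void array_copy_impl(const T* src, T* dst, std::size_t n, std::true_type) {
    if (n != 0) {
        std::memcpy(dst, src, n * sizeof(T));
    }
}

// Pick the overload at compile time from the element type's traits.
template <class T>
void array_copy(const T* src, T* dst, std::size_t n) {
    assert(n == 0 || (src != nullptr && dst != nullptr));
    array_copy_impl(src, dst, n,
                    typename std::is_trivially_copyable<T>::type());
}
```

With this in place, array_copy on an int array takes the memcpy path and array_copy on a std::string array takes the element-wise path, with no run-time branch.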

Answer 3

In most cases memcpy will be the fastest, as it is the lowest level and may be implemented in machine code on a given platform. (However, if your array contains non-trivial objects, memcpy may not do the correct thing, so it may be safer to stick with std::copy.)

However, it all depends on how well the standard library is implemented on the given platform, etc. As the standard does not say how fast operations must be, there is no way to know in a "portable" way what will be fastest.

Profiling your application will show which is fastest on a given platform, but will only tell you about the test platform.
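As a rough illustration of what such a measurement might look like, here is a minimal timing sketch using std::chrono. The names time_ns, CopyTimings and compare_copies are hypothetical. A single run like this is noisy, so a real comparison would repeat the measurement many times, build with optimisation enabled, and use realistic array sizes.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: time one call of f in nanoseconds with a steady clock.
template <class F>
long long time_ns(F f) {
    auto start = std::chrono::steady_clock::now();
    f();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}

// Timings for one run of each method, plus a check that both produced the
// same destination contents.
struct CopyTimings {
    long long memcpy_ns;
    long long std_copy_ns;
    bool results_match;
};

// Copy n doubles once with memcpy and once with std::copy and time each call.
CopyTimings compare_copies(std::size_t n) {
    assert(n > 0);
    std::vector<double> src(n), a(n), b(n);
    for (std::size_t i = 0; i < n; ++i) {
        src[i] = static_cast<double>(i);
    }
    CopyTimings t;
    t.memcpy_ns = time_ns([&] {
        std::memcpy(a.data(), src.data(), n * sizeof(double));
    });
    t.std_copy_ns = time_ns([&] {
        std::copy(src.begin(), src.end(), b.begin());
    });
    t.results_match = (a == b);
    return t;
}
```

The numbers it returns only describe the machine, compiler and flags used for that run, which is the point the answer makes about test platforms.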

However, when you profile your application you will most likely find that the issues are in your design rather than your choice of array copy method. (E.g. why do you need to copy large arrays so much?)

Answer 4

memcpy; however, if your array contains non-trivial objects, stick with std::copy.

Answer 5

memcpy is probably the fastest way to copy a contiguous block of memory. This is because it will likely be highly optimized for your particular bit of hardware. It is often implemented as a built-in compiler function.

Having said that, any non-POD C++ object is unlikely to be contiguous, and therefore copying arrays of C++ objects using memcpy is likely to give you unexpected results. When copying arrays (or collections) of C++ objects, std::copy will use the object's own copy semantics and is therefore suitable for use with non-POD C++ objects.
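As a small sketch of the "collections" case, std::copy works through iterators, so it can copy from a container whose elements are not stored in one block at all. The function name to_vector is hypothetical and the choice of std::list and std::string is just for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <list>
#include <string>
#include <vector>

// Hypothetical helper: copy a node-based collection into a vector. Each
// element is copy-constructed through std::back_inserter, so the strings'
// own copy semantics are respected.
std::vector<std::string> to_vector(const std::list<std::string>& src) {
    std::vector<std::string> dst;
    dst.reserve(src.size());
    std::copy(src.begin(), src.end(), std::back_inserter(dst));
    assert(dst.size() == src.size());
    return dst;
}
```

There is no memcpy equivalent of this call, since the list's nodes live at unrelated addresses and each std::string manages its own storage.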

cblas_dcopy looks like a copy routine for use with a specific library and probably has little use when not using that library.

Answer 6

I have to think that the others will call memcpy(). Having said that, I can't believe that there will be any appreciable difference.

If it really matters to you, code all three and run a profiler, but it might be better to consider things like readability/maintainability, exception safety, etc. (and code an assembler insert while you are at it, not that you are likely to see a difference).

Is your program threaded?

And, most importantly, how are you declaring your array (what is it an array of), and how large is it?

Answer 7

I ran a small benchmark (VS 2018 Preview, MKL 2017 Update 4) comparing memcpy with the sequential version of cblas_?copy and found them to be equally fast for float and double.

Answer 8

Just profile your application. You will likely find that copying is not the slowest part of it.


Answer 1: score 27, accepted, 2010-09-13 09:34:44
Answer 2: score 2, 2010-09-13 09:57:03
Answer 3: score 1, 2010-09-13 09:29:59
Answer 4: score 1, 2010-09-13 09:34:13
Answer 5: score 1, 2010-09-13 09:43:18
Answer 6: score 0, 2010-09-13 09:35:15
Answer 7: score 0, 2018-06-19 17:55:13
Answer 8: score -3, 2010-09-13 09:30:34
