Fastest way to copy one dimension of a 2D vector in C++

Question

I have a 2D vector which I'm using for complex numbers. For example:

vector<vector<double>> Complex;
vector<double> ComplexNumber;
ComplexNumber.push_back(5);  // real part
ComplexNumber.push_back(-4); // imag part
Complex.push_back(ComplexNumber); // Complex[i][0] - real part, [i][1] - imag

Deep in my code I need to pull part of my Complex vector out into other vectors: copy the real parts from index 10 to 18 into one variable (a 1D vector) and the imaginary parts from index 10 to 18 into another variable (a 1D vector). Currently I'm doing this with a for loop:

for (int j=0; j<=Samples; j++)
{
  refRealSignal[j] = ReferenseComplexSignalsSampled[(i*SignalSampleIndex)+j][0] ;
  refImagSignal[j] = ReferenseComplexSignalsSampled[(i*SignalSampleIndex)+j][1] ;
}

According to the profiler, this code is the bottleneck of the entire program. Is there any way to improve it?

Small update: the Samples variable is an int from 8 to 20, usually 8. The variable i comes from an outer for loop.

Big update: I dropped the 2D vector and rewrote everything with the complex class. I also rewrote my mul operation inside the for loop. I do not know why, but copying from complex.imag takes more time (about 2 times more) than copying from complex.real. After all of this, the run time dropped from ~5 ms for one sample to ~1.8 ms for one sample. (2.5 ms after I rewrote the mul operation and also rewrote the entire loop; this was very helpful advice, thanks a lot.)

Answer 1

If Samples is big, you could save some multiplications involving i. So change this:

for (int j=0; j<=Samples; j++)
{
  refRealSignal[j] = ReferenseComplexSignalsSampled[(i*SignalSampleIndex)+j][0] ;
  refImagSignal[j] = ReferenseComplexSignalsSampled[(i*SignalSampleIndex)+j][1] ;
}

to this:

int index;
for(i = ..) {                    // assuming your code has a for loop for i
  index = i*SignalSampleIndex;
  for (int j=0; j<=Samples; ++j) // pre-increment (++j) instead of post-increment (j++)
  {
    refRealSignal[j] = ReferenseComplexSignalsSampled[index+j][0] ;
    refImagSignal[j] = ReferenseComplexSignalsSampled[index+j][1] ;
  }
}

That way you do 1 multiplication per iteration of the outer loop, instead of 2 * Samples, as luk32 noticed.

Another approach, as discussed in the comments, is to use a class to represent your complex numbers. The standard library provides one for that: std::complex.

Then you would have a vector of std::complex, which keeps your data compact and contiguous in memory. That might improve locality, which caching can take advantage of.

You could do something like this:

#include <iostream>     // std::cout
#include <complex>      // std::complex, std::real
#include <vector>   // std::vector

int main ()
{
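  std::vector<std::complex<double> > complex;

  // If you know how many numbers you will insert, call reserve()
  // first so the vector allocates only once. Assuming you will
  // insert 100000 numbers, the code would be
  complex.reserve(100000);

  // reserve() only sets the capacity and creates no elements, so
  // append with push_back() instead of assigning to complex[i].
  for(int i = 0; i < 100000; ++i)
      complex.push_back(std::complex<double>(0.1, 0.2));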

  std::cout << "Real part of 1st element: " << std::real(complex[0]) << '\n';

  return 0;
}
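
To connect this back to the question, here is a minimal sketch of pulling a window of real and imaginary parts out of such a vector. The helper name split_window and its parameters are hypothetical, not something from the original code, and whether it beats the original loop is something only your profiler can tell you.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper: copies the real and imaginary parts of
// signal[first .. first + count) into two separate 1D vectors.
void split_window(const std::vector<std::complex<double>>& signal,
                  std::size_t first, std::size_t count,
                  std::vector<double>& real_out,
                  std::vector<double>& imag_out)
{
    assert(first + count <= signal.size());

    // resize() reuses existing capacity, so calling this repeatedly
    // with the same output vectors does not reallocate every time.
    real_out.resize(count);
    imag_out.resize(count);

    // Compute the start of the window once, outside the loop.
    const std::complex<double>* src = signal.data() + first;
    for (std::size_t j = 0; j < count; ++j) {
        real_out[j] = src[j].real();
        imag_out[j] = src[j].imag();
    }
}
```

Copying indices 10 to 18 inclusive, as in the question, would be split_window(signal, 10, 9, re, im). Keeping re and im alive across iterations of the outer loop avoids repeated allocation.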

[EDIT]

The compiler may already hoist the multiplication out of the inner loop for you when you build with an optimization flag. Make sure you profile your code as compiled with an optimization flag.

Tip:

Usually if a section is slowing your program down, there are two approaches: (1) make that section faster, or (2) find a way to do that section less often.

(credits to Psyduck, aka Mooling duck)

In your case, you can try what I suggested above to make your code faster, but if you rethink your logic and avoid or reduce the number of times you copy, you will be rewarded with a boost in performance.
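
As an illustration of the second approach, here is a sketch that skips the copy entirely and reads the window in place. The question does not show what is done with the copied samples, so the operation here (a sum of products with the conjugate of an input window) and the name correlate_window are assumptions chosen only for the example.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical example: accumulates reference[first + j] * conj(input[j])
// over the window, reading the reference samples where they already are
// instead of first copying them into separate real/imag vectors.
std::complex<double> correlate_window(
    const std::vector<std::complex<double>>& reference,
    std::size_t first,
    const std::vector<std::complex<double>>& input)
{
    assert(first + input.size() <= reference.size());

    std::complex<double> sum(0.0, 0.0);
    for (std::size_t j = 0; j < input.size(); ++j) {
        sum += reference[first + j] * std::conj(input[j]);
    }
    return sum;
}
```

If the real processing step can be written against the original container like this, the copy loop disappears from the profile altogether.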

Answer 2

Using a std::vector<double> for complex numbers is a huge mistake with respect to performance. Why? For several reasons:

Allocation takes forever. Typical values are somewhere upwards of 200 ns.
Memory is allocated on the heap. The overhead in terms of space is huge.
- Typical overhead within the memory allocator: two pointers, i.e. 8 or 16 bytes, depending on your architecture.
- Overhead of the std::vector<> itself: two pointers, another 8 or 16 bytes.
- Overallocation of the std::vector<>: typical implementations never allocate memory for only two elements. I would estimate this overhead at six elements or more (a minimal allocation of eight elements). That would cause an overhead of 48 bytes.
So, you end up using something like 80 bytes to store something that would fit into 16.
This matters, because it means your caches and memory bus have to do five times the work!
Memory is allocated on the heap. That means your complex numbers are likely scattered. This is another blow to cache efficiency.
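
The exact numbers depend on the standard library and allocator, so it is worth checking them on your own toolchain. The sketch below is one way to do that; the names ComplexPod, nested_bytes_per_number and flat_bytes_per_number are made up for this example. The nested figure is only a lower bound, because allocator bookkeeping and spare capacity are not visible to sizeof.

```cpp
#include <array>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical plain struct used only for the size comparison.
struct ComplexPod { double re; double im; };

// std::complex<double> is required to be usable as an array of two doubles.
static_assert(sizeof(std::complex<double>) == 2 * sizeof(double),
              "std::complex<double> should be exactly two doubles");

// Expected on mainstream platforms, though not formally guaranteed.
static_assert(sizeof(ComplexPod) == 2 * sizeof(double),
              "unexpected padding in ComplexPod");
static_assert(sizeof(std::array<double, 2>) == 2 * sizeof(double),
              "unexpected padding in std::array<double, 2>");

// Lower bound on the bytes one complex number costs when stored as an
// inner std::vector<double>: the vector object itself plus two doubles
// on the heap. Allocator bookkeeping and spare capacity are NOT counted.
constexpr std::size_t nested_bytes_per_number()
{
    return sizeof(std::vector<double>) + 2 * sizeof(double);
}

// Bytes one complex number costs inside std::vector<std::complex<double>>.
constexpr std::size_t flat_bytes_per_number()
{
    return sizeof(std::complex<double>);
}
```

Printing both values (for example with std::cout) shows how much of the nested layout is overhead before the allocator's own cost is even considered.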

If you want to be fast, use either arrays with two elements (it doesn't matter whether you use C-style arrays or C++ std::array<>) or define your complex type as a plain old data struct. All three options have the same memory layout, and thus should be equivalent in performance. But I would prefer the struct approach since it allows you to overload the operators, which is nice for mathematical types like complex numbers, vectors, quaternions, and such.
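
A minimal sketch of the struct approach might look like the following. The type name Complex and its member names re and im are assumptions for illustration. Only addition and multiplication are shown.

```cpp
#include <type_traits>

// Hypothetical plain-old-data complex type: two doubles, no heap allocation.
struct Complex {
    double re;
    double im;
};

// Sanity check that the type stays trivially copyable and standard layout,
// so a std::vector<Complex> is one contiguous block of doubles.
static_assert(std::is_trivially_copyable<Complex>::value &&
              std::is_standard_layout<Complex>::value,
              "Complex should remain a plain old data type");

// (a + bi) + (c + di) = (a + c) + (b + d)i
inline Complex operator+(Complex a, Complex b)
{
    return Complex{a.re + b.re, a.im + b.im};
}

// (a + bi) * (c + di) = (ac - bd) + (ad + bc)i
inline Complex operator*(Complex a, Complex b)
{
    return Complex{a.re * b.re - a.im * b.im,
                   a.re * b.im + a.im * b.re};
}
```

With these overloads, Complex{1, 2} * Complex{3, 4} evaluates to a value with re == -5 and im == 10. In practice std::complex<double> already gives you the same layout plus a full set of operators, so a hand-written struct is mainly useful when you need something std::complex does not offer.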

Answer 1
1 Accepted 2014-05-19 15:47:31

Answer 2
1 2014-05-19 18:18:42
