Avoid initialization in C++ vector or valarray

In my project I have to copy a lot of numerical data into a std::valarray (or std::vector) from a CUDA (GPU) device, i.e. from the memory of the video card into the std::valarray.

So I need to resize these data structures as fast as possible, but when I call the member function vector::resize, it initializes all elements of the array to the default value in a loop.

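// A heavily simplified sketch of what resize() does (pseudocode).
// (The real resize() also keeps the existing elements and initializes only the new ones.)
template <typename T>
void vector<T>::resize(std::size_t N) {
   // Set up the new size

   // Allocate the new array
   this->_internal_vector = new T[N];

   // Initialize every element to its default value
   // This loop is slow!
   for (std::size_t i = 0; i < N; ++i) {
      this->_internal_vector[i] = T();
   }
}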

Clearly, I don't need this initialization: I copy the data from the GPU, so every element is overwritten anyway. The initialization takes time, so I lose performance.

To copy the data I need allocated memory, which is what resize() provides.

A very dirty and wrong solution is to use vector::reserve(), but then I lose all the features of the vector (its size() stays 0), and if I later call resize(), the data are replaced with the default value.
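To make the difference concrete, here is a small sketch (the helper compare_reserve_resize and its report struct are illustrative names, not from the question) that records what reserve() and resize() each do to a std::vector<double>. reserve() only obtains capacity and leaves size() at 0, while resize() value-initializes every new double to 0.0, which is the loop the question wants to avoid.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative report of what reserve() and resize() change.
struct ReserveResizeReport {
    std::size_t size_after_reserve;
    std::size_t capacity_after_reserve;
    std::size_t size_after_resize;
    bool all_zero_after_resize;
};

// Hypothetical helper: applies reserve(n), then resize(n), to an empty vector.
ReserveResizeReport compare_reserve_resize(std::size_t n) {
    ReserveResizeReport r{};

    std::vector<double> v;
    v.reserve(n);                             // allocates storage, creates no elements
    r.size_after_reserve = v.size();          // still 0: v[i] would be undefined here
    r.capacity_after_reserve = v.capacity();  // at least n

    v.resize(n);                              // value-initializes n doubles to 0.0
    r.size_after_resize = v.size();
    r.all_zero_after_resize = true;
    for (double x : v) {
        if (x != 0.0) {
            r.all_zero_after_resize = false;
        }
    }
    return r;
}
```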

So, is there a strategy for avoiding this pre-initialization to the default value (in valarray or vector)?

I want a resize method that behaves like this:
template <typename T>
void vector<T>::resize(std::size_t N) {
    // Allocate the memory.
    this->_internal_vector = new T[N];

    // Update the size of the vector or valarray

    // !! DO NOT initialize the new values.
}
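For trivial element types such as double, the allocation-only behaviour sketched above is what a plain new T[N] already does, since it default-initializes rather than value-initializes. A minimal sketch, assuming a hypothetical class raw_buffer (not a standard type) that owns the memory through std::unique_ptr<T[]>:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical minimal buffer whose resize() only allocates.
// `new T[n]` default-initializes, so for trivial T (e.g. double) nothing is written.
// Like the pseudocode, resize() discards the old contents.
template <typename T>
class raw_buffer {
public:
    void resize(std::size_t n) {
        data_.reset(new T[n]);  // allocate only, no zero-fill for trivial T
        size_ = n;
    }

    T* data() noexcept { return data_.get(); }
    const T* data() const noexcept { return data_.get(); }
    std::size_t size() const noexcept { return size_; }

    T& operator[](std::size_t i) {
        assert(i < size_);  // bounds check in debug builds
        return data_[i];
    }

private:
    std::unique_ptr<T[]> data_;
    std::size_t size_ = 0;
};
```

This gives up most of what std::vector offers in exchange for skipping the zero-fill; data() is the pointer one would hand to a copy routine such as cudaMemcpy.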

An example of the performance:

#include <ctime>  // std::clock, CLOCKS_PER_SEC
#include <iostream>
#include <valarray>
#include <vector>

int main() {

  std::vector<double> vec;
  std::valarray<double> vec2;

  double *vec_raw;

  unsigned int N = 100000000;

  std::clock_t start;
  double duration;

  start = std::clock();
  // Dirty solution!
  vec.reserve(N);

  duration = (std::clock() - start) / (double)CLOCKS_PER_SEC;
  std::cout << "duration reserve: " << duration << std::endl;

  start = std::clock();

  vec_raw = new double[N];

  duration = (std::clock() - start) / (double)CLOCKS_PER_SEC;
  std::cout << "duration new: " << duration << std::endl;

  start = std::clock();

  for (unsigned int i = 0; i < N; ++i) {
    vec_raw[i] = 0;
  }

  duration = (std::clock() - start) / (double)CLOCKS_PER_SEC;
  std::cout << "duration raw init: " << duration << std::endl;

  start = std::clock();
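  // Dirty solution: size() is still 0 here, so writing vec[i] past size() is
  // undefined behavior; it only "works" because reserve() allocated the memory.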
  for (unsigned int i = 0; i < vec.capacity(); ++i) {
    vec[i] = 0;
  }

  duration = (std::clock() - start) / (double)CLOCKS_PER_SEC;
  std::cout << "duration vec init dirty: " << duration << std::endl;

  start = std::clock();

  vec2.resize(N);

  duration = (std::clock() - start) / (double)CLOCKS_PER_SEC;
  std::cout << "duration valarray resize: " << duration << std::endl;

  delete[] vec_raw;

  return 0;
}

Output (durations in seconds):

duration reserve: 1.1e-05
duration new: 1e-05
duration raw init: 0.222263
duration vec init dirty: 0.214459
duration valarray resize: 0.215735

Note: replacing std::allocator does not work, because the initialization loop is executed by resize() itself.
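A widely circulated workaround does go through the allocator, though. Since C++11, resize(n) creates each new element by calling std::allocator_traits<A>::construct(a, p) with no further arguments, and a custom allocator may supply its own construct(). The sketch below assumes a C++11-or-later standard library; default_init_allocator and uninit_double_vector are illustrative names, not standard types. It turns that argument-less call into default-initialization, so new doubles are not zeroed, while calls with arguments (resize(n, value), push_back, copies) are forwarded unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Allocator adaptor: an argument-less construct(p), which resize(n) uses for
// new elements, performs default-initialization instead of value-initialization.
template <typename T, typename A = std::allocator<T>>
class default_init_allocator : public A {
    using traits = std::allocator_traits<A>;

public:
    // Let containers rebind the adaptor to other element types.
    template <typename U>
    struct rebind {
        using other =
            default_init_allocator<U, typename traits::template rebind_alloc<U>>;
    };

    using A::A;  // reuse the wrapped allocator's constructors

    // No constructor arguments: default-initialize (no zero-fill for double).
    template <typename U>
    void construct(U* ptr) noexcept(
        std::is_nothrow_default_constructible<U>::value) {
        ::new (static_cast<void*>(ptr)) U;
    }

    // With arguments (e.g. resize(n, value), push_back): forward unchanged.
    template <typename U, typename... Args>
    void construct(U* ptr, Args&&... args) {
        traits::construct(static_cast<A&>(*this), ptr,
                          std::forward<Args>(args)...);
    }
};

// Convenience alias for the element type in the question.
using uninit_double_vector =
    std::vector<double, default_init_allocator<double>>;
```

Reading an element before it has been written is still undefined behavior, so this only fits the pattern of overwriting everything right after resize().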

Let's say you have an array (or some other collection) called data, and you want to copy it into a vector vec. The idiomatic way to do this is to use std::vector::reserve and then std::vector::push_back. std::vector::reserve allocates memory for the std::vector, but it does not initialize that memory or change the vector's size. std::vector::push_back then inserts the data and updates the size. Optionally, use the std::vector::insert overload that takes two iterators, to avoid looping and pushing back every element individually.

std::vector<double> vec;
vec.reserve(std::size(data)); // Allocate all data in one call.
vec.insert(std::end(vec), std::begin(data), std::end(data)); // Append the data elements.

Alternatively, you can use the std::vector constructor overload that takes two iterators:

std::vector<double> vec{std::begin(data), std::end(data)};

This will also allocate all data in a single call, and then add the elements.
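The single-allocation claim for reserve() plus a range insert() can be checked directly. In this sketch, a plain host array stands in for data already copied off the GPU, and copy_with_single_allocation is a hypothetical helper: it reserves n elements, remembers data(), appends the range with insert(), and reports whether the buffer stayed in place. The standard only reallocates when the new size exceeds capacity(), so after reserve(n) inserting n elements should not move the data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copies n doubles from src into out via reserve() + insert().
// Returns true if insert() reused the storage obtained by reserve().
bool copy_with_single_allocation(const double* src, std::size_t n,
                                 std::vector<double>& out) {
    out.clear();
    out.reserve(n);                        // one allocation, no elements yet
    const double* before = out.data();
    out.insert(out.end(), src, src + n);   // constructs n elements from src
    return out.data() == before && out.size() == n;
}
```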

Update

If you know the data size at compile time, you could simply use std::array, e.g.:

constexpr std::size_t N = 10'000;
std::array<double, N> arr;

arr[5432] = 2.5; // Perfectly valid.
// Or e.g. for CUDA (the count argument is in bytes).
cudaMemcpy(std::data(arr), gpu_arr, std::size(arr) * sizeof(double), cudaMemcpyDeviceToHost);

All data is allocated at once and no zero-fill is performed: the elements are default-initialized, which for fundamental types means nothing is done (their values are indeterminate). Note that a local std::array lives on the stack, so this only suits moderate sizes.

std::array also works with the usual container helpers such as std::size, std::begin, std::end, std::data, etc.
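When N is large (the question uses 100000000 doubles), a local std::array can exceed the stack. A sketch, assuming the array is moved to the heap; kN, big_array and make_uninitialized_array are illustrative names. Plain new big_array default-initializes the elements, so nothing is zeroed, whereas std::make_unique<big_array>() would value-initialize and zero them.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>

// 8 MB of doubles: at or above common default stack sizes (often 1 to 8 MB).
constexpr std::size_t kN = 1'000'000;
using big_array = std::array<double, kN>;

// `new big_array` default-initializes: the doubles are left indeterminate.
// (std::make_unique<big_array>() would value-initialize, i.e. zero-fill.)
std::unique_ptr<big_array> make_uninitialized_array() {
    return std::unique_ptr<big_array>(new big_array);
}
```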

If you are working with plain old data (no pointers or references, just integers and floats), it may be best to just use a plain old array. Combine that with correct use of memcpy(), and you are pretty much guaranteed to get much better performance than with any idiomatic C++ container code.

The point is that C++ cannot really handle swaths of data as swaths of data. It has to handle individual objects of unknown type. It does not know whether these objects may be copied by copying their bits, so it must call the appropriate default, copy, or move constructors, (move) assignment operators, and destructor for each individual element. While good C++ compilers are able to remove much of the resulting garbage code, the result generally cannot compete with the carefully hand-optimized implementations of memcpy(), which can just copy in chunks of 16 or more bytes, blissfully ignorant of whether these are actually eight shorts, two doubles, or 1.33 instances of struct { float x,y,z; }.
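The "plain old data" condition in this answer corresponds most closely to the standard trait std::is_trivially_copyable, which is the requirement for copying objects with std::memcpy. A minimal sketch, assuming a hypothetical helper copy_to_plain_array that allocates a plain array without zeroing it and then copies the bytes:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <type_traits>

// Hypothetical helper: copies n elements from src into a new plain array via memcpy.
// The static_assert rejects types for which a byte-wise copy would be invalid.
template <typename T>
std::unique_ptr<T[]> copy_to_plain_array(const T* src, std::size_t n) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy is only valid for trivially copyable types");
    std::unique_ptr<T[]> dst(new T[n]);  // default-initialized: no zero-fill for trivial T
    if (n != 0) {
        std::memcpy(dst.get(), src, n * sizeof(T));  // count is in bytes
    }
    return dst;
}
```

Note that memcpy takes a byte count, hence n * sizeof(T).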

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For questions, contact: yoyou2525@163.com.