C++ Efficient way to construct large vector of shared_ptr to class

Question

I need to construct a large std::vector<std::shared_ptr<A>> many_ptr_to_A.

Ideally, each A should be created with a non-default constructor that takes arguments. The code sample below defines several variants:

#include <iostream>
#include <vector>
#include <memory>
#include <ctime>

class A
{
public:
    A(std::vector<double> data):
        data(data)
    {}
    A():
        data(std::vector<double>(3, 1.))
    {}

    std::vector<double> data;
};

int main()
{
    int n = 20000000;
    std::vector<std::shared_ptr<A>> many_ptr_to_A;

    // option 1
    std::clock_t start = std::clock();
    std::vector<A> many_A(n, std::vector<double>(3, 1.));
    std::cout << double(std::clock() - start) / CLOCKS_PER_SEC << std::endl;
    // end option 1

    many_ptr_to_A.clear();

    // option 2
    start = std::clock();
    many_ptr_to_A.reserve(n);
    for (int i=0; i<n; i++) {
        many_ptr_to_A.push_back(std::shared_ptr<A>(new A(std::vector<double>(3, 1.))));
    }
    std::cout << double(std::clock() - start) / CLOCKS_PER_SEC << std::endl;
    // end option 2

    many_ptr_to_A.clear();

    // option 3
    start = std::clock();
    A* raw_ptr_to_A = new A[n];
    for (int i=0; i<n; i++) {
        many_ptr_to_A.push_back(std::shared_ptr<A>(&raw_ptr_to_A[i]));
    }
    std::cout << double(std::clock() - start) / CLOCKS_PER_SEC << std::endl;
    // end option 3

    return 0;
}

Option 1

Rather fast, but unfortunately I need pointers instead of raw objects. A way to create pointers into the allocated storage while preventing the vector from deleting the objects would be great, but I can't think of one.

Option 2

This works, and I can feed specific data into the constructor for every A. Unfortunately, it is rather slow. Using std::make_shared instead of new does not really improve the situation.

Even worse, this seems to be a big bottleneck when used in multiple threads. If I run option 2 in 10 threads with n_thread = n / 10, then instead of being around ten times faster, the whole thing is around four times slower. Why does this happen? Is it a problem when multiple threads try to allocate many small pieces of memory?

The number of cores on the server I'm using is larger than the number of threads. The rest of my application scales nicely with the number of cores, thus this actually represents a bottleneck.

Unfortunately, I'm not really experienced when it comes to parallelization...

Option 3

With this approach I tried to combine a fast allocation, done in one go with a single raw new, with the shared_ptrs. This compiles, but unfortunately causes a segmentation fault when the destructor of the vector is called. I don't fully understand why this happens. Is it because A is not POD?

In this approach I would manually fill the object-specific data into the objects after their creation.

Question

How can I perform the allocation of a large number of shared_ptr to A in an efficient way which also scales nicely when used on many threads/cores? Am I missing an obvious way to construct the std::vector<std::shared_ptr<A>> many_ptr_to_A in one go?

My system is a Linux/Debian server. I compile with g++ using the -O3 and -std=c++11 options.

Any help is highly appreciated :)

Answer 1

Option 3 is undefined behaviour: you have n shared_ptrs, each of which will try to delete a single A, but there must be exactly one delete[] for the whole array, not n separate calls to delete. You could do this though:

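// Allocate all n objects in one go; unique_ptr<A[]> carries default_delete<A[]>.
std::unique_ptr<A[]> array{ new A[n] };
std::vector<std::shared_ptr<A>> v;
v.reserve(n);
// The first shared_ptr takes over ownership of the array together with its deleter.
v.emplace_back(std::move(array));
// Every other shared_ptr aliases v[0]: same reference count, different element.
for (int i = 1; i < n; ++i)
  v.push_back(std::shared_ptr<A>{v[0], v[0].get() + i});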

This creates a single array, then creates n shared_ptr objects that all share ownership of the array and each point to a different element of it. This is done by creating one shared_ptr that owns the array (with a suitable deleter) and then creating n-1 shared_ptrs that alias the first one, i.e. they share the same reference count even though their get() member returns a different pointer.
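As a self-contained illustration of the aliasing idea, here is a minimal sketch. The helper name make_aliased_pointers and the stripped-down A are assumptions made for this example, not part of the question's code. It builds the owner with an explicit std::default_delete<A[]> and lets every pointer in the vector alias that owner.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Stand-in for the question's class (assumption: only the default
// constructor is needed for this illustration).
struct A {
    A() : data(3, 1.0) {}
    std::vector<double> data;
};

// Hypothetical helper: one array allocation for all n objects,
// plus n shared_ptrs that point into that array.
std::vector<std::shared_ptr<A>> make_aliased_pointers(std::size_t n)
{
    std::vector<std::shared_ptr<A>> v;
    if (n == 0)
        return v;

    // Single owner of the array; default_delete<A[]> ensures delete[] is used.
    std::shared_ptr<A> owner(new A[n], std::default_delete<A[]>());

    v.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        // Aliasing constructor: shares owner's reference count,
        // but get() returns the address of element i.
        v.push_back(std::shared_ptr<A>(owner, owner.get() + i));
    }
    assert(v.size() == n);
    return v;  // owner goes out of scope here; the n aliases keep the array alive
}
```

After the call, v[0].use_count() equals n and v[i].get() == v[0].get() + i. Object-specific data can then be filled in through v[i]->data, which is what Option 3 intended.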

A unique_ptr<A[]> is initialized with the array first so that default_delete<A[]> is used as the deleter. That deleter is transferred into the first shared_ptr, so that when the last shared_ptr gives up ownership, the correct delete[] is used to free the whole array. To get the same effect, you could create the first shared_ptr like this instead:

v.push_back(std::shared_ptr<A>{new A[n], std::default_delete<A[]>{}});

Or:

v.emplace_back(std::unique_ptr<A[]>{new A[n]});
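One consequence of sharing a single reference count is worth checking: the array is released only when the last of the n pointers goes away, so a single surviving pointer keeps all n objects alive. The sketch below verifies this with an instrumented type Tracked and the helpers make_tracked and alive_with_one_survivor, all of which are hypothetical names introduced for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical instrumented type: counts how many instances currently exist.
struct Tracked {
    static int alive;
    Tracked() { ++alive; }
    ~Tracked() { --alive; }
};
int Tracked::alive = 0;

// Same construction as above, using the explicit-deleter form for the first pointer.
std::vector<std::shared_ptr<Tracked>> make_tracked(std::size_t n)
{
    assert(n > 0);
    std::vector<std::shared_ptr<Tracked>> v;
    v.reserve(n);
    v.push_back(std::shared_ptr<Tracked>(new Tracked[n],
                                         std::default_delete<Tracked[]>()));
    for (std::size_t i = 1; i < n; ++i)
        v.push_back(std::shared_ptr<Tracked>(v[0], v[0].get() + i));
    return v;
}

// Returns the number of live objects while only one pointer survives.
int alive_with_one_survivor(std::size_t n)
{
    std::vector<std::shared_ptr<Tracked>> v = make_tracked(n);
    std::shared_ptr<Tracked> survivor = v.back();
    v.clear();              // drops n of the n + 1 references
    return Tracked::alive;  // the whole array is still alive at this point
}   // survivor is destroyed here, and delete[] runs once for the whole array
```

If objects need to be released individually, this shared lifetime may not be what you want; in that case separately allocated objects, as in Option 2, remain the straightforward choice.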

solution1
3 ACCEPTED 2014-11-25 00:19:12
