
Copying std::vector between threads without locking

I have a vector that is modified in one thread, and I need to use its contents in another. Locking between these threads is unacceptable due to performance requirements. Since iterating over the vector while it is changing will cause a crash, I thought to copy the vector and then iterate over the copy. My question is, can this way also crash?

struct Data
{
    int A;
    double B;
    bool C;
};

std::vector<Data> DataVec;

void ModifyThreadFunc()
{
    // Here the vector is changed, which includes adding and erasing elements
    ...
}

void ReadThreadFunc()
{
    auto temp = DataVec;    // Will this crash?
    for (auto& data : temp)
    {
        // Do stuff with the data
        ...
    }

    // This definitely can crash
    /*for (auto& data : DataVec)
    {
        // Do stuff with the data
        ...
    }*/
}

The basic exception safety guarantee for vector::operator= is:

"if an exception is thrown, the container is in a valid state."

What types of exceptions are possible here?

EDIT:

I solved this using double buffering, and posted my answer below.

My question is, can this way also crash?

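Yes, you still have a data race. Copying reads the vector's size, its internal pointer, and every element; if thread A modifies the vector while thread B is making the copy, the behavior is undefined. For example, a reallocation in thread A frees the storage that thread B is still reading from.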

What types of exceptions are possible here?

std::vector::operator=(const vector&) can throw on memory allocation failure, or if copying one of the contained elements throws. The same applies to copy construction, which is what the line in your code marked "Will this crash?" is actually doing.
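To make the two failure modes concrete, here is a minimal sketch. Fragile and TryCopy are hypothetical names invented for this illustration; the Data struct from the question has only trivially copyable members, so for it only the allocation failure applies. Note that neither exception has anything to do with the data race: the race is undefined behavior, not an exception you can catch.

```cpp
#include <exception>
#include <new>
#include <stdexcept>
#include <vector>

// Hypothetical element type whose copy constructor can throw.
struct Fragile
{
    int value = 0;
    bool throwOnCopy = false;

    Fragile() = default;
    Fragile(int v, bool t) : value(v), throwOnCopy(t) {}
    Fragile(const Fragile& other)
        : value(other.value), throwOnCopy(other.throwOnCopy)
    {
        if (other.throwOnCopy)
            throw std::runtime_error("copy failed");
    }
    Fragile& operator=(const Fragile&) = default;
};

// Returns true if the copy succeeded; on failure 'out' is left untouched.
bool TryCopy(const std::vector<Fragile>& source, std::vector<Fragile>& out)
{
    try
    {
        std::vector<Fragile> temp = source; // copy construction, not operator=
        out.swap(temp);                     // swap does not throw
        return true;
    }
    catch (const std::bad_alloc&)
    {
        return false; // memory allocation failed
    }
    catch (const std::exception&)
    {
        return false; // an element's copy constructor threw
    }
}
```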


The fundamental problem here is that std::vector is not thread-safe. You have to either protect it with a lock/mutex, or replace it with a thread-safe container (such as the lock-free containers in Boost.Lockfree or libcds).
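A minimal sketch of the mutex option, assuming the vector can be hidden behind a small wrapper (GuardedData and its member names are made up for this example). The lock is held only while the copy is made, so the reader iterates over its private snapshot without blocking the writer.

```cpp
#include <mutex>
#include <vector>

struct Data
{
    int A;
    double B;
    bool C;
};

// Hypothetical wrapper: every access to the vector goes through the mutex.
class GuardedData
{
public:
    // Called by the modifying thread.
    void Add(const Data& d)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        vec_.push_back(d);
    }

    // Called by the reading thread. The copy is made while the lock is
    // held; the caller then works on the copy with no lock held.
    std::vector<Data> Snapshot() const
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return vec_;
    }

private:
    mutable std::mutex mutex_;
    std::vector<Data> vec_;
};
```

The reading thread would then do something like `for (auto& data : guarded.Snapshot()) { ... }`. Whether the cost of the copy under the lock is acceptable depends on the size of the vector and on how often it is read.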

As has been pointed out by the other answers, what you ask for is not doable. If you have concurrent access, you need synchronization, end of story.

That being said, it is not unusual to have requirements like yours where synchronization is not an option. In that case, what you can still do is get rid of the concurrent access. For example, you mentioned that the data is accessed once per frame in a game-loop like execution. Is it strictly required that you get the data from the current frame or could it also be the data from the last frame?

In that case, you could work with two vectors, one that is being written to by the producer thread and one that is being read by all the consumer threads. At the end of the frame, you simply swap the two vectors. Now you no longer need *(1) fine-grained synchronization for the data access, since there is no concurrent data access any more.

This is just one example of how to do this. If you need to get rid of locking, start thinking about how to organize data access so that you avoid getting into the situation where you need synchronization in the first place.

*(1): Strictly speaking, you still need a synchronization point that ensures that when you perform the swapping, all the writer and reader threads have finished working. But this is far easier to do (usually you have such a synchronization point at the end of each frame anyway) and has a far smaller impact on performance than synchronizing on every access to the vector.
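The two-vector scheme could be sketched roughly like this. FrameBuffers, writeBuf, readBuf and EndFrame are names invented for the example, and the end-of-frame synchronization point itself (a barrier, a join, or whatever the engine already has) is assumed to exist and is not shown.

```cpp
#include <utility>
#include <vector>

struct Data
{
    int A;
    double B;
    bool C;
};

// Hypothetical holder for the two buffers.
struct FrameBuffers
{
    std::vector<Data> writeBuf; // touched only by the producer during a frame
    std::vector<Data> readBuf;  // only read by the consumers during a frame

    // Must be called only at the end-of-frame synchronization point,
    // when no thread is reading or writing either buffer.
    void EndFrame()
    {
        // Constant time: the vectors exchange their internal storage.
        std::swap(writeBuf, readBuf);

        // Assumption: the producer edits its vector incrementally (adds
        // and erases), so it has to continue from the latest state. If it
        // rebuilds the data from scratch each frame, writeBuf.clear()
        // would be enough here.
        writeBuf = readBuf;
    }
};
```

During a frame the consumers see the previous frame's data in readBuf, which is exactly the trade-off described above.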

I have a vector that is modified in one thread, and I need to use its contents in another. Locking between these threads is unacceptable due to performance requirements.

This requirement is impossible to meet.

Anyway, any sharing of data between two threads requires some kind of locking, be it explicit or provided by the implementation (possibly by the hardware). You should re-examine your actual requirements: it may be unacceptable to suspend one thread until the other one finishes, but you could lock short sequences of instructions, and/or use a different architecture. For example, erasing an item from a vector is a costly operation (linear time, because all the elements after the removed item have to be moved), while marking it as invalid is much quicker (constant time, because it is a single write). If you really have to erase in the middle of a sequence, maybe a list would be more appropriate.
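As a rough illustration of the "mark as invalid" idea, assuming you are free to add a flag to each element (Slot, Invalidate and Compact are hypothetical names for this sketch). This only makes the critical section shorter; the flag write and the later compaction still have to happen under whatever synchronization protects the vector.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical element type: the question's fields plus a validity flag.
struct Slot
{
    int A;
    double B;
    bool C;
    bool valid;
};

// Constant time: a single write, no elements are moved.
void Invalidate(std::vector<Slot>& v, std::size_t i)
{
    assert(i < v.size());
    v[i].valid = false;
}

// Linear time, but done once for all invalid slots, for example at a
// point where nobody else is using the vector. Returns how many
// elements were removed.
std::size_t Compact(std::vector<Slot>& v)
{
    auto firstDead = std::remove_if(v.begin(), v.end(),
                                    [](const Slot& s) { return !s.valid; });
    std::size_t removed = static_cast<std::size_t>(v.end() - firstDead);
    v.erase(firstDead, v.end());
    return removed;
}
```

Readers would simply skip any element whose valid flag is false.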

But if you can put a lock around the copy of the vector in ReadThreadFunc and around any vector modification in ModifyThreadFunc, it could be enough. To give priority to the modifying thread, you could just try to lock in the other thread and immediately give up if you cannot.
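The "try to lock and give up" variant might look like the following sketch. The class and member names are made up; std::unique_lock with std::try_to_lock is the standard way to attempt a lock without blocking. Keep in mind that the standard allows try_lock to fail spuriously, so a false return only means "no snapshot this time".

```cpp
#include <mutex>
#include <vector>

struct Data
{
    int A;
    double B;
    bool C;
};

// Hypothetical wrapper giving the modifying thread priority.
class PriorityGuardedData
{
public:
    // Modifying thread: always waits for the lock.
    void Modify(const Data& d)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        vec_.push_back(d);
    }

    // Reading thread: never waits. Returns false, leaving 'out' as it
    // was, if the lock could not be taken right now.
    bool TrySnapshot(std::vector<Data>& out)
    {
        std::unique_lock<std::mutex> lock(mutex_, std::try_to_lock);
        if (!lock.owns_lock())
            return false;
        out = vec_;
        return true;
    }

private:
    std::mutex mutex_;
    std::vector<Data> vec_;
};
```

When TrySnapshot returns false, the reader can keep using the snapshot it obtained the last time it succeeded.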

Maybe you should rethink your design!

Each thread should have its own vector (or list, queue, whatever fits your needs) to work on. So thread A can do some work and pass the result to thread B. You simply have to lock when writing the data from thread A into thread B's queue.
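A minimal sketch of such a hand-off, assuming thread A passes whole batches of results to thread B (ResultQueue, Push and TryPop are hypothetical names). The lock is held only for the push or pop itself, and the batches are moved rather than copied.

```cpp
#include <mutex>
#include <queue>
#include <utility>
#include <vector>

struct Data
{
    int A;
    double B;
    bool C;
};

// Hypothetical queue owned by thread B; thread A only ever calls Push.
class ResultQueue
{
public:
    // Thread A: hand over a finished batch. After the call, thread A no
    // longer touches the batch, so no further locking is needed for it.
    void Push(std::vector<Data> batch)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        queue_.push(std::move(batch));
    }

    // Thread B: take the oldest batch, if there is one.
    bool TryPop(std::vector<Data>& out)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        if (queue_.empty())
            return false;
        out = std::move(queue_.front());
        queue_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::queue<std::vector<Data>> queue_;
};
```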

Without some kind of locking it's not possible.

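So I solved this using double buffering. The reading thread never touches the vector itself, only a fixed-size buffer that is never reallocated, so it cannot crash on freed memory, and it will always have usable data, even if that data might not be correct: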

struct Data
{
    int A;
    double B;
    bool C;
};

const int MAXSIZE = 100;
Data Buffer[MAXSIZE];
std::vector<Data> DataVec;

void ModifyThreadFunc()
{
    // Here the vector is changed, which includes adding and erasing elements
    ...

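    // Copy from the vector to the buffer (memcpy/memset need <cstring>).
    // Clamp the count so the copy can never run past the end of Buffer.
    size_t numElements = DataVec.size();
    if (numElements > MAXSIZE)
        numElements = MAXSIZE;
    memcpy(Buffer, DataVec.data(), sizeof(Data) * numElements);
    // Zero the unused tail so the reader never sees leftover elements
    memset(&Buffer[numElements], 0, sizeof(Data) * (MAXSIZE - numElements));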
}

void ReadThreadFunc()
{
    Data* p = Buffer;
    for (int i = 0; i < MAXSIZE; ++i)
    {
        // Use the data
        ...
        ++p;
    }
}
