Multithreaded C++: forcing a read from memory, bypassing the cache

Question

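I'm working on a personal, spare-time game engine and am building a multithreaded batch executor. I originally used concurrent lock-free queues and std::function everywhere to handle communication between the master and slave threads, but I decided to scrap that in favor of a lighter-weight approach that gives me tight control over memory allocation: function pointers and memory pools.

Anyway, I've run into a problem:

No matter what I try, the function pointer is read correctly by only one thread, while the other threads read a null pointer and fail the assertion.

I'm fairly sure this is a caching issue. I have confirmed that all threads see the same address for the pointer. I've tried declaring it as volatile, intptr_t, and std::atomic, and tried various kinds of cast-fu; the threads all seem to ignore it and keep reading their cached copies.

I have modeled the master and the slaves in a model checker to make sure the concurrency is sound and that there are no livelocks or deadlocks (provided the shared variables are all synchronized correctly).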

void Executor::operator() (int me) {
    while (true) {
        printf("Slave %d waiting.\n", me);
        {
            std::unique_lock<std::mutex> lock(batch.ready_m);
            while(!batch.running) batch.ready.wait(lock);
            running_threads++;
        }
        printf("Slave %d running.\n", me);
        BatchFunc func = batch.func;
        assert(func != nullptr);

        int index;
        if (batch.store_values) {
            while ((index = batch.item.fetch_add(1)) < batch.n_items) {
                void* data = reinterpret_cast<void*>(batch.data_buffer + index * batch.item_size);
                func(batch.share_data, data);
            }
        }
        else {
            while ((index = batch.item.fetch_add(1)) < batch.n_items) {
                void** data = reinterpret_cast<void**>(batch.data_buffer + index * batch.item_size);
                func(batch.share_data, *data);
            }
        }

        // at least one thread finished, so make sure we won't loop back around
        batch.running = false;

        if (running_threads.fetch_sub(1) == 1) { // I am the last one
            batch.done = true; // therefore all threads are done
            batch.complete.notify_all();
        }
    }
}

void Executor::run_batch() {
    assert(!batch.running);
    if (batch.func == nullptr || batch.n_items == 0) return;

    batch.item.store(0);

    batch.running = true;
    batch.done = false;
    batch.ready.notify_all();

    printf("Master waiting.\n");
    {
        std::unique_lock<std::mutex> lock(batch.complete_m);
        while (!batch.done) batch.complete.wait(lock);
    }
    printf("Master ready.\n");

    batch.func = nullptr;
    batch.n_items = 0;
}

batch.func is set by another function:

template<typename SharedT, typename ItemT>
void set_batch_job(void(*func)(const SharedT*, ItemT*), const SharedT& share_data, bool byValue = true) {
    static_assert(sizeof(SharedT) <= SHARED_DATA_MAXSIZE, "Shared data too large");
    static_assert(std::is_pod<SharedT>::value, "Shared data type must be POD");
    assert(std::is_pod<ItemT>::value || !byValue);
    assert(!batch.running);
    batch.func = reinterpret_cast<volatile BatchFunc>(func);
    memcpy(batch.share_data, (void*) &share_data, sizeof(SharedT));
    batch.store_values = byValue;
    if (byValue) {
        batch.item_size = sizeof(ItemT);
    }
    else { // store pointers instead of values
        batch.item_size = sizeof(ItemT*);
    }
    batch.n_items = 0;
}

This is the struct (and typedef) it works with:

typedef void(*BatchFunc)(const void*, void*);
struct JobBatch {
    volatile BatchFunc func;
    void* const share_data = operator new(SHARED_DATA_MAXSIZE);

    intptr_t const data_buffer = reinterpret_cast<intptr_t>(operator new (EXEC_DATA_BUFFER_SIZE));
    volatile size_t item_size;
    std::atomic<int> item; // Index into the data array
    volatile int n_items = 0;

    std::condition_variable complete; // slave -> master signal
    std::condition_variable ready;    // master -> slave signal
    std::mutex complete_m;
    std::mutex ready_m;

    bool store_values = false;

    volatile bool running = false; // there is work to do in the batch
    volatile bool done = false;    // there is no work left to do

    JobBatch();
} batch;

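How can I make sure that all the necessary reads and writes to batch.func are properly synchronized between threads?

In case it matters, I'm using Visual Studio and compiling an x64 Debug Windows executable. Intel i5, Windows 10, 8 GB RAM.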

Answer 1

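So I did some reading on the C++ memory model and managed to hack together a solution using atomic_thread_fence. Everything is probably still broken, because I'm crazy and shouldn't be rolling my own system here, but hey, learning is fun!

Basically, whenever you finish writing something that you want other threads to see, you need to call atomic_thread_fence(std::memory_order_release).

On the receiving thread, you call atomic_thread_fence(std::memory_order_acquire) before reading the shared data.

In my case, the release should be done immediately before waiting on the condition variable, and the acquire immediately before using data written by other threads.

This ensures that writes made on one thread are seen by the other threads.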
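As an illustration of the release/acquire fence pairing described above, here is a minimal sketch. It is not the code from the question: job_func, job_ready, twice, publish_job, consume_job and run_fence_demo are all hypothetical names made up for this example. One detail worth knowing is that a pair of fences only synchronizes through an atomic variable that one thread writes after its release fence and the other thread reads before its acquire fence, which is why the sketch routes the handoff through an atomic flag.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names for illustration only; none of them appear in the question.
typedef int (*JobFunc)(int);

static int twice(int x) { return 2 * x; }

static JobFunc job_func = nullptr;          // plain (non-atomic) shared data
static std::atomic<bool> job_ready(false);  // atomic flag the two fences pair up through

// Writer: fill in the data, issue a release fence, then set the flag.
static void publish_job() {
    job_func = &twice;
    std::atomic_thread_fence(std::memory_order_release);
    job_ready.store(true, std::memory_order_relaxed);
}

// Reader: wait for the flag, issue an acquire fence, then read the data.
static int consume_job(int arg) {
    while (!job_ready.load(std::memory_order_relaxed)) {
        std::this_thread::yield();
    }
    std::atomic_thread_fence(std::memory_order_acquire);
    JobFunc f = job_func;  // ordered after the writer's assignment
    assert(f != nullptr);
    return f(arg);
}

// Runs one writer and one reader and returns what the reader computed.
int run_fence_demo() {
    job_func = nullptr;
    job_ready.store(false);
    int result = 0;
    std::thread reader([&result] { result = consume_job(21); });
    std::thread writer(publish_job);
    writer.join();
    reader.join();
    return result;
}
```

Calling run_fence_demo() from main should give the reader the published pointer rather than nullptr. A fence with no atomic operation on either side of it does not, by itself, establish any ordering between two threads.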

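I'm no expert, so this may not be the right way to solve the problem, and it may well be doomed. For example, I still have a deadlock/livelock problem to sort out.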
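One possible source of hangs in a master/worker handoff like the one in the question is changing the flag a worker waits on without holding the mutex that the worker's wait uses: the notification can then arrive between the worker's check and its wait, and be lost. Whether that is the cause here is an assumption. The sketch below shows the usual pattern, with hypothetical names (Handoff, post, take, run_handoff_demo) that are not from the question. Because the flag and the payload are only touched while the mutex is held, the lock and unlock provide the needed ordering, and neither volatile nor explicit fences are required.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical, simplified handoff; these names are not from the question.
struct Handoff {
    std::mutex m;
    std::condition_variable cv;
    bool ready = false;  // protected by m, so it needs neither volatile nor atomic
    int payload = 0;     // protected by m
};

// Master side: change the shared state while holding the mutex, then notify.
void post(Handoff& h, int value) {
    {
        std::lock_guard<std::mutex> lock(h.m);
        h.payload = value;
        h.ready = true;
    }
    h.cv.notify_all();
}

// Worker side: the predicate is checked under the same mutex, so a notify
// issued before the wait begins cannot be missed.
int take(Handoff& h) {
    std::unique_lock<std::mutex> lock(h.m);
    h.cv.wait(lock, [&h] { return h.ready; });
    h.ready = false;
    return h.payload;
}

// Runs one worker and one master and returns the value the worker received.
int run_handoff_demo() {
    Handoff h;
    int received = 0;
    std::thread worker([&] { received = take(h); });
    post(h, 7);
    worker.join();
    return received;
}
```

The same idea would apply in both directions (master to worker and worker to master), with each flag modified only under the mutex its waiter locks.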

tl;dr: it isn't exactly a caching thing: threads may not have their data fully in sync with each other unless you enforce it with atomic memory fences.
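An alternative to standalone fences, shown here only as a sketch, is to make the function pointer itself a std::atomic and use a release store paired with an acquire load. The BatchFunc typedef is the one from the question; current_func, add_shared and run_atomic_pointer_demo are hypothetical names for this example. This only covers visibility of the pointer and the data written before it; it does not fix a logic race in which the pointer is reset to nullptr while a worker is still about to read it.

```cpp
#include <atomic>
#include <thread>

typedef void (*BatchFunc)(const void*, void*);  // same typedef as in the question

// Hypothetical job used only in this sketch: adds the shared value to the item.
static void add_shared(const void* shared, void* item) {
    *static_cast<int*>(item) += *static_cast<const int*>(shared);
}

static std::atomic<BatchFunc> current_func(nullptr);

// Publishes a job through the atomic pointer and returns the processed item.
int run_atomic_pointer_demo() {
    current_func.store(nullptr);
    int shared = 5;
    int item = 10;
    std::thread worker([&] {
        BatchFunc f;
        // The acquire load pairs with the release store made by the master below.
        while ((f = current_func.load(std::memory_order_acquire)) == nullptr) {
            std::this_thread::yield();
        }
        f(&shared, &item);
    });
    // Everything written before this release store is visible to the worker
    // once its acquire load observes the non-null pointer.
    current_func.store(&add_shared, std::memory_order_release);
    worker.join();
    return item;
}
```

The default memory order for store and load is sequentially consistent, which is stronger than needed here but also correct if the explicit orders are omitted.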

Solution 1
0 2017-02-26 06:37:33
