Optimizations around atomic loads and stores in C++

Question

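I have read about std::memory_order in C++ and partially understand it, but I still have some doubts.

1. The explanation of std::memory_order_acquire says that no reads or writes in the current thread can be reordered before this load. Does that mean the compiler and CPU are not allowed to move any instruction that appears below the acquire statement to above it?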

auto y = x.load(std::memory_order_acquire);
z = a;  // is it legal to execute the load of shared `a` above the acquire? (I feel no)
b = 2;  // is it legal to execute the store to shared `b` above the acquire? (I feel yes)

I can reason out why it is illegal to execute loads before the acquire. But why is it illegal for stores?

2. Is it illegal to skip a useless load or store on atomic objects? They are not volatile, and as far as I know only volatile has this requirement.

auto y = x.load(std::memory_order_acquire);  // `y` is never used
return;

This optimization does not happen even with relaxed memory order.

3. Is the compiler allowed to move instructions that appear above the acquire statement to below it?

z = a;  // is it legal to execute the load of shared `a` below the acquire? (I feel yes)
b = 2;  // is it legal to execute the store to shared `b` below the acquire? (I feel yes)
auto y = x.load(std::memory_order_acquire);

4. Can two loads or stores be reordered with each other without crossing the acquire boundary?

auto y = x.load(std::memory_order_acquire);
a = p;  // can this move below the next line?
b = q;  // `a` and `b` are shared

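I have the corresponding four doubts about release semantics as well.

Related to the second and third questions: why does no compiler optimize f() as aggressively as g() in the code below?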

#include <atomic>

int a, b;

void dummy(int*);

void f(std::atomic<int> &x) {
    int z;
    z = a;  // loading shared `a` before acquire
    b = 2;  // storing shared `b` before acquire
    auto y = x.load(std::memory_order_acquire);
    z = a;  // loading shared `a` after acquire
    b = 2;  // storing shared `b` after acquire
    dummy(&z);
}

void g(int &x) {
    int z;
    z = a;
    b = 2;
    auto y = x;
    z = a;
    b = 2;
    dummy(&z);
}

f(std::atomic<int>&):
        sub     rsp, 24
        mov     eax, DWORD PTR a[rip]
        mov     DWORD PTR b[rip], 2
        mov     DWORD PTR [rsp+12], eax
        mov     eax, DWORD PTR [rdi]
        lea     rdi, [rsp+12]
        mov     DWORD PTR b[rip], 2
        mov     eax, DWORD PTR a[rip]
        mov     DWORD PTR [rsp+12], eax
        call    dummy(int*)
        add     rsp, 24
        ret
g(int&):
        sub     rsp, 24
        mov     eax, DWORD PTR a[rip]
        mov     DWORD PTR b[rip], 2
        lea     rdi, [rsp+12]
        mov     DWORD PTR [rsp+12], eax
        call    dummy(int*)
        add     rsp, 24
        ret
b:
        .zero   4
a:
        .zero   4

Answer 1

Q1

In general, yes. Any load or store that follows the acquire load in program order must not become visible before it.

Here is an example where this matters:

#include <atomic>
#include <thread>
#include <iostream>

std::atomic<int> x{0};
std::atomic<bool> finished{false};
int xval;
bool good;

void reader() {
    xval = x.load(std::memory_order_relaxed);
    finished.store(true, std::memory_order_release);
}

void writer() {
    good = finished.load(std::memory_order_acquire);
    x.store(42, std::memory_order_relaxed);
}

int main() {
    std::thread t1(reader);
    std::thread t2(writer);
    t1.join();
    t2.join();
    if (good) {
        std::cout << xval << std::endl;
    } else {
        std::cout << "too soon" << std::endl;
    }
    return 0;
}

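This program has no UB and must print either 0 or too soon. If the writer's store of 42 to x could be reordered before its load of finished, then it would be possible for the reader's load of x to return 42 and the writer's load of finished to return true, in which case the program would incorrectly print 42.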

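Q2

Yes, the compiler may delete an atomic load whose value is never used, because a conforming program has no way to detect whether the load was performed. However, current compilers generally do not make such optimizations. This is partly out of caution, since optimizations on atomics are hard to get right and the resulting bugs can be very subtle. It may also be partly to support programmers who write implementation-dependent code that can detect, through non-standard features, whether the load was performed.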
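
As a minimal sketch of the two cases (the function names `discard_load` and `use_load` are hypothetical, not from the question): the first function discards the loaded value, so a compiler would be allowed to drop the load under the as-if rule, although in practice it will typically still emit it. The second uses the value, so the load has to be performed.

```cpp
#include <atomic>

// Hypothetical helper: the loaded value is discarded.
// A conforming program cannot observe whether this load happened,
// so the compiler may legally remove it (current compilers usually keep it).
void discard_load(const std::atomic<int> &x) {
    (void)x.load(std::memory_order_relaxed);
}

// Hypothetical helper: the loaded value is returned to the caller,
// so the load must actually be performed.
int use_load(const std::atomic<int> &x) {
    return x.load(std::memory_order_relaxed);
}
```

Comparing the generated assembly of the two functions is an easy way to check what a particular compiler does with the unused load.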

Q3

Yes, this reordering is perfectly legal, and real-world architectures do it. An acquire barrier is only one-way.
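
To make the one-way property concrete, here is a small annotated sketch. The globals `shared_a` and `shared_b` and the function name `earlier_accesses_may_sink` are made up for illustration. The comments mark which movements the acquire load permits; the code itself behaves the same either way when run on a single thread.

```cpp
#include <atomic>

int shared_a = 0;  // hypothetical shared non-atomic variables
int shared_b = 0;

// The accesses to shared_a and shared_b come BEFORE the acquire load in
// program order. An acquire load only stops LATER accesses from moving
// above it, so the compiler or CPU may perform these two after the load.
int earlier_accesses_may_sink(const std::atomic<int> &x) {
    int z = shared_a;  // may be performed after the acquire load
    shared_b = 2;      // may become visible after the acquire load
    int y = x.load(std::memory_order_acquire);
    // Anything written here could NOT be moved above the load of x.
    return y + z;
}
```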

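Q4

Yes, this is legal too. If a and b are not atomic and some other thread is reading them concurrently, the code has a data race and is UB, so it does not matter if the other thread observes the writes in the wrong order (or summons nasal demons). (If they are atomic and you are doing relaxed stores, you do not get nasal demons, but you can still observe the stores out of order; there is no happens-before relationship mandating otherwise.)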
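
If another thread does need to see two stores in a particular order, the usual remedy is to make the second one a release store and have the reader use an acquire load. The following is only a sketch of that pattern; the names `payload`, `ready` and `publish_and_consume` are invented for this example.

```cpp
#include <atomic>
#include <thread>

std::atomic<int> payload{0};      // hypothetical shared data
std::atomic<bool> ready{false};   // hypothetical "data is published" flag

// Runs one producer and one consumer and returns the payload value the
// consumer read after it observed ready == true.
int publish_and_consume() {
    payload.store(0, std::memory_order_relaxed);
    ready.store(false, std::memory_order_relaxed);
    int seen = -1;

    std::thread producer([] {
        payload.store(42, std::memory_order_relaxed);
        // Release store: the earlier store to payload cannot be
        // reordered after it.
        ready.store(true, std::memory_order_release);
    });
    std::thread consumer([&seen] {
        // Acquire load: the later load of payload cannot be
        // reordered before it.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload.load(std::memory_order_relaxed);
    });

    producer.join();
    consumer.join();
    return seen;
}
```

Because the acquire load that reads true synchronizes with the release store, the store of 42 happens before the consumer's load of `payload`, so the function returns 42. With both stores relaxed, that guarantee would be gone.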

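Optimization comparison

Your f versus g example is not really a fair comparison: in g, the load of the non-atomic variable x has no side effects and its value is not used, so the compiler omits it altogether. As mentioned above, the compiler does not omit the unnecessary atomic load of x in f.

As for why compilers do not sink the first accesses to a and b past the acquire load: I believe it is simply a missed optimization. Again, most compilers deliberately do not attempt every legal optimization involving atomics. However, you could note that on ARM64, for instance, the load of x in f compiles to ldar, which the CPU can definitely reorder with earlier plain loads and stores.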

Answer 1 (3, accepted, 2022-07-11 22:19:10)