Lock-free SPSC queue implementation on ARM

Question

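I'm trying to write a single-producer single-consumer queue for ARM, and I think I'm close with the DMBs, but I need a sanity check (I'm more familiar with std::atomic).

Here's where I am: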

bool push(const_reference value)
{
    // Check for room
    const size_type currentTail = tail;
    const size_type nextTail = increment(currentTail);
    if (nextTail == head)
        return false;

    // Write the value
    valueArr[currentTail] = value;

    // Prevent the consumer from seeing the incremented tail before the
    // value is written.
    __DMB();

    // Increment tail
    tail = nextTail;

    return true;
}

bool pop(reference valueLocation)
{
    // Check for data
    const size_type currentHead = head;
    if (currentHead == tail)
        return false;

    // Write the value.
    valueLocation = valueArr[currentHead];

    // Prevent the producer from seeing the incremented head before the
    // value is written.
    __DMB();

    // Increment the head
    head = increment(head);

    return true;
}

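My question is: are my DMB placement and reasoning correct, or is there still something I'm missing? I'm particularly unsure whether the conditionals need some protection, since they read variables that are updated by another thread (or an interrupt).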

Answer 1

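The barriers there are necessary but not sufficient: you also need "acquire" semantics for the loads of the variable modified by the other thread. (Or at least consume, but getting consume ordering without a barrier would require asm to create a data dependency. A compiler won't do that when it already has a control dependency.)
A single-core system can use just a compiler barrier, such as GNU C asm("":::"memory") or std::atomic_signal_fence(std::memory_order_release), instead of dmb. Make a macro so you can choose between an SMP-safe barrier and a UP (uniprocessor) barrier.
head = increment(head); is a pointless reload of head; use the local copy (currentHead) instead.
Use std::atomic to get the necessary code-gen portably.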
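Applied to the code in the question, those points might look like the sketch below. It assumes a hypothetical fixed capacity N and an increment() that wraps at N, since the question doesn't show either; RingQueue is likewise a made-up name. It uses the SMP orderings (memory_order_acquire / memory_order_release), not the uniprocessor signal-fence variant.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical port of the question's queue to std::atomic (SMP-safe orderings).
// One slot is always left empty so that "full" (increment(tail) == head) and
// "empty" (head == tail) are distinguishable; usable capacity is N - 1.
template <class T, std::size_t N>
class RingQueue {
public:
    bool push(const T &value)
    {
        // Only the producer writes tail, so reading our own index needs no ordering.
        const std::size_t currentTail = tail.load(std::memory_order_relaxed);
        const std::size_t nextTail = increment(currentTail);
        // Acquire: pairs with pop()'s release store of head, so the consumer has
        // finished reading a slot before we can overwrite it.
        if (nextTail == head.load(std::memory_order_acquire))
            return false;

        valueArr[currentTail] = value;

        // Release: the element becomes visible before the new tail does.
        tail.store(nextTail, std::memory_order_release);
        return true;
    }

    bool pop(T &valueLocation)
    {
        // Only the consumer writes head.
        const std::size_t currentHead = head.load(std::memory_order_relaxed);
        // Acquire: pairs with push()'s release store of tail, so the element
        // is fully written before we read it.
        if (currentHead == tail.load(std::memory_order_acquire))
            return false;

        valueLocation = valueArr[currentHead];

        // Release, computed from the local copy instead of reloading head.
        head.store(increment(currentHead), std::memory_order_release);
        return true;
    }

private:
    static std::size_t increment(std::size_t i) { return (i + 1) % N; }

    std::atomic<std::size_t> head{0};
    std::atomic<std::size_t> tail{0};
    T valueArr[N];
};
```

The acquire load in pop() is the "protection for the conditional" the question asks about: without it, the read of valueArr[currentHead] could effectively happen before the comparison with tail.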

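You normally don't need to roll your own atomics; modern compilers for ARM do implement std::atomic<T>. But as far as I know, no std::atomic<> implementation is aware of single-core systems, so none of them skips the actual barrier instructions and provides only the (cheaper) safety needed against interrupts that can cause a context switch.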

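On a single-core system you don't need dmb, just a compiler barrier. The CPU preserves the illusion that asm instructions execute in program order; you only need to make sure the compiler generates asm that does things in the right order. You can do that by using std::atomic with std::memory_order_relaxed plus manual atomic_signal_fence(memory_order_acquire) or release barriers. (Not atomic_thread_fence; that would emit an asm instruction, typically dmb.)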
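A minimal sketch of that pattern, assuming a single core where a hypothetical interrupt handler (timer_isr) hands one value at a time to the main loop (poll_mailbox). The names, the one-slot mailbox, and the drop-on-full policy are all illustrative assumptions. The fences only constrain compile-time ordering; on GCC/Clang atomic_signal_fence typically emits no instruction.

```cpp
#include <atomic>

// Hypothetical single-core mailbox: an ISR produces, the main loop consumes.
// Both run on the same core, so only the compiler must be kept from reordering.
static int g_sample;                        // payload, plain (non-atomic) object
static std::atomic<bool> g_ready{false};    // flag, accessed only with relaxed ops

// Would be installed as the interrupt handler (hypothetical name).
void timer_isr(int sample)
{
    if (g_ready.load(std::memory_order_relaxed))
        return;                                            // not consumed yet: drop this sample
    std::atomic_signal_fence(std::memory_order_acquire);   // flag check before payload write
    g_sample = sample;
    std::atomic_signal_fence(std::memory_order_release);   // payload write before flag store
    g_ready.store(true, std::memory_order_relaxed);
}

// Called from the main loop; returns true and fills *out if a sample was ready.
bool poll_mailbox(int *out)
{
    if (!g_ready.load(std::memory_order_relaxed))
        return false;
    std::atomic_signal_fence(std::memory_order_acquire);   // flag check before payload read
    *out = g_sample;
    std::atomic_signal_fence(std::memory_order_release);   // payload read before freeing the slot
    g_ready.store(false, std::memory_order_relaxed);
    return true;
}
```

On an SMP system the same code would be wrong: the signal fences would have to become real acquire/release operations (or atomic_thread_fence), which is exactly the choice the SMP/UP macro is meant to make.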

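Each thread reads a variable that the other thread modifies. You're correctly making the modifications release-stores, by making sure they only become visible after the access to the array.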

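But those reads also need to be acquire-loads to synchronize with those release-stores. For example, to make sure push doesn't write valueArr[currentTail] = value; before pop has finished reading that same element, or that pop doesn't read an entry before it has been fully written.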

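Without any barrier, the failure mode would be that if (currentHead == tail) return false; doesn't actually read tail from memory until after valueLocation = valueArr[currentHead]; has happened. Runtime load reordering can easily do that on weakly-ordered ARM. If the load address had a data dependency on tail, that could avoid the need for a barrier on an SMP system (ARM guarantees dependency ordering in asm; that's the feature mo_consume was supposed to expose). But if the compiler just emits a branch, that's only a control dependency, not a data dependency. If you were writing asm by hand, I think a predicated load like ldrne r0, [r1, r2] on the flags set by the compare would create a data dependency.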

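Compile-time reordering is less plausible, but a compiler-only barrier is free if all it does is stop the compiler from doing something it wasn't going to do anyway.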

Untested implementation: it compiles to asm that looks OK, but has had no other testing.

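Do something similar for pop. I included wrapper functions for load-acquire / store-release, and a fullbarrier() macro (the equivalent of the Linux kernel's smp_mb() macro, defined as either a compile-time barrier or a compile-time + runtime barrier).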

#include <atomic>

// #define UNIPROCESSOR   // or build with -DUNIPROCESSOR for a single-core target


#ifdef UNIPROCESSOR
#define fullbarrier()  asm("":::"memory")   // GNU C compiler barrier
                          // atomic_signal_fence(std::memory_order_seq_cst)
#else
#define fullbarrier() __DMB()    // or atomic_thread_fence(std::memory_order_seq_cst)
#endif

template <class T>
T load_acquire(std::atomic<T> &x) {
#ifdef UNIPROCESSOR
    T tmp = x.load(std::memory_order_relaxed);
    std::atomic_signal_fence(std::memory_order_acquire);
    // or fullbarrier();  if you want to use that macro
    return tmp;
#else
    return x.load(std::memory_order_acquire);
    // fullbarrier() / __DMB();
#endif
}

template <class T>
void store_release(std::atomic<T> &x, T val) {
#ifdef UNIPROCESSOR
    std::atomic_signal_fence(std::memory_order_release);
    // or fullbarrier();
    x.store(val, std::memory_order_relaxed);
#else
    // fullbarrier() / __DMB(); before plain store
    x.store(val, std::memory_order_release);
#endif
}

template <class T>
struct SPSC_queue {
  using size_type = unsigned;
  using value_type = T;
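  static const size_type size = 1024;
  // head/tail are free-running counters, so size must divide 2^N for "% size" to survive wrap-around
  static_assert((size & (size - 1)) == 0, "size must be a power of two");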

  std::atomic<size_type> head{0};   // written only by the consumer
  value_type valueArr[size];
  std::atomic<size_type> tail{0};   // in a separate cache-line from head to reduce contention

  bool push(const value_type &value)
  {
    // Check for room
    const size_type currentTail = tail.load(std::memory_order_relaxed);  // no other writers to tail, no ordering needed
    const size_type nextTail = currentTail + 1;    // free-running; modulo only when indexing, so empty and full are distinguishable
    if (currentTail - load_acquire(head) == size)  // full: size elements not yet popped
        return false;

    valueArr[currentTail % size] = value;
    store_release(tail, nextTail);
    return true;
  }
};

// instantiate the template for  int  so we can look at the asm
template bool SPSC_queue<int>::push(const value_type &value);

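I looked at the asm with g++ 9.2 -O3 -mcpu=cortex-a15, with and without -DUNIPROCESSOR (cortex-a15 is just a random modern-ish ARM core, picked so GCC can inline the std::atomic load/store functions and the barriers for the non-uniprocessor case). With -DUNIPROCESSOR, the signal fences are compiler-only, so no dmb is emitted.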
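Since pop is left as an exercise, here is one possible sketch of it alongside push, under stated assumptions: it uses the SMP orderings directly instead of the load_acquire()/store_release() wrappers (swap those back in for a single-core build), keeps head and tail as free-running counters, and treats the queue as full when currentTail - head == size. It has only been reasoned about and exercised single-threaded, not stress-tested on ARM.

```cpp
#include <atomic>

// Sketch: SPSC_queue with a pop() added, using acquire/release directly.
// head and tail only ever increase; their difference is the element count,
// and "% size" maps them to slots. Unsigned wrap-around is harmless because
// size is a power of two and therefore divides 2^(bits of unsigned).
template <class T>
struct SPSC_queue {
    using size_type = unsigned;
    using value_type = T;
    static const size_type size = 1024;
    static_assert((size & (size - 1)) == 0, "size must be a power of two");

    std::atomic<size_type> head{0};   // written only by the consumer
    value_type valueArr[size];
    std::atomic<size_type> tail{0};   // written only by the producer

    bool push(const value_type &value)
    {
        const size_type currentTail = tail.load(std::memory_order_relaxed);
        if (currentTail - head.load(std::memory_order_acquire) == size)
            return false;                                    // full
        valueArr[currentTail % size] = value;
        tail.store(currentTail + 1, std::memory_order_release);
        return true;
    }

    bool pop(value_type &valueLocation)
    {
        const size_type currentHead = head.load(std::memory_order_relaxed);  // we are the only writer
        if (currentHead == tail.load(std::memory_order_acquire))
            return false;                                    // empty
        valueLocation = valueArr[currentHead % size];
        head.store(currentHead + 1, std::memory_order_release);  // from the local copy, no reload
        return true;
    }
};
```

The acquire load of tail in pop pairs with the release store in push (element written before it is read), and the acquire load of head in push pairs with the release store in pop (element read before its slot is reused).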

(Answer 1: score 2, accepted, 2020-05-30 01:47:25)