Lock-free SPSC queue implementation on ARM

Question

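I'm trying to write a single-producer, single-consumer queue for ARM, and I think I'm close to wrapping my head around DMB, but I need some checking (I'm more familiar with std::atomic).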

Here's where I'm at:

bool push(const_reference value)
{
    // Check for room
    const size_type currentTail = tail;
    const size_type nextTail = increment(currentTail);
    if (nextTail == head)
        return false;

    // Write the value
    valueArr[currentTail] = value;

    // Prevent the consumer from seeing the incremented tail before the
    // value is written.
    __DMB();

    // Increment tail
    tail = nextTail;

    return true;
}

bool pop(reference valueLocation)
{
    // Check for data
    const size_type currentHead = head;
    if (currentHead == tail)
        return false;

    // Write the value.
    valueLocation = valueArr[currentHead];

    // Prevent the producer from seeing the incremented head before the
    // value is written.
    __DMB();

    // Increment the head
    head = increment(head);

    return true;
}

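My questions are: are my DMB placements and the justifications for them accurate? Or is there still something I'm missing? I'm particularly unsure whether the conditionals need some guarding when they deal with a variable that is updated by the other thread (or interrupt).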

Answer 1

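The barriers there are necessary but not sufficient; you also need "acquire" semantics for the loads of the variable modified by the other thread. (Or at least consume, but getting that without a barrier would require asm to create a data dependency. A compiler won't do that once it already has a control dependency.)
A single-core system can use just a compiler barrier, such as GNU C asm("":::"memory") or std::atomic_signal_fence(std::memory_order_release), instead of dmb. Make a macro so you can choose between SMP-safe barriers and UP (uniprocessor) barriers.
head = increment(head); is a pointless reload of head; use the local copy.
Use std::atomic to get the necessary code-gen portably.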

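You normally don't need to roll your own atomics; modern compilers for ARM do implement std::atomic<T>. But as far as I know, no std::atomic<> implementation is aware of single-core systems, where it could skip actual barrier instructions and only needs to be safe with respect to interrupts that can cause a context switch.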

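On a single-core system you don't need dmb, just a compiler barrier. The CPU preserves the illusion that asm instructions execute sequentially, in program order. You only need to make sure the compiler generates asm that does things in the right order. You can do that by using std::atomic with std::memory_order_relaxed, plus manual atomic_signal_fence(memory_order_acquire) or release barriers. (Not atomic_thread_fence; that would emit asm instructions, typically dmb.)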

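Each thread reads a variable that the other thread modifies. You are correctly making the modifications release-stores, by making sure they become visible only after the access to the array.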

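But those reads also need to be acquire-loads in order to synchronize with those release-stores. For example, to make sure push isn't writing valueArr[currentTail] = value; before pop has finished reading that same element. Or to make sure pop isn't reading an entry before it has been fully written.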

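Without any barrier, the failure mode would be that if (currentHead == tail) return false; doesn't actually check the value of tail from memory until after valueLocation = valueArr[currentHead]; has happened. Runtime load reordering can easily do that on weakly ordered ARM. If the load address had a data dependency on tail, that could avoid needing a barrier there on an SMP system (ARM guarantees dependency ordering in asm; this is the feature that mo_consume was supposed to expose). But if the compiler just emits a branch, that is only a control dependency, not a data dependency. If you were writing by hand in asm, I think a predicated load like ldrne r0, [r1, r2] on the flags set by the compare would create a data dependency.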

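Compile-time reordering is less plausible, but a compiler-only barrier is free if all it does is stop the compiler from doing something it wasn't going to do anyway.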
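To tie this back to the barrier placement in the question, here is a minimal sketch of where the two extra barriers would go. It is not the questioner's actual class: FenceQueue, its template parameters and fence_queue_demo are hypothetical names, and std::atomic_thread_fence is used as a portable stand-in for __DMB() (on a uniprocessor build, std::atomic_signal_fence could be substituted). The single-threaded demo only checks the index logic; it does not exercise memory ordering.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the question's queue: N slots, one always left empty.
template <class T, std::size_t N>
class FenceQueue {
    static_assert(N >= 2, "one slot always stays unused");
    T valueArr[N];
    std::atomic<std::size_t> head{0};   // written only by the consumer
    std::atomic<std::size_t> tail{0};   // written only by the producer

    static std::size_t increment(std::size_t i) { return (i + 1) % N; }

public:
    bool push(const T& value) {
        const std::size_t currentTail = tail.load(std::memory_order_relaxed);
        const std::size_t nextTail = increment(currentTail);
        if (nextTail == head.load(std::memory_order_relaxed))
            return false;                                    // full
        // Added barrier: don't overwrite the slot until the consumer's read
        // of it (which precedes its head update) is known to be complete.
        std::atomic_thread_fence(std::memory_order_acquire);
        valueArr[currentTail] = value;
        // The question's barrier: publish the value before the new tail.
        std::atomic_thread_fence(std::memory_order_release);
        tail.store(nextTail, std::memory_order_relaxed);
        return true;
    }

    bool pop(T& valueLocation) {
        const std::size_t currentHead = head.load(std::memory_order_relaxed);
        if (currentHead == tail.load(std::memory_order_relaxed))
            return false;                                    // empty
        // Added barrier: read the slot only after the tail check.
        std::atomic_thread_fence(std::memory_order_acquire);
        valueLocation = valueArr[currentHead];
        // The question's barrier: finish reading before freeing the slot.
        std::atomic_thread_fence(std::memory_order_release);
        head.store(increment(currentHead), std::memory_order_relaxed);  // local copy, no reload
        return true;
    }
};

// Single-threaded smoke check of the index logic only.
inline void fence_queue_demo() {
    FenceQueue<int, 4> q;                 // usable capacity is N - 1 = 3
    int v = 0;
    assert(q.push(1) && q.push(2) && q.push(3));
    assert(!q.push(4));                   // full
    assert(q.pop(v) && v == 1);           // oldest element comes out first
    assert(q.push(4));                    // the freed slot is reusable
}
```

Each acquire fence pairs with the release fence on the other side: the relaxed load that observes the other thread's index update, followed by the acquire fence, is what orders the array access after the other thread's array access.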

Untested implementation; it compiles to asm that looks OK, but there has been no other testing

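Do something similar for pop. I included wrapper functions for load-acquire / store-release, and fullbarrier(). (That one is the equivalent of the Linux kernel's smp_mb() macro, defined as either a compile-time barrier or a compile-time plus runtime barrier.)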

#include <atomic>

#define UNIPROCESSOR


#ifdef UNIPROCESSOR
#define fullbarrier()  asm("":::"memory")   // GNU C compiler barrier
                          // atomic_signal_fence(std::memory_order_seq_cst)
#else
#define fullbarrier() __DMB()    // or atomic_thread_fence(std::memory_order_seq_cst)
#endif

template <class T>
T load_acquire(std::atomic<T> &x) {
#ifdef UNIPROCESSOR
    T tmp = x.load(std::memory_order_relaxed);
    std::atomic_signal_fence(std::memory_order_acquire);
    // or fullbarrier();  if you want to use that macro
    return tmp;
#else
    return x.load(std::memory_order_acquire);
    // fullbarrier() / __DMB();
#endif
}

template <class T>
void store_release(std::atomic<T> &x, T val) {
#ifdef UNIPROCESSOR
    std::atomic_signal_fence(std::memory_order_release);
    // or fullbarrier();
    x.store(val, std::memory_order_relaxed);
#else
    // fullbarrier() / __DMB(); before plain store
    x.store(val, std::memory_order_release);
#endif
}

template <class T>
struct SPSC_queue {
  using size_type = unsigned;
  using value_type = T;
  static const size_type size = 1024;

  std::atomic<size_type> head{0};  // explicit 0: before C++20 a default-constructed atomic is uninitialized
  value_type valueArr[size];
  std::atomic<size_type> tail{0};  // in a separate cache-line from head to reduce contention

  bool push(const value_type &value)
  {
    // Check for room
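    const size_type currentTail = tail.load(std::memory_order_relaxed);  // no other writers to tail, no ordering needed
    const size_type nextTail = currentTail + 1;    // modulo separately so empty and full are distinguishable.
    // head and tail are free-running counters, so "full" means they differ by size
    // (comparing nextTail with head never detects that).
    if (currentTail - load_acquire(head) == size)
        return false;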

    valueArr[currentTail % size] = value;
    store_release(tail, nextTail);
    return true;
  }
};

// instantiate the template for  int  so we can look at the asm
template bool SPSC_queue<int>::push(const value_type &value);

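This is meant to be compiled with g++ 9.2 -O3 -mcpu=cortex-a15, optionally with -DUNIPROCESSOR (the core was picked just as a random modern-ish ARM core, so that GCC can inline the std::atomic load/store functions and, for the non-uniprocessor case, the barriers).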
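For the pop side, a sketch along the same lines might look like the following. It is as untested on real hardware as the code above. To compile on its own it repeats SMP-only versions of the two wrappers, and it uses a hypothetical SPSC_queue_sketch with the size as a template parameter (Size is my addition, as is spsc_sketch_demo) so that a small queue can be tried out. head and tail are treated as free-running counters, so the queue is empty when they are equal and full when they differ by Size.

```cpp
#include <atomic>
#include <cassert>

// SMP-only versions of the wrappers, repeated so this block compiles alone.
template <class T>
T load_acquire(const std::atomic<T>& x) { return x.load(std::memory_order_acquire); }

template <class T>
void store_release(std::atomic<T>& x, T val) { x.store(val, std::memory_order_release); }

// Hypothetical variant of the queue above; Size is an assumed template parameter.
template <class T, unsigned Size = 1024>
struct SPSC_queue_sketch {
    using size_type = unsigned;
    static_assert(Size != 0 && (Size & (Size - 1)) == 0,
                  "power of two, so indexing stays consistent when the counters wrap");

    std::atomic<size_type> head{0};   // advanced only by pop
    T valueArr[Size];
    std::atomic<size_type> tail{0};   // advanced only by push

    bool push(const T& value) {
        const size_type currentTail = tail.load(std::memory_order_relaxed);
        if (currentTail - load_acquire(head) == Size)
            return false;                                 // full
        valueArr[currentTail % Size] = value;
        store_release(tail, currentTail + 1);             // publish the element
        return true;
    }

    bool pop(T& valueLocation) {
        // No other writers to head, so no ordering is needed on this load.
        const size_type currentHead = head.load(std::memory_order_relaxed);
        if (currentHead == load_acquire(tail))
            return false;                                 // empty
        valueLocation = valueArr[currentHead % Size];
        store_release(head, currentHead + 1);             // frees the slot for the producer
        return true;
    }
};

// Single-threaded smoke check of the full/empty logic only.
inline void spsc_sketch_demo() {
    SPSC_queue_sketch<int, 4> q;
    int v = -1;
    for (int i = 0; i < 4; ++i)
        assert(q.push(i));                // all Size slots are usable
    assert(!q.push(4));                   // full
    for (int i = 0; i < 4; ++i)
        assert(q.pop(v) && v == i);       // FIFO order
    assert(!q.pop(v));                    // empty again
}
```

The acquire load of tail in pop pairs with the release store of tail in push, and the release store of head in pop pairs with the acquire load of head in push, which is the symmetry described earlier. For a uniprocessor build, the wrappers from the answer can be dropped in unchanged.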

Solution 1
2 Accepted 2020-05-30 01:47:25
