Is lockless hashing without std::atomics guaranteed to be thread-safe in C++11?

Consider the following attempt at a lockless hashtable for multithreaded search algorithms (inspired by this paper):

#include <cstddef>  // std::size_t
#include <cstdint>  // uint64_t

struct Data
{
    uint64_t key;
    uint64_t value;
};

struct HashEntry
{
    uint64_t key_xor_value;
    uint64_t value;
};

void insert_data(Data const& e, HashEntry* h, std::size_t tableOffset)
{
    h[tableOffset].key_xor_value = e.key ^ e.value;
    h[tableOffset].value = e.value;
}

bool data_is_present(Data const& e, HashEntry const* h, std::size_t tableOffset)
{
    auto const tmp_key_xor_value = h[tableOffset].key_xor_value;
    auto const tmp_value = h[tableOffset].value;

    return e.key == (tmp_key_xor_value ^ tmp_value);
}

The idea is that a HashEntry struct stores the XOR-ed combination of the two 64-bit words of a Data struct. If two threads interleave reads and writes to the two 64-bit words of a HashEntry, the reading thread can detect this by XOR-ing the two words again and comparing the result against the original key. Corrupted hash entries may therefore cost some efficiency, but correctness is still guaranteed whenever the decoded key matches the original.
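To make the detection idea concrete, here is a minimal single-threaded sketch that builds a "torn" entry by hand (first word from one write, second word from another) and runs the same XOR check on it. The helpers encode, matches, torn and torn_entry_is_rejected are hypothetical names introduced only for this illustration, and the sample keys and values are arbitrary. It only simulates an interleaving; it says nothing about whether the real concurrent code is free of data races.

```cpp
#include <cassert>
#include <cstdint>

struct Data      { std::uint64_t key; std::uint64_t value; };
struct HashEntry { std::uint64_t key_xor_value; std::uint64_t value; };

// Same encoding as insert_data(): store key ^ value next to value.
HashEntry encode(Data const& d) { return HashEntry{d.key ^ d.value, d.value}; }

// Same check as data_is_present(), applied to a single entry.
bool matches(Data const& d, HashEntry const& h)
{
    return d.key == (h.key_xor_value ^ h.value);
}

// Hypothetical helper: the entry a reader would see if writer `a` had
// stored the first word and writer `b` the second word.
HashEntry torn(Data const& a, Data const& b)
{
    return HashEntry{a.key ^ a.value, b.value};
}

bool torn_entry_is_rejected()
{
    Data const a{0x1111, 0xAAAA};
    Data const b{0x2222, 0xBBBB};

    // An intact entry decodes back to its own key.
    assert(matches(a, encode(a)));

    // The mixed entry decodes to a.key ^ a.value ^ b.value, which for
    // these sample values is neither a.key nor b.key.
    HashEntry const mixed = torn(a, b);
    return !matches(a, mixed) && !matches(b, mixed);
}
```

Note that the mixed entry still decodes to some key (here a.key ^ a.value ^ b.value), so a lookup for exactly that third key would be accepted; the scheme relies on such collisions being rare.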

The paper mentions that it is based on the following assumption:

For the remainder of this discussion, assume that 64 bit memory read/write operations are atomic, that is the entire 64 bit value is read/written in one cycle.

My questions are: is the above code, without explicit use of std::atomic<uint64_t>, guaranteed to be thread-safe in C++11? Or can the individual 64-bit words be corrupted by simultaneous reads/writes? Even on 64-bit platforms? And how is this different from the old C++98 Standard?

Quotes from the Standard would be much appreciated.

UPDATE: based on this amazing paper by Hans Boehm on "benign" data races, a simple way to get bitten is for the compiler to cancel both XORs from insert_data() and data_is_present() so that the check always returns true, e.g. if it finds a local code fragment like

insert_data(e, h, t);
if (data_is_present(e, h, t)) // optimized to true as if in single-threaded code
   read_and_process(e, h, t); // data race if other thread has written
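The cancellation can be seen by writing the insert and the check as one function, roughly the way an inlining compiler might see the fragment above. This is only a sketch: insert_then_check is a hypothetical name, and whether a given compiler actually folds the comparison is not something this example demonstrates. It just shows that, under single-threaded reasoning, the comparison reduces to the identity (key ^ value) ^ value == key.

```cpp
#include <cassert>
#include <cstdint>

struct Data      { std::uint64_t key; std::uint64_t value; };
struct HashEntry { std::uint64_t key_xor_value; std::uint64_t value; };

// insert_data() immediately followed by data_is_present() on the same
// slot. With plain (non-atomic) members the compiler is allowed to assume
// no other thread touches `slot` in between, so it may forward the stored
// values to the loads and reduce the return expression to `true`.
bool insert_then_check(Data e, HashEntry& slot)
{
    slot.key_xor_value = e.key ^ e.value;
    slot.value = e.value;
    return e.key == (slot.key_xor_value ^ slot.value);
}
```

Run single-threaded, the function returns true for every input, which is exactly why the check gives no protection once it has been folded away.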

The C++11 specification defines pretty much any attempt by one thread to read or write a memory location that another thread is writing to as undefined behavior (absent the use of atomics or mutexes to prevent reads/writes from one thread while another thread is writing).

Individual compilers may make it safe, but the C++11 specification doesn't provide coverage itself. Simultaneous reads are never a problem; the problem is writing in one thread while reading or writing in another.
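For comparison, a minimal sketch of the same table with both words declared as std::atomic<std::uint64_t>, which removes the data race as far as the language is concerned. AtomicHashEntry, insert_data_atomic, data_is_present_atomic and atomic_table_round_trip are hypothetical names, and the choice of std::memory_order_relaxed is an assumption: it makes each 64-bit access indivisible (the paper's stated assumption) and leaves mixed entries to the XOR check, but it gives no ordering guarantees for any other memory the program touches.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Data { std::uint64_t key; std::uint64_t value; };

struct AtomicHashEntry
{
    std::atomic<std::uint64_t> key_xor_value{0};
    std::atomic<std::uint64_t> value{0};
};

void insert_data_atomic(Data const& e, AtomicHashEntry* h, std::size_t tableOffset)
{
    // Each store is atomic on its own; the pair of stores is not.
    h[tableOffset].key_xor_value.store(e.key ^ e.value, std::memory_order_relaxed);
    h[tableOffset].value.store(e.value, std::memory_order_relaxed);
}

bool data_is_present_atomic(Data const& e, AtomicHashEntry const* h, std::size_t tableOffset)
{
    // The two loads may observe words from different writers; the XOR
    // comparison is what is meant to reject such a mixed entry.
    auto const k = h[tableOffset].key_xor_value.load(std::memory_order_relaxed);
    auto const v = h[tableOffset].value.load(std::memory_order_relaxed);
    return e.key == (k ^ v);
}

// Single-threaded usage: insert into slot 1, then look it up.
void atomic_table_round_trip()
{
    AtomicHashEntry table[4];
    Data const d{42, 7};
    insert_data_atomic(d, table, 1);
    assert(data_is_present_atomic(d, table, 1));
}
```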

And how is this different from the old C++98 Standard?

The C++98/03 standard doesn't provide any coverage with regard to threading. As far as the C++98/03 memory model is concerned, threading is not a thing that can possibly happen.

I don't think it depends so much on the compiler as on the CPU (its instruction set) you are using. I wouldn't think the assumption would be very portable.

The code's totally broken. The compiler has substantial freedom to reorder instructions if its analysis suggests the overall effect is identical. In insert_data, for example, there's no guarantee that key_xor_value will be updated before value, or that the updates aren't done in temporary registers before being written back into the cache, let alone any guarantee about when those cache updates (whatever their "order" in the machine code and the CPU's instruction execution pipeline) will be flushed from the updating core's or cores' (if context-switched mid-function) private caches to become visible to other threads. The compiler might even do the updates in steps using 32-bit registers, depending on the CPU, whether compiling for 32-bit or 64-bit, compilation options, etc.
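One way to probe the "32-bit steps" concern on a particular target is to ask the implementation whether a 64-bit atomic is lock-free there. The sketch below uses std::atomic<T>::is_lock_free(), which is available in C++11; the function names are hypothetical. The answer only describes std::atomic<std::uint64_t> on the platform being compiled for; it says nothing about plain uint64_t accesses, which remain a data race regardless.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Reports whether this implementation handles a 64-bit atomic without
// falling back to an internal lock.
bool uint64_atomic_is_lock_free()
{
    std::atomic<std::uint64_t> probe{0};
    return probe.is_lock_free();
}

// Writes and reads back a full 64-bit pattern through one atomic object.
// Whether or not the type is lock-free, the load cannot observe a
// half-written value.
std::uint64_t atomic_round_trip(std::uint64_t x)
{
    std::atomic<std::uint64_t> word{0};
    word.store(x, std::memory_order_relaxed);
    return word.load(std::memory_order_relaxed);
}

void check_round_trip()
{
    assert(atomic_round_trip(0x0123456789ABCDEFull) == 0x0123456789ABCDEFull);
}
```

If uint64_atomic_is_lock_free() returns false, the atomic version is still correct, but each access may take a lock, which defeats the purpose of a lockless table.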

Atomic operations tend to require something like CAS (Compare and Swap) style instructions, or volatile and memory barrier instructions, that sync data across cores' caches and enforce some ordering.
