
Reasoning about IORef operation reordering in concurrent programs

The docs say:

In a concurrent program, IORef operations may appear out-of-order to another thread, depending on the memory model of the underlying processor architecture... The implementation is required to ensure that reordering of memory operations cannot cause type-correct code to go wrong. In particular, when inspecting the value read from an IORef, the memory writes that created that value must have occurred from the point of view of the current thread.

Which I'm not even entirely sure how to parse. Edward Yang says:

In other words, "We give no guarantees about reordering, except that you will not have any type-safety violations." ... the last sentence remarks that an IORef is not allowed to point to uninitialized memory.

So... it won't break all of Haskell; not very helpful. The discussion from which the memory model example arose also left me with questions (even Simon Marlow seemed a bit surprised).

Things that seem clear to me from the documentation:

  1. Within a thread, an atomicModifyIORef "is never observed to take place ahead of any earlier IORef operations, or after any later IORef operations", i.e. we get a partial ordering of: stuff before the atomic mod -> atomic mod -> stuff after. Although the wording "is never observed" here is suggestive of spooky behavior that I haven't anticipated.

  2. A readIORef x might be moved before writeIORef y, at least when there are no data dependencies.

  3. Logically I don't see how something like readIORef x >>= writeIORef y could be reordered.

What isn't clear to me:

  • Will newIORef False >>= \v -> writeIORef v True >> readIORef v always return True?

  • In the maybePrint case (from the IORef docs), would a readIORef myRef (along with maybe a seq or something) before readIORef yourRef have forced a barrier to reordering?
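For reference, the maybePrint example from the Data.IORef documentation looks roughly like this (reproduced from memory, so check the docs for the exact wording):

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Monad (unless)
import Data.IORef

-- Each thread announces itself, then checks whether the other has
-- announced. If the writes are reordered with the reads (by the CPU
-- or the compiler), BOTH threads may print.
maybePrint :: IORef Bool -> IORef Bool -> IO ()
maybePrint myRef yourRef = do
  writeIORef myRef True
  yourVal <- readIORef yourRef
  unless yourVal $ putStrLn "critical section"

main :: IO ()
main = do
  r1 <- newIORef False
  r2 <- newIORef False
  _ <- forkIO $ maybePrint r1 r2
  _ <- forkIO $ maybePrint r2 r1
  threadDelay 100000  -- crude wait so the forked threads can finish
```

To actually observe the race you need GHC's threaded runtime (compile with -threaded) and a weakly ordered CPU; on x86 the problematic reordering of a store with a later load can still occur via the store buffer.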

What's the straightforward mental model I should have? Is it something like:

within and from the point of view of an individual thread, the ordering of IORef operations will appear sane and sequential; but the compiler may actually reorder operations in such a way as to break certain assumptions in a concurrent system; however, when a thread does atomicModifyIORef, no thread will observe operations on that IORef that appeared above the atomicModifyIORef to happen after it, and vice versa.

...? If not, what's the corrected version of the above?

If your response is "don't use IORef in concurrent code; use TVar", please convince me with specific facts and concrete examples of the kinds of things you can't reason about with IORef.

I don't know Haskell concurrency, but I know something about memory models.

Processors can reorder instructions the way they like: loads may go ahead of loads, loads may go ahead of stores, loads of dependent stuff may go ahead of loads of the stuff they depend on (a[i] may load the value from the array first, then the reference to the array a!), and stores may be reordered with each other. You simply cannot put a finger on it and say "these two things definitely appear in a particular order, because there is no way they can be reordered". But in order for concurrent algorithms to operate, they need to observe the state of other threads. This is where it is important for thread state to proceed in a particular order. This is achieved by placing barriers between instructions, which guarantee that the order of instructions appears the same to all processors.

Typically (in one of the simplest models), you want two types of ordered instructions: an ordered load that does not go ahead of any other ordered loads or stores, and an ordered store that does not go ahead of any other instructions at all, plus a guarantee that all ordered instructions appear in the same order to all processors. This way you can reason about the IRIW kind of problem:

Thread 1: x=1

Thread 2: y=1

Thread 3: r1=x;
          r2=y;

Thread 4: r4=y;
          r3=x;

If all of these operations are ordered loads and ordered stores, then you can conclude that the outcome (1,0,0,1)=(r1,r2,r3,r4) is not possible. Indeed, the ordered stores in Threads 1 and 2 must appear in some single order to all threads, and r1=1,r2=0 is witness that y=1 is executed after x=1. In turn, this means that Thread 4 can never observe r4=1 without observing r3=1 (which is loaded after r4) (if the ordered stores happen to be executed that way, observing y==1 implies x==1).

Also, if the loads and stores were not ordered, the processors would usually be allowed to observe the assignments appearing in different orders: one might see x=1 appear before y=1, the other might see y=1 appear before x=1, so any combination of values r1,r2,r3,r4 would be permitted.
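A sketch of the IRIW litmus test written with IORefs follows. Data.IORef exports no ordered load, so this uses the common workaround of an atomicModifyIORef that leaves the value unchanged (the helper name orderedRead is made up, not a library function):

```haskell
import Control.Concurrent
import Data.IORef

-- Ordered load via a no-op atomic modify: the atomicity barriers give
-- the load the ordering that a plain readIORef does not guarantee.
orderedRead :: IORef a -> IO a
orderedRead ref = atomicModifyIORef ref (\a -> (a, a))

-- IRIW: with ordered stores (atomicWriteIORef) and ordered loads, the
-- outcome (r1,r2,r3,r4) = (1,0,0,1) should be impossible; without
-- ordering, any combination may be observed.
main :: IO ()
main = do
  x <- newIORef (0 :: Int)
  y <- newIORef (0 :: Int)
  done3 <- newEmptyMVar
  done4 <- newEmptyMVar
  _ <- forkIO $ atomicWriteIORef x 1        -- Thread 1
  _ <- forkIO $ atomicWriteIORef y 1        -- Thread 2
  _ <- forkIO $ do                          -- Thread 3
         r1 <- orderedRead x
         r2 <- orderedRead y
         putMVar done3 (r1, r2)
  _ <- forkIO $ do                          -- Thread 4
         r4 <- orderedRead y
         r3 <- orderedRead x
         putMVar done4 (r3, r4)
  (r1, r2) <- takeMVar done3
  (r3, r4) <- takeMVar done4
  print (r1, r2, r3, r4)
```

Compile with -threaded and run on multiple cores for the test to be meaningful; a single run can of course only ever show one of the possible outcomes.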

This can be implemented like so:

ordered load:

load x
load-load  -- barrier: stops later loads from going ahead of preceding loads
load-store -- barrier: no later store is allowed to go ahead of the ordered load

ordered store:

load-store
store-store -- the ordered store must appear after all stores
            -- preceding it in program order - serialize all stores
            -- (flush write buffers)
store x,v
store-load -- ordered loads must not go ahead of the ordered store
           -- preceding them in program order

Of these two, I can see that IORef implements an ordered store (atomicWriteIORef), but I don't see an ordered load (atomicReadIORef), without which you cannot reason about the IRIW problem above. This is not a problem if your target platform is x86, because all loads will be executed in program order on that platform, and stores never go ahead of loads (in effect, all loads are ordered loads).

An atomic update (atomicModifyIORef) seems to me an implementation of a so-called CAS loop (a compare-and-set loop, which does not stop until the value is atomically set to b if its value is a). You can see the atomic modify operation as a fusion of an ordered load and an ordered store, with all those barriers there, executed atomically - no processor is allowed to insert a modification instruction between the load and store of a CAS.
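Such a compare-and-set can be sketched on top of atomicModifyIORef' (an illustrative helper, not a library function; GHC's underlying casMutVar# primitive compares by pointer identity rather than with an Eq instance):

```haskell
import Data.IORef

-- Illustrative compare-and-set: atomically replace the value with
-- `new` only if it currently equals `expected`, returning whether
-- the swap happened.
compareAndSet :: Eq a => IORef a -> a -> a -> IO Bool
compareAndSet ref expected new =
  atomicModifyIORef' ref $ \cur ->
    if cur == expected then (new, True) else (cur, False)

main :: IO ()
main = do
  z <- newIORef (0 :: Int)
  ok1 <- compareAndSet z 0 42   -- succeeds: z was 0
  ok2 <- compareAndSet z 0 7    -- fails: z is now 42
  v <- readIORef z
  print (ok1, ok2, v)           -- (True,False,42)
```

Because atomicModifyIORef' provides the full barriers discussed above, this inherits the ordered-load-plus-ordered-store behavior of an atomic update, at the cost of allocating a closure per attempt.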


Furthermore, writeIORef is cheaper than atomicWriteIORef, so you want to use writeIORef as much as your inter-thread communication protocol permits. Whereas writeIORef x vx >> writeIORef y vy >> atomicWriteIORef z vz >> readIORef t does not guarantee the order in which the writeIORefs appear to other threads with respect to each other, there is a guarantee that they both will appear before the atomicWriteIORef - so, seeing z==vz, you can conclude that at this moment x==vx and y==vy, and you can conclude that IORef t was loaded after the stores to x, y, z could be observed by other threads. This latter point requires readIORef to be an ordered load, which is not provided as far as I can tell, but it will work like an ordered load on x86.
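The publish-then-signal protocol just described might be sketched like this, under the stated assumption that readIORef behaves like an ordered load on the target (as on x86); publish and consume are made-up names:

```haskell
import Data.IORef

-- Writer: two plain stores, then an ordered store to publish them.
publish :: IORef Int -> IORef Int -> IORef Bool -> IO ()
publish x y z = do
  writeIORef x 1           -- plain store; order vs. y unspecified
  writeIORef y 2           -- plain store
  atomicWriteIORef z True  -- ordered store: both plain stores above
                           -- must be visible before z==True is visible

-- Reader: once z==True is seen, x and y are guaranteed to hold their
-- published values (assuming the reads are not hoisted before z's read).
consume :: IORef Int -> IORef Int -> IORef Bool -> IO (Maybe (Int, Int))
consume x y z = do
  ready <- readIORef z
  if ready
    then do vx <- readIORef x
            vy <- readIORef y
            pure (Just (vx, vy))
    else pure Nothing

main :: IO ()
main = do
  x <- newIORef 0
  y <- newIORef 0
  z <- newIORef False
  publish x y z
  consume x y z >>= print   -- single-threaded: Just (1,2)
```

In a real concurrent setting the reader would run in another thread and possibly observe Nothing; the guarantee is only that it never observes z==True together with stale x or y.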

Typically you don't use concrete values of x, y, z when reasoning about the algorithm. Instead, some algorithm-dependent invariants about the assigned values must hold, and can be proven - for example, as in the IRIW case you can guarantee that Thread 4 will never see (0,1)=(r3,r4) if Thread 3 sees (1,0)=(r1,r2), and Thread 3 can take advantage of this: it means something is mutually excluded without acquiring any mutex or lock.


An example (not in Haskell) that will not work if loads are not ordered, or if ordered stores do not flush write buffers (the requirement to make written values visible before the ordered load executes):

Suppose z will show either x until y is computed, or y if x has been computed, too. Don't ask why; it is not very easy to see outside the context - it is a kind of a queue - just enjoy what sort of reasoning is possible.

Thread 1: x=1;
          if (z==0) compareAndSet(z, 0, y == 0? x: y);

Thread 2: y=2;
          if (x != 0) while ((tmp=z) != y && !compareAndSet(z, tmp, y));

So, two threads set x and y, then set z to x or y, depending on whether y or x was computed, too. Assume initially all are 0. Translating into loads and stores:

Thread 1: store x,1
          load z
          if ==0 then
            load y
            if == 0 then load x -- if loaded y is still 0, load x into tmp
            else load y -- otherwise, load y into tmp
            CAS z, 0, tmp -- CAS whatever was loaded in the previous if-statement
                          -- the CAS may fail, but see explanation

Thread 2: store y,2
          load x
          if !=0 then
          loop: load z -- into tmp
                load y
                if !=tmp then -- compare loaded y to tmp
                  CAS z, tmp, y  -- attempt to CAS z: if it is still tmp, set to y
                  if ! then goto loop -- if CAS did not succeed, go to loop
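Under the same caveats (plain readIORef/writeIORef provide none of the required ordering by themselves), the two threads might be transliterated into Haskell roughly as follows, with compareAndSet a hypothetical helper built from atomicModifyIORef':

```haskell
import Data.IORef

-- Hypothetical CAS helper (not a library function).
compareAndSet :: Eq a => IORef a -> a -> a -> IO Bool
compareAndSet ref expected new =
  atomicModifyIORef' ref $ \cur ->
    if cur == expected then (new, True) else (cur, False)

thread1 :: IORef Int -> IORef Int -> IORef Int -> IO ()
thread1 x y z = do
  writeIORef x 1              -- would need to be an ordered store
  zv <- readIORef z           -- would need to be an ordered load
  if zv == 0
    then do
      yv  <- readIORef y
      tmp <- if yv == 0 then readIORef x else pure yv
      _   <- compareAndSet z 0 tmp  -- failure is fine: Thread 2 set z to y
      pure ()
    else pure ()

thread2 :: IORef Int -> IORef Int -> IORef Int -> IO ()
thread2 x y z = do
    writeIORef y 2            -- would need to be an ordered store
    xv <- readIORef x         -- would need to be an ordered load
    if xv /= 0 then loop else pure ()
  where
    loop = do
      tmp <- readIORef z
      yv  <- readIORef y
      if tmp /= yv
        then do ok <- compareAndSet z tmp yv
                if ok then pure () else loop
        else pure ()
```

Run sequentially (thread1 then thread2) this is trivially correct; the whole point of the surrounding discussion is what extra ordering the plain reads and writes would need for it to stay correct when the two run concurrently.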

If Thread 1's load z is not an ordered load, then it will be allowed to go ahead of an ordered store (store x). It means that wherever z is loaded to (a register, cache line, stack, ...), the value is one that existed before the value of x could be visible. Looking at that value is useless - you cannot then judge where Thread 2 is up to. For the same reason you've got to have a guarantee that the write buffers were flushed before load z executed - otherwise it will still appear as a load of a value that existed before Thread 2 could see the value of x. This is important, as will become clear below.

If Thread 2's load x or load z are not ordered loads, they may go ahead of store y, and will observe the values that were written before y is visible to other threads.

However, observe that if the loads and stores are ordered, then the threads can negotiate who is to set the value of z without contending on z. For example, if Thread 2 observes x==0, there is a guarantee that Thread 1 will definitely execute x=1 later, and will see z==0 after that - so Thread 2 can leave without attempting to set z.

If Thread 1 observes z==0, then it should try to set z to x or y. So, first it will check whether y has been set already. If it wasn't set, it will be set in the future, so try to set z to x - the CAS may fail, but only if Thread 2 concurrently set z to y, so there is no need to retry. Similarly, there is no need to retry if Thread 1 observed that y has been set: if the CAS fails, then z has been set by Thread 2 to y. Thus we can see that Thread 1 sets z to x or y in accordance with the requirement, and does not contend on z too much.

On the other hand, Thread 2 can check whether x has been computed already. If not, then it will be Thread 1's job to set z. If Thread 1 has computed x, then Thread 2 needs to set z to y. Here we do need a CAS loop, because a single CAS may fail if Thread 1 is concurrently attempting to set z to x or y.

The important takeaway here is that if "unrelated" loads and stores are not serialized (including flushing write buffers), no such reasoning is possible. However, once loads and stores are ordered, both threads can figure out the path each of them will take in the future, and that way eliminate contention in half the cases. Most of the time x and y will be computed at significantly different times, so if y is computed before x, it is likely that Thread 2 will not touch z at all. (Typically, "touching z" also possibly means "wake up a thread waiting on a cond_var z", so it is not only a matter of loading something from memory.)

  1. within a thread an atomicModifyIORef "is never observed to take place ahead of any earlier IORef operations, or after any later IORef operations" ie we get a partial ordering of: stuff above the atomic mod -> atomic mod -> stuff after. Although, the wording "is never observed" here is suggestive of spooky behavior that I haven't anticipated.

"is never observed" is standard language when discussing memory reordering issues. For example, a CPU may issue a speculative read of a memory location earlier than necessary, so long as the value doesn't change between when the read is executed (early) and when the read should have been executed (in program order). That's entirely up to the CPU and cache, though; it's never exposed to the programmer (hence language like "is never observed").

  2. A readIORef x might be moved before writeIORef y, at least when there are no data dependencies

True.

  3. Logically I don't see how something like readIORef x >>= writeIORef y could be reordered

Correct, as that sequence has a data dependency. The value to be written depends upon the value returned from the first read.

For the other questions: newIORef False >>= \v -> writeIORef v True >> readIORef v will always return True (there's no opportunity for other threads to access the ref here).
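That single-threaded case is easy to check directly:

```haskell
import Data.IORef

main :: IO ()
main = do
  r <- newIORef False >>= \v -> writeIORef v True >> readIORef v
  print r  -- True: no other thread can touch the ref, and within one
           -- thread the read of v depends on the write that preceded it
```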

In the maybePrint example, there's very little you can do to ensure this works reliably in the face of new optimizations added to future GHCs and across various CPU architectures. If you write:

writeIORef myRef True
x <- readIORef myRef
yourVal <- x `seq` readIORef yourRef

Even though GHC 7.6.3 produces correct cmm (and presumably asm, although I didn't check), there's nothing to stop a CPU with a relaxed memory model from moving the readIORef yourRef to before all of the myRef/seq stuff. The only 100% reliable way to prevent it is with a memory fence, which GHC doesn't provide. (Edward's blog post does go through some of the other things you can do now, as well as why you may not want to rely on them.)

I think your mental model is correct; however, it's important to know that the possible apparent reorderings introduced by concurrent ops can be really unintuitive.

Edit: at the cmm level, the code snippet above looks like this (simplified, pseudocode):

[StackPtr+offset] := True
x := [StackPtr+offset]
if (notEvaluated x) (evaluate x)
yourVal := [StackPtr+offset2]

So there are a couple of things that can happen. GHC as it currently stands is unlikely to move the last line any earlier, but I think it could if doing so seemed more optimal. I'm more concerned that, if you compile via LLVM, the LLVM optimizer might replace the second line with the value that was just written, and then the third line might be constant-folded out of existence, which would make it more likely that the read could be moved earlier. And regardless of what GHC does, most CPU memory models allow the CPU itself to move the read earlier absent a memory barrier.

See http://en.wikipedia.org/wiki/Memory_ordering for non-atomic concurrent reads and writes. (Basically, when you don't use atomics, just look at the memory ordering model for your target CPU.)

Currently GHC can be regarded as not reordering your reads and writes for non-atomic (and imperative) loads and stores. However, GHC Haskell currently doesn't specify any sort of concurrent memory model, so those non-atomic operations will have the ordering semantics of the underlying CPU model, as I link to above.

In other words, currently GHC has no formal concurrency memory model, and because any optimization algorithms tend to be defined with respect to some model of equivalence, there's no reordering currently in play there.

That is: the only semantic model you can have right now is "the way it's implemented".

Shoot me an email! I'm working on patching up atomics for 7.10; let's try to cook up some semantics!

Edit: some folks who understand this problem better than me chimed in on the ghc-users thread here: http://www.haskell.org/pipermail/glasgow-haskell-users/2013-December/024473.html . Assume that I'm wrong in both this comment and anything I said in the ghc-users thread :)
