
Will a non-atomic load to the same cache line as an atomic variable cause the atomic variable to fail?

Given something like this on an ARMv8 CPU (though this may apply to many others as well):

class abcxzy
{
  // Pragma align to cacheline to ensure they exist on same line.
  uint32_t atomic_data;
  uint32_t data;

  void foo()
  {
    asm volatile (
      "   ldr w0, [address of data]\n"
      "   // Do stuff with data in w0...\n"
      "   str w0, [address of data]\n"

      // Exclusive load/store loop: retry until stxr reports success (w2 == 0).
      "1: ldaxr w0, [address of atomic_data]\n"
      "   add w1, w0, #0x1\n"
      "   stxr w2, w1, [address of atomic_data]\n"
      "   cbnz w2, 1b\n"
    );
  }
};

With proper clobbers and such set on the Asm inline so that C and Asm can coexist happily in a world of rainbow ponies and sunshine.

In a multiple CPU situation, all running this code at the same time, will the stores to data cause the atomic load/store to atomic_data to fail? From what I've read, the ARM atomic stuff works on a cache line basis, but it is not clear if the non-atomic store will affect the atomic. I hope that it doesn't (and assume that it does...), but I am looking to see if anyone else can confirm this.

Ok, finally found what I needed, though I don't like it:

According to the ARM documentation, it is IMPLEMENTATION DEFINED whether a non-exclusive store to the same cache line as the exclusive store causes the exclusive store to fail. Thanks ARM. Appreciate that wonderful non-conclusive info.


Edit:

By fail, I mean the stxr instruction did not write to memory and returned a "1" in its status register, i.e. a "your atomic data was updated and needs a new RMW" status.
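For anyone who wants to see the same "failed, go around again" behaviour from portable C++, here is a minimal sketch. The helper name increment_counting_retries is made up for illustration. The assumption is that on AArch64 builds without the newer LSE atomics, compilers typically lower compare_exchange_weak to a load-exclusive/store-exclusive pair, so a failed stxr shows up as a failed (possibly "spurious") exchange:

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper (not from the original post): increments `counter`
// and returns how many times the exchange had to be retried.
std::uint32_t increment_counting_retries(std::atomic<std::uint32_t>& counter)
{
    std::uint32_t retries = 0;
    std::uint32_t expected = counter.load(std::memory_order_relaxed);

    // compare_exchange_weak is allowed to fail even when `expected` still
    // matches the stored value. On failure it reloads `expected` with the
    // current value, so the loop just redoes the read-modify-write.
    while (!counter.compare_exchange_weak(expected, expected + 1,
                                          std::memory_order_acquire,
                                          std::memory_order_relaxed)) {
        ++retries;
    }
    return retries;
}
```

The retry count depends on the hardware and on what else is touching the same reservation granule, so treat it as a rough contention indicator and not as something to rely on.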

To answer other statements:

  • Yes, atomic critical areas should be as small as possible. The docs even give numbers on how small, and they are very reasonable indeed. I hope that my sections never span 1k or more...

  • And yes, any situation where you would need to worry about this kind of contention killing performance or worse means your code is "doing it wrong." The ARM docs state this in a roundabout manner :)

  • As to putting the non-atomic loads and stores inside the atomics - my pseudo test above was just demonstrating a random access to the same cache line as an example. In real code, you obviously should avoid this. I was just trying to get a feeling for how "bad" it might be if, perhaps, a high speed hardware timer store was hitting the same cache line as a lock. Again, don't do this...
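Since the behaviour is implementation defined, the safe route is to keep the lock and the frequently written data apart. A minimal sketch of that follows, with some loud assumptions: the names SharedState, lock_word, timer_value and kAssumedGranule are made up for illustration, and 64 bytes is only an assumed cache line size. The exclusives reservation granule is itself implementation defined and can be larger than a cache line, so check the real value for your core before trusting this number.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// ASSUMPTION: 64 bytes. Replace with the granule size of the actual target.
constexpr std::size_t kAssumedGranule = 64;

// Hypothetical layout: each member starts on its own granule-sized boundary,
// so a plain store to timer_value never lands in the block holding lock_word.
struct SharedState
{
    alignas(kAssumedGranule) std::atomic<std::uint32_t> lock_word{0};
    alignas(kAssumedGranule) std::uint32_t timer_value{0};
};

// Returns the byte distance between the two members of `s`.
std::size_t distance_between_members(const SharedState& s)
{
    const auto a = reinterpret_cast<std::uintptr_t>(&s.lock_word);
    const auto b = reinterpret_cast<std::uintptr_t>(&s.timer_value);
    return static_cast<std::size_t>(b - a);
}
```

The cost is wasted space (the struct grows to two full granules), which is usually a fair trade for a lock that is hit from several CPUs.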

