Can memcpy of an array of 16-bit objects be interrupted in the middle of an element?

Question

Global data:

uint16_t global_buffer[128];

Thread 1:

uint16_t local_buffer[128];
while(true)
{
    ...
    if(data_ready)
        memcpy(global_buffer, local_buffer, sizeof(uint16_t)*128);
}

Thread 2:

void timer_handler()
{
    uint16_t value = global_buffer[10];
    //do something with value
}

My question is whether this is safe. Is it guaranteed that value will get either the old value or the new value (if thread 1's memcpy() is interrupted by a context switch)? Is it possible that memcpy gets interrupted after one byte of the 16-bit value has been updated but not the other? In that case, value would be garbage.

If a memcpy operation can only be interrupted between blocks of an even number of bytes, I think this is safe.

Platforms: x86 and x86-64 only (actually only Intel i7 or newer processors)
OS: Linux
Compiler: gcc
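To make the torn-value scenario in the question concrete, here is a small sketch that simulates a copy interrupted after writing only the first byte of a uint16_t. The function name half_updated is hypothetical. On little-endian x86, replacing 0x1234 with 0xABCD one byte at a time would briefly expose 0x12CD, a value that is neither the old nor the new one.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>

/* The "first byte" logic below assumes 8-bit bytes. */
static_assert(CHAR_BIT == 8, "assumes 8-bit bytes");

/* Hypothetical helper: simulates a copy interrupted after it has written
 * only the first byte (in memory order) of new_value over old_value, and
 * returns what a reader would observe at that moment. */
static uint16_t half_updated(uint16_t old_value, uint16_t new_value)
{
    uint16_t target = old_value;
    memcpy(&target, &new_value, 1);  /* copy just the first byte */
    return target;
}
```

On x86, half_updated(0x1234, 0xABCD) yields 0x12CD; on a big-endian machine it would be 0xAB34.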

Answer 1

It depends on the implementation of memcpy(); there are no guarantees. Even if you know the implementation makes this safe, it would be unwise to rely on it remaining so across all the versions and platforms this code or pattern may be reused on.

You might implement your own word-by-word 16-bit copy using a word copy that you know to be atomic. How to do that warrants a new question.
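One way to get a word copy that is known to be atomic is C11 <stdatomic.h>, which gcc supports on Linux. The sketch below is an assumption about how you might structure it, not the only approach: the shared buffer is declared as an array of _Atomic uint16_t and copied element by element with relaxed atomic stores. The names BUF_LEN, publish_buffer and read_element are hypothetical. This rules out tearing within any single element, but it does not make the whole 128-element copy appear atomic; a reader can still see a mix of old and new elements.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define BUF_LEN 128  /* hypothetical name for the array length */

/* Shared buffer: every element is an independent atomic 16-bit object. */
static _Atomic uint16_t global_buffer[BUF_LEN];

/* Writer side (thread 1): each element is written with one atomic 16-bit
 * store, so no reader can observe half of an element. Relaxed ordering
 * gives per-element atomicity only, not a consistent snapshot. */
static void publish_buffer(const uint16_t *src, size_t n)
{
    assert(n <= BUF_LEN);
    for (size_t i = 0; i < n; i++)
        atomic_store_explicit(&global_buffer[i], src[i], memory_order_relaxed);
}

/* Reader side (e.g. the timer handler): one atomic 16-bit load. */
static uint16_t read_element(size_t i)
{
    assert(i < BUF_LEN);
    return atomic_load_explicit(&global_buffer[i], memory_order_relaxed);
}
```

The timer handler would then call read_element(10) instead of reading global_buffer[10] directly.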

Answer 2

Interrupts aren't really relevant unless you're running this on a single-core VM. On a normal system with a multi-core CPU, two threads can be running simultaneously on separate cores. This is why we have C++ std::atomic<> and C _Atomic, which are useful for single variables like int.
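As a sketch of the single-variable case with two truly concurrent threads, the program below (assuming POSIX threads; build with gcc -pthread) has one thread flip an _Atomic uint16_t between 0x0000 and 0xFFFF while the calling thread reads it. The names writer, count_torn_reads and ITERATIONS are hypothetical. Because the variable is _Atomic, every load returns one of the two patterns, so the count of torn reads is guaranteed to be 0.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define ITERATIONS 1000000L  /* hypothetical iteration count */

/* One 16-bit object shared between two threads, plus a completion flag. */
static _Atomic uint16_t shared_value = 0x0000;
static atomic_bool writer_done = false;

/* Writer: alternates between two patterns that differ in both bytes. */
static void *writer(void *arg)
{
    (void)arg;
    for (long i = 0; i < ITERATIONS; i++) {
        uint16_t v = (i & 1) ? 0xFFFF : 0x0000;
        atomic_store_explicit(&shared_value, v, memory_order_relaxed);
    }
    atomic_store(&writer_done, true);
    return NULL;
}

/* Reader: counts loads that are neither pattern, i.e. torn reads.
 * Returns -1 if the writer thread could not be created. */
static long count_torn_reads(void)
{
    pthread_t t;
    long torn = 0;

    atomic_store(&shared_value, 0x0000);
    atomic_store(&writer_done, false);
    if (pthread_create(&t, NULL, writer, NULL) != 0)
        return -1;
    while (!atomic_load(&writer_done)) {
        uint16_t v = atomic_load_explicit(&shared_value, memory_order_relaxed);
        if (v != 0x0000 && v != 0xFFFF)
            torn++;
    }
    pthread_join(t, NULL);
    /* The writer's last store (i = ITERATIONS - 1, which is odd) was 0xFFFF. */
    assert(atomic_load(&shared_value) == 0xFFFF);
    return torn;
}
```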

It depends on your memcpy implementation. Any non-terrible one won't do any single-byte copies, and all the 16-bit loads/stores will actually be part of larger loads/stores (or possibly the internals of rep movsb microcode). It's hard to imagine how a sensible compiler (not a DeathStation 9000) would ever choose to inline a copy in a way that could tear an individual uint16_t.

But unless you do the copy manually (e.g. with AVX intrinsics), it is barely possible that some weird optimization could get a compiler to emit a byte load/store.
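A manual vector copy might look like the sketch below. It swaps in SSE2 intrinsics rather than AVX because SSE2 is baseline on x86-64 and needs no extra compiler flags; an AVX version would use 32-byte __m256i vectors instead. The names BUF_LEN and copy_u16_sse2 are hypothetical. Each 16-byte vector covers 8 whole uint16_t elements, so no element is split between two instructions. Per-element atomicity within one vector access is assumed here, not architecturally guaranteed, and in ISO C terms a concurrent plain read of the destination is still a data race.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics, always available on x86-64 */
#include <stddef.h>
#include <stdint.h>

#define BUF_LEN 128  /* hypothetical; must be a multiple of 8 here */

/* Copies BUF_LEN uint16_t elements with 16-byte vector loads/stores,
 * 8 elements per instruction, so element boundaries always coincide
 * with instruction boundaries. */
static void copy_u16_sse2(uint16_t *restrict dst, const uint16_t *restrict src)
{
    static_assert(BUF_LEN % 8 == 0, "BUF_LEN must be a multiple of 8");
    for (size_t i = 0; i < BUF_LEN; i += 8) {
        __m128i v = _mm_loadu_si128((const __m128i *)(src + i));
        _mm_storeu_si128((__m128i *)(dst + i), v);
    }
}
```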

For a SIMD implementation, like a normal library will use for small sizes, it comes down to the question "Per-element atomicity of vector load/store and gather/scatter?". Annoyingly, there's no formal guarantee from either major x86 vendor (AMD or Intel). It's almost certain that it's safe, though, especially if the entire vector is itself aligned (so there are no cache-line splits or page splits). Using alignas(64) uint16_t global_buffer[128]; would be a good way to ensure that.
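The alignas(64) suggestion can be sketched in C11 as follows, assuming 64-byte cache lines (true of the asker's Intel i7 and newer CPUs). The names CACHE_LINE and within_one_line are hypothetical. With the 256-byte buffer starting on a line boundary, any naturally aligned 16- or 32-byte vector access into it stays within a single cache line, and therefore within a single page.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64  /* assumption: 64-byte cache lines */

/* 128 elements x 2 bytes = 256 bytes = exactly four cache lines. */
static alignas(CACHE_LINE) uint16_t global_buffer[128];
static_assert(sizeof global_buffer == 4 * CACHE_LINE, "expected four cache lines");

/* Hypothetical helper: nonzero if a `width`-byte access starting at
 * element `idx` stays inside one cache line. */
static int within_one_line(size_t idx, size_t width)
{
    uintptr_t start = (uintptr_t)&global_buffer[idx];
    return start / CACHE_LINE == (start + width - 1) / CACHE_LINE;
}
```

For example, a 32-byte access at element 16 (bytes 32..63) stays in the first line, while one at element 24 (bytes 48..79) would split across two lines.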

If your total copy size isn't a multiple of the vector width, overlapping copies still won't introduce tearing within one uint16_t. For example, copy the first 8 uint16_t and the last 8 uint16_t, which works for copy sizes from 8 (full overlap) to 16 (no overlap) array elements.
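The overlapping-copy idea can be sketched with two 16-byte SSE2 vectors (again SSE2 rather than AVX, and copy_u16_8_to_16 is a hypothetical name). One vector covers elements [0, 8) and the other covers [n-8, n). Because n counts whole elements, both vectors start on a uint16_t boundary, so even where they overlap no element is ever split.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Copies n uint16_t elements, 8 <= n <= 16, using exactly two 16-byte
 * vectors. For n < 16 the two ranges overlap; the overlapping elements
 * are simply written twice with the same value. */
static void copy_u16_8_to_16(uint16_t *dst, const uint16_t *src, size_t n)
{
    assert(n >= 8 && n <= 16);
    /* Both loads happen before either store. */
    __m128i head = _mm_loadu_si128((const __m128i *)src);
    __m128i tail = _mm_loadu_si128((const __m128i *)(src + n - 8));
    _mm_storeu_si128((__m128i *)dst, head);
    _mm_storeu_si128((__m128i *)(dst + n - 8), tail);
}
```

This avoids any scalar cleanup loop: every size in the range costs the same two loads and two stores.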

And BTW, that's basically what glibc memcpy does for small copies. A 4- to 7-byte memcpy is done with two 4-byte loads and two 4-byte stores, and a 32- to 63-byte copy is done with two 32-byte vectors. (Using two fully-overlapping vectors at the bottom of a range, e.g. for exactly 32 bytes, avoids store-forwarding stalls when the data is read later, compared with two non-overlapping halves. The upper end of the range might actually go up to 64 bytes, with a pair of full-size AVX vectors.)
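The 4- to 7-byte case can be sketched in C like this. It is a hypothetical illustration of the technique, not glibc's actual code (glibc implements these paths in assembly), and copy_4_to_7 is an invented name. The two 4-byte accesses are at offsets 0 and n-4. When n counts whole uint16_t elements (n = 4 or 6) and the buffers are 2-byte aligned, both offsets are even, so each 4-byte access covers whole 16-bit elements.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: copies n bytes, 4 <= n <= 7, with two 4-byte loads
 * followed by two 4-byte stores at offsets 0 and n-4 (overlapping when
 * n < 8). A fixed-size 4-byte memcpy like these typically compiles to a
 * single 4-byte load or store when optimization is enabled. */
static void copy_4_to_7(void *dst, const void *src, size_t n)
{
    assert(n >= 4 && n <= 7);
    uint32_t head, tail;
    memcpy(&head, src, 4);
    memcpy(&tail, (const char *)src + n - 4, 4);
    memcpy(dst, &head, 4);
    memcpy((char *)dst + n - 4, &tail, 4);
}
```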

Answer 1: score 1, accepted, posted 2021-05-22 07:17:26
Answer 2: score 1, posted 2021-06-29 19:51:57
