C++ Atomic operations within contiguous block of memory

Is it possible to use atomic operations, possibly using the std::atomic library, when assigning values in a contiguous block of memory?

If I have this code:

uint16_t* data = (uint16_t*) calloc(num_values, size);

What can I do to make operations like this atomic:

data[i] = 5;

I will have multiple threads assigning to data, possibly at the same index, at the same time. The order in which these threads modify the value at a particular index doesn't matter to me, as long as the modifications are atomic, avoiding any possible mangling of the data.

EDIT: So, per @user4581301, I'm providing some context for my issue here. I am writing a program to align depth video data frames to color video data frames. The camera sensors for depth and color have different focal characteristics, so the frames do not come completely aligned. The general algorithm involves projecting a pixel in depth space to a region in color space, then overwriting all values in the depth frame that span that region with that single pixel. I am parallelizing this algorithm. These projected regions may overlap, so when parallelized, writes to an index may occur concurrently.

Pseudo-code looks like this:

for x in depth_video_width:
  for y in depth_video_height:
      pixel = get_pixel(x, y)
      x_min, x_max, y_min, y_max = project_depth_pixel(x, y)

      // iterate over projected region
      for x` in [x_min, x_max]:
         for y` in [y_min, y_max]:
             // possible concurrent modification here
             data[x`, y`] = pixel

The outer loop or outermost two loops are parallelized.

You're not going to be able to do exactly what you want like this.

An atomic array doesn't make much sense, nor is it what you want (you want individual writes to be atomic).

You can have an array of atomics:

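#include <array>
#include <atomic>
#include <cstdint>  // uint16_t

int main()
{
    // Each element is its own atomic object.
    std::array<std::atomic<std::uint16_t>, 5> data{};
    data[1] = 5;  // atomic store (sequentially consistent by default)
}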

… but now you can't just access a contiguous block of uint16_t values, which it's implied you want to do.

If you don't mind something platform-specific, you can keep your array of uint16_t values and ensure that you only ever use atomic operations on each element (e.g. GCC's __atomic intrinsics).

But, generally, I think you're going to want to keep it simple and just lock a mutex around accesses to a normal array. Measure to be sure, but you may be surprised at how little performance you actually lose.

If you're desperate for atomics, and desperate for an underlying array of uint16_t, and desperate for a standard solution, you could wait for C++20 and create a std::atomic_ref (this is like a non-owning std::atomic) for each element, then access the elements through those. But then you still have to be cautious about any operation accessing the elements directly, possibly by using a lock, or at least by being very careful about what's doing what and when. At this point your code is much more complex: be sure it's worthwhile.

To add to the previous answer, I would strongly advise against using an array of atomics, since a write to an atomic takes exclusive ownership of an entire cache line (at least on x86). In practice, this means that while one thread writes element i of your array, other threads are stalled on every element that shares that cache line, not just on element i.

The solution to your problem is a mutex, as mentioned in the other answer.

The maximum size supported for atomic operations currently seems to be 64 bits (see https://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-vol-3a-part-1-manual.html ):

The Intel-64 memory ordering model guarantees that, for each of the following 
memory-access instructions, the constituent memory operation appears to execute 
as a single memory access:

• Instructions that read or write a single byte.
• Instructions that read or write a word (2 bytes) whose address is aligned on a 2
byte boundary.
• Instructions that read or write a doubleword (4 bytes) whose address is aligned
on a 4 byte boundary.
• Instructions that read or write a quadword (8 bytes) whose address is aligned on
an 8 byte boundary.

Any locked instruction (either the XCHG instruction or another read-modify-write
 instruction with a LOCK prefix) appears to execute as an indivisible and 
uninterruptible sequence of load(s) followed by store(s) regardless of alignment.

In other words, according to this passage, your processor doesn't know how to perform atomic operations wider than 64 bits. And I'm not even mentioning here that the standard library implementation of std::atomic can use locks (see https://en.cppreference.com/w/cpp/atomic/atomic/is_lock_free ).
