
Questions about CUDA memory

I am quite new to CUDA programming and there are some things about the memory model that are quite unclear to me. How does it work, exactly? For example, if I have a simple kernel

__global__ void kernel(const int* a, int* b){
    // some computation where different threads in different blocks might
    // write at the same index of b
}

So I imagine a will be in the so-called constant memory. But what about b? Since different threads in different blocks will write to it, how will that work? I read somewhere that in the case of concurrent writes to global memory by different threads in the same block, at least one write is guaranteed to happen, but there's no guarantee about the others. Do I need to worry about that, i.e. for example have every thread in a block write to shared memory and, once they are all done, have one thread write it all to global memory? Or does CUDA take care of it for me?

So I imagine a will be in the so-called constant memory.

Yes, the pointer a will be in constant memory, but not because it is marked const (that is completely orthogonal). The pointer b is also in constant memory. All kernel arguments are passed in constant memory (except on CC 1.x). The memory pointed to by a and b could, in theory, be anything (device global memory, host pinned memory, anything addressable by UVA, I believe). Where it resides is chosen by you, the user.
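For illustration, here is a minimal host-side sketch where the buffers a and b point to are ordinary device global memory allocated with cudaMalloc. The size N, the host array h_a, and the launch configuration are just placeholders, not something your actual code needs to match:

#include <cuda_runtime.h>

__global__ void kernel(const int* a, int* b) { /* ... */ }

int main() {
    const int N = 1024;                      // illustrative size
    int h_a[N] = {0};                        // illustrative host input

    int *d_a, *d_b;
    cudaMalloc(&d_a, N * sizeof(int));       // buffer a points to: device global memory
    cudaMalloc(&d_b, N * sizeof(int));       // buffer b points to: device global memory
    cudaMemcpy(d_a, h_a, N * sizeof(int), cudaMemcpyHostToDevice);

    // The pointer values d_a and d_b are the kernel arguments, so they are
    // passed in constant memory; the memory they point to is wherever you
    // allocated it -- here, ordinary global memory.
    kernel<<<4, 256>>>(d_a, d_b);
    cudaDeviceSynchronize();

    cudaFree(d_a);
    cudaFree(d_b);
    return 0;
}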

I read somewhere that in the case of concurrent writes to global memory by different threads in the same block, at least one write is guaranteed to happen, but there's no guarantee about the others.

Assuming your code looks like this:

b[0] = 10; // Executed by all threads

Then yes, that's a (benign) race condition, because all threads write the same value to the same location. The result of the write is defined; however, the number of writes is unspecified, and so is which thread performs the "final" write. The only guarantee is that at least one write happens. In practice, I believe one write per warp is issued, which is a waste of bandwidth if your blocks contain more than one warp (which they should).
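One way to avoid the redundant writes (just a sketch; whether it matters depends on your kernel) is to guard the store so a single thread performs it:

if (blockIdx.x == 0 && threadIdx.x == 0) {
    b[0] = 10;   // exactly one write instead of one per warp
}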

On the other hand, if your code looks like this:

b[0] = threadIdx.x;

This is plain undefined behavior.

Do I need to worry about that, i.e. for example have every thread in a block write to shared memory and, once they are all done, have one thread write it all to global memory?

Yes, that's how it's usually done.
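Here is a rough sketch of that pattern, assuming purely for illustration that the per-thread computation is a sum, that blockDim.x is 256, that the grid exactly covers the input, and that b[0] was zeroed before the launch. Each thread writes its partial result to shared memory, the block reduces it, and a single thread per block touches global memory; atomicAdd keeps the cross-block updates to b[0] well defined:

__global__ void kernel(const int* a, int* b) {
    __shared__ int partial[256];              // assumes blockDim.x == 256
    int tid = threadIdx.x;
    int idx = blockIdx.x * blockDim.x + tid;

    partial[tid] = a[idx];                    // every thread writes to shared memory
    __syncthreads();

    // Simple tree reduction within the block.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride)
            partial[tid] += partial[tid + stride];
        __syncthreads();
    }

    if (tid == 0)
        atomicAdd(&b[0], partial[0]);         // one global write per block
}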
