Block Level Atomic Write

Question

Is it possible to do an atomic write at the block level?
As an example, consider the following:

__global__ void kernel(int atomic)
{
    atomic += blockIdx.x; // should be atomic for each block
}

Answer 1

You can do some atomic operations in CUDA. See Appendix B.11, Atomic Functions, in the CUDA Programming Guide. For example:

__global__ void kernel (int *result)
{
    atomicAdd(result, blockIdx.x); // atomically adds the block index to *result
}

You can also atomically exchange the value of a variable:

__global__ void kernel (int *result)
{
    atomicExch(result, blockIdx.x); // atomically replaces *result with the block index (returns the old value)
}

Both examples operate on global memory.
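For completeness, here is a minimal sketch of how the atomicAdd kernel could be driven from the host. The names add_block_indices and sum_block_indices are hypothetical, chosen only for illustration, and unlike the snippet above only thread 0 of each block performs the add, so each block contributes its index exactly once.

```cuda
#include <cuda_runtime.h>

// Thread 0 of every block atomically adds its block index to *result,
// which lives in global memory and is shared by all blocks.
__global__ void add_block_indices(int *result)
{
    if (threadIdx.x == 0) {
        atomicAdd(result, static_cast<int>(blockIdx.x));
    }
}

// Hypothetical host helper: launches num_blocks blocks and returns the
// accumulated value, or -1 if a CUDA call fails.
int sum_block_indices(int num_blocks, int threads_per_block)
{
    int *d_result = nullptr;
    int h_result = -1;

    if (cudaMalloc(&d_result, sizeof(int)) != cudaSuccess) return -1;
    cudaMemset(d_result, 0, sizeof(int));   // start the sum at zero

    add_block_indices<<<num_blocks, threads_per_block>>>(d_result);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(&h_result, d_result, sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_result);
    return err == cudaSuccess ? h_result : -1;
}
```

With 4 blocks, sum_block_indices(4, 32) should return 0 + 1 + 2 + 3 = 6, regardless of the order in which the blocks run.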

Atomic functions operating on shared memory and atomic functions operating on 64-bit words are only available for devices of compute capability 1.2 and above.

Regards.

Answer 2

You can perform atomic operations on shared memory, but not the way you tried to in your code snippet. Your kernel's int parameter is a thread-specific variable: every thread gets its own copy of the value you passed at launch, and that copy is not stored in shared memory, so it's meaningless to operate on it atomically.

If you had passed, say, an int * to some buffer, that would be a buffer in global memory. You can perform device-wide atomic operations on data in global memory, as described in @pQB's answer. But you asked about block-level atomic operations, and that does not mean much for global data. Still, if one of your threads writes to some global address, it can call __threadfence_block() to stall until the effect of this write is visible to all other threads in the block.

Proper block-level atomics are also supported in CUDA, but on shared memory. Read about how to use shared memory in this Parallel4All blog entry or in the relevant section of the CUDA Programming Guide.
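To make this concrete, here is a rough sketch of a block-level atomic on a shared-memory counter. Everything in it is an assumption made for illustration (the kernel name count_odd_per_block, the host helper odd_count_in_block, and the "count odd values" task); it is not taken from the question.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each block counts how many of its input elements are odd.
__global__ void count_odd_per_block(const int *in, int n, int *block_counts)
{
    __shared__ int block_count;            // one copy per block
    if (threadIdx.x == 0) block_count = 0;
    __syncthreads();                       // initialization visible to the block

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && (in[i] & 1)) {
        atomicAdd(&block_count, 1);        // only this block's threads contend here
    }
    __syncthreads();                       // wait until all increments are done

    if (threadIdx.x == 0) block_counts[blockIdx.x] = block_count;
}

// Hypothetical host helper: fills the input with 0..n-1, runs the kernel and
// returns the count reported by block `block`, or -1 on a CUDA error.
int odd_count_in_block(int n, int threads_per_block, int block)
{
    int num_blocks = (n + threads_per_block - 1) / threads_per_block;
    std::vector<int> h_in(n), h_counts(num_blocks, -1);
    for (int i = 0; i < n; ++i) h_in[i] = i;

    int *d_in = nullptr, *d_counts = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(int)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_counts, num_blocks * sizeof(int)) != cudaSuccess) {
        cudaFree(d_in);
        return -1;
    }
    cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    count_odd_per_block<<<num_blocks, threads_per_block>>>(d_in, n, d_counts);

    cudaError_t err = cudaMemcpy(h_counts.data(), d_counts,
                                 num_blocks * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_counts);
    return err == cudaSuccess ? h_counts[block] : -1;
}
```

For 256 inputs and 64 threads per block, every block sees 64 consecutive integers, so each block's counter should end up at 32.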

If you have some __shared__ int x, you can indeed perform a block-level atomic operation on it, with the same syntax as for global atomics: atomicAdd(&x, 7) will atomically add 7 to the value of x. But remember that all threads in the block will do the same, and you certainly don't want to try up to 1024 atomic writes at a time. Typically you would have something like

__shared__ int some_buffer[NumFoosPerBar];

// ...

if (check_condition()) { 
     int foo_index = get_thread_foo_index_for(threadIdx.x);
     atomicAdd(&some_buffer[foo_index], 7);
}

where possibly more than one thread writes to the same location, but not necessarily. When you do expect multiple writes to the same location, don't use atomics; rather, perform some kind of reduction on the values to be written.
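As an illustration of the reduction alternative, here is a minimal sketch of a shared-memory tree reduction that sums one value per thread without any atomics. The names block_sum, sum_of_ones and kThreads are hypothetical, and the sketch assumes the block size is a power of two.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kThreads = 256;  // assumed block size; must be a power of two

// Each block sums its slice of the input and writes one partial sum.
__global__ void block_sum(const int *in, int n, int *block_sums)
{
    __shared__ int partial[kThreads];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    partial[threadIdx.x] = (i < n) ? in[i] : 0;   // pad the tail with zeros
    __syncthreads();

    // Halve the number of active threads each step; no two threads ever
    // write to the same location, so no atomics are needed.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride) {
            partial[threadIdx.x] += partial[threadIdx.x + stride];
        }
        __syncthreads();
    }
    if (threadIdx.x == 0) block_sums[blockIdx.x] = partial[0];
}

// Hypothetical host helper: sums n ones on the device (n > 0) and adds up
// the per-block results on the host. Returns -1 on a CUDA error.
int sum_of_ones(int n)
{
    int num_blocks = (n + kThreads - 1) / kThreads;
    std::vector<int> h_in(n, 1), h_sums(num_blocks, 0);

    int *d_in = nullptr, *d_sums = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(int)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_sums, num_blocks * sizeof(int)) != cudaSuccess) {
        cudaFree(d_in);
        return -1;
    }
    cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    block_sum<<<num_blocks, kThreads>>>(d_in, n, d_sums);

    cudaError_t err = cudaMemcpy(h_sums.data(), d_sums,
                                 num_blocks * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_sums);
    if (err != cudaSuccess) return -1;

    int total = 0;
    for (int s : h_sums) total += s;
    return total;
}
```

Here the only write to each output location comes from thread 0 of the owning block, which is why the atomics can be dropped entirely.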

Answer 3

While it's unclear what you mean by block/block level, it sounds like you just need an atomic add. Those are found in the Linux kernel in <asm/atomic.h>; your code would be:

__global__ kernel (int atomic)
{
    atomic_add(blockid.x,&atomic);
}

atomic would have to be of type atomic_t and blockid.x an int.


3 answers

solution1
3 2011-07-07 17:43:49

solution2
0 2016-10-16 18:02:49

solution3
-2 ACCEPTED 2011-07-07 16:49:04
