atomicCAS allows using integral types of various lengths (according to the specs, word sizes of 16/32/64 bit). It works fine for integral types like int, unsigned long long, ...
I want to use atomic operations on non-integral types of the same length. My naive idea is to reinterpret the data as an integral type of the same length, for example double as uint64_t (unsigned long long). To avoid an implicit value conversion, I do the cast through a pointer.
Example
double cmp = 0.0;
double val = 4.6692016091;
double dst = 0.0;
uint64_t old = atomicCAS((uint64_t*)&dst, *((uint64_t*)&cmp), *((uint64_t*)&val));
Problem
Threads quit as soon as the atomicCAS command is executed. I couldn't find any details on why that happens.
Is there a way to use atomicCAS that way in a CUDA context?
In case it's relevant: I use CUDA 11.7 with the nvcc switch --machine 64 and compute_61,sm_61 (Pascal architecture).
As @Homer512 pointed out, atomicCAS is implemented for global and shared memory. Atomic operations make no sense in non-concurrent scenarios such as the thread-local variables used in the example above (at least I can't think of a use case).
The following example, which operates on an array in shared memory, works instead.
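// Inside a __global__ kernel:
const unsigned int idx = (blockIdx.x * blockDim.x) + threadIdx.x;
const unsigned int nThreads = 64;
if (idx >= nThreads) return;
// Shared memory is a valid target for atomicCAS; a thread-local variable is not.
__shared__ double s[nThreads];
const double cmp = s[idx] = 0.0;
double val = 4.6692016091;
// Reinterpret the 64-bit patterns: atomicCAS compares and swaps the raw bits.
uint64_t old = atomicCAS((uint64_t*)&s[idx], *((uint64_t*)&cmp), *((uint64_t*)&val));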
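For completeness, here is a minimal sketch of the same idea on global memory, wrapped in a small helper. The names atomicCASDouble, casKernel and runCasDemo are my own and not part of CUDA. Instead of pointer casts on the values, it uses the CUDA intrinsics __double_as_longlong and __longlong_as_double to reinterpret the bits, and it uses unsigned long long explicitly because that is the 64-bit type atomicCAS is declared for. Treat it as an illustration under these assumptions, not as the definitive way to do it.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper (not a CUDA built-in): compare-and-swap on a double by
// reinterpreting its 64 bits. addr must point to global or shared memory.
__device__ double atomicCASDouble(double* addr, double cmp, double val)
{
    unsigned long long old = atomicCAS(
        reinterpret_cast<unsigned long long*>(addr),
        static_cast<unsigned long long>(__double_as_longlong(cmp)),
        static_cast<unsigned long long>(__double_as_longlong(val)));
    return __longlong_as_double(static_cast<long long>(old));
}

// Every thread tries to replace 0.0 with its own value; only the first
// CAS to reach memory finds 0.0 there, so exactly one thread succeeds.
__global__ void casKernel(double* dst, int* winners)
{
    const double val = 1.0 + threadIdx.x;
    const double old = atomicCASDouble(dst, 0.0, val);
    if (old == 0.0) atomicAdd(winners, 1);
}

// Hypothetical host driver: returns the number of threads whose CAS
// succeeded (or -1 on a CUDA error) and stores the final value in *result.
int runCasDemo(double* result)
{
    double* dDst = nullptr;
    int* dWinners = nullptr;
    if (cudaMalloc(&dDst, sizeof(double)) != cudaSuccess) return -1;
    if (cudaMalloc(&dWinners, sizeof(int)) != cudaSuccess) {
        cudaFree(dDst);
        return -1;
    }
    cudaMemset(dDst, 0, sizeof(double));    // all-zero bits are +0.0
    cudaMemset(dWinners, 0, sizeof(int));

    casKernel<<<1, 64>>>(dDst, dWinners);

    int winners = -1;
    if (cudaDeviceSynchronize() == cudaSuccess) {
        cudaMemcpy(&winners, dWinners, sizeof(int), cudaMemcpyDeviceToHost);
        cudaMemcpy(result, dDst, sizeof(double), cudaMemcpyDeviceToHost);
    }
    cudaFree(dDst);
    cudaFree(dWinners);
    return winners;
}
```

Calling runCasDemo(&r) from main should report one winner, with r holding that thread's value. Keep in mind that the comparison is done on bit patterns, not on floating-point values: 0.0 and -0.0 are treated as different, and a NaN matches only a NaN with identical bits.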