
CUDA: custom atomicCAS for floating-point types (like double)

atomicCAS supports integral types of various widths (according to the specification, 16-, 32-, and 64-bit words). It works fine for integral types such as int, unsigned long long, and so on.
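For reference, the CUDA C++ Programming Guide describes atomicCAS as reading the word old at address, computing (old == compare ? val : old), storing that result back to the same address, and returning old, all in one atomic transaction. The sketch below restates that contract as ordinary sequential code for the 64-bit overload. cas_model is a hypothetical name, and the function is deliberately not atomic; it only helps reason about return values.

```cuda
// Hypothetical, NON-atomic model of the 64-bit atomicCAS contract, for
// reasoning about return values only. The real atomicCAS performs these
// three steps as one indivisible transaction on global or shared memory.
__host__ __device__ inline unsigned long long int cas_model(unsigned long long int* address,
                                                            unsigned long long int compare,
                                                            unsigned long long int val)
{
    unsigned long long int old = *address;   // 1. read the current word
    *address = (old == compare) ? val : old; // 2. store val only on a match
    return old;                              // 3. return what was read
}
```

A caller can tell whether its swap happened by checking whether the returned value equals compare.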

I want to use atomic operations on non-integral types of the same width. My naive idea is to reinterpret the data as an integral type of the same width, e.g. a double as uint64_t (unsigned long long). To avoid an implicit value conversion, I cast through a pointer.
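The pointer cast described above reinterprets the bits, but reading a double through a uint64_t pointer is type punning, which standard C++ strict-aliasing rules formally leave undefined. In device code, CUDA's intrinsics __double_as_longlong and __longlong_as_double perform this reinterpretation directly. The sketch below is a portable alternative that also runs on the host: it copies the 8 bytes with memcpy. The helper names double_bits and bits_to_double are hypothetical.

```cuda
#include <string.h>

// Hypothetical helpers: reinterpret a double's 8 bytes as an integer and back.
// memcpy copies the bit pattern unchanged (no value conversion) and is usable
// in both host and device code; in device-only code the intrinsics
// __double_as_longlong / __longlong_as_double do the same job.
__host__ __device__ inline unsigned long long int double_bits(double d)
{
    static_assert(sizeof(unsigned long long int) == sizeof(double),
                  "double and unsigned long long int must have the same width");
    unsigned long long int u;
    memcpy(&u, &d, sizeof u);
    return u;
}

__host__ __device__ inline double bits_to_double(unsigned long long int u)
{
    double d;
    memcpy(&d, &u, sizeof d);
    return d;
}
```

Because atomicCAS compares these integer bit patterns rather than double values, 0.0 and -0.0 count as different, while a NaN matches a NaN with the identical bit pattern.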

Example

double cmp = 0.0;
double val = 4.6692016091;
double dst = 0.0;
uint64_t old = atomicCAS((uint64_t*)&dst, *((uint64_t*)&cmp), *((uint64_t*)&val));

Problem

Threads quit as soon as the atomicCAS call is executed, and I couldn't find any explanation of why that happens.

Is there a way to use atomicCAS like this in a CUDA context?

In case it's relevant: I use CUDA 11.7, the --machine 64 nvcc switch, and compute_61,sm_61 (Pascal architecture).

As @Homer512 pointed out, atomicCAS is only implemented for global and shared memory. In the first example, dst is a thread-local variable, so atomicCAS receives an address outside those memory spaces. Atomic operations make no sense in non-concurrent scenarios such as thread-local variables anyway (at least I can't think of any).

The following example, which operates on shared memory, works instead.

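const unsigned int idx = (blockIdx.x * blockDim.x) + threadIdx.x;
const unsigned int nThreads = 64;
if (idx >= nThreads) return;

// s lives in shared memory, so atomicCAS gets a valid address
__shared__ double s[nThreads];
const double cmp = s[idx] = 0.0;

double val = 4.6692016091;
// The 64-bit overload of atomicCAS takes unsigned long long int;
// __double_as_longlong reinterprets the bits without a pointer cast
unsigned long long int old = atomicCAS((unsigned long long int*)&s[idx],
                                       __double_as_longlong(cmp),
                                       __double_as_longlong(val));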
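A single atomicCAS only swaps when the stored bits match exactly, so floating-point atomics that CUDA does not provide are usually built as a retry loop around the 64-bit atomicCAS, in the style of the double-precision atomicAdd example in the CUDA C++ Programming Guide. The sketch below applies that pattern to a maximum on a double in global memory. atomicMaxDouble and maxKernel are hypothetical names, and the code assumes a device where the 64-bit atomicCAS is available (sm_61 qualifies).

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: atomic maximum for double, built from the 64-bit atomicCAS
// with the retry-loop pattern. `address` must point to global or shared memory.
// Returns the value stored at `address` before this call took effect.
__device__ double atomicMaxDouble(double* address, double val)
{
    unsigned long long int* address_as_ull = (unsigned long long int*)address;
    unsigned long long int old = *address_as_ull;
    unsigned long long int assumed;
    do {
        assumed = old;
        // Nothing to write if the stored value is already at least val.
        if (__longlong_as_double((long long int)assumed) >= val) break;
        // Swap in val only if the word still holds the bits we last observed.
        old = atomicCAS(address_as_ull, assumed,
                        (unsigned long long int)__double_as_longlong(val));
        // Compare integer bit patterns (not doubles) so the loop also ends for NaN.
    } while (assumed != old);
    return __longlong_as_double((long long int)old);
}

// Hypothetical kernel: each thread folds one input element into *result.
__global__ void maxKernel(const double* in, int n, double* result)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicMaxDouble(result, in[i]);
}
```

On devices of compute capability 6.x and higher, double atomicAdd is already built in, so a loop like this is mainly useful for operations that lack a native double version.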
