
Some mathematical operations in CUDA

I have a 2D matrix containing the values 0, 1 and 2. I am writing a CUDA kernel in which the number of threads equals the matrix size, so that each thread operates on one element of the matrix. I need a mathematical operation that keeps 0 and 1 as they are but converts 2 to 1, that is, an expression without any if-else that performs the mapping 0 -> 0; 1 -> 1; 2 -> 1. Is there a way to do this using only mathematical operators? Any help would be greatly appreciated. Thank you.

This is not really a CUDA-specific question; it can be answered in plain C:

int A;
// set A to 0, 1, or 2
int a = (A + (A>>1)) & 1;
// a is now 0 if A is 0, or 1 if A is 1 or 2

or as a macro:

#define fix01(x) ((((x))+((x)>>1))&1)  /* argument parenthesized so expressions like fix01(a|b) expand correctly */

int a = fix01(A);

This also seems to work:

#define fix01(x) (((x)&&1)&1)  /* argument parenthesized so expressions like fix01(a|b) expand correctly */

I don't know whether the use of the logical AND operator ( && ) fits your definition of "mathematical operations".
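To make the behaviour of the two macros concrete, here is a minimal C sketch that checks them on the three values from the question. The names FIX01_SHIFT, FIX01_LOGIC and check_fix01 are hypothetical and chosen only for this illustration. For the shift variant the arithmetic is: 0 + 0 = 0, 1 + 0 = 1, 2 + 1 = 3, and the final & 1 keeps the lowest bit, giving 0, 1, 1.

```c
#include <assert.h>

/* Fully parenthesized versions of the two macros above
   (hypothetical names, used only in this sketch). */
#define FIX01_SHIFT(x) ((((x)) + ((x) >> 1)) & 1)
#define FIX01_LOGIC(x) (((x) && 1) & 1)

/* Checks both macros on the three values the question allows. */
void check_fix01(void)
{
    for (int v = 0; v <= 2; ++v) {
        int expected = (v == 0) ? 0 : 1;
        assert(FIX01_SHIFT(v) == expected);
        assert(FIX01_LOGIC(v) == expected);
    }
    /* Outside 0..2 the two variants differ:
       3 + (3 >> 1) = 4 and 4 & 1 = 0, while (3 && 1) is 1. */
    assert(FIX01_SHIFT(3) == 0);
    assert(FIX01_LOGIC(3) == 1);
}
```

So the shift variant relies on the input really being 0, 1 or 2, whereas the && variant maps every nonzero value to 1.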

As the question was about "mathematical" functions, I suggest the following second-order polynomial:

int f(int x) { return ((3-x)*x)/2; }
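As a quick check of this polynomial, the sketch below evaluates it with the intermediate values spelled out. The name poly01 is hypothetical; the formula is the one given above, which is the parabola through the points (0,0), (1,1) and (2,1).

```c
#include <assert.h>

/* Same polynomial as above: f(x) = x * (3 - x) / 2. */
int poly01(int x)
{
    /* x * (3 - x) is 0, 2, 2 for x = 0, 1, 2, so the
       integer division by 2 is exact for these inputs. */
    return ((3 - x) * x) / 2;
}
```

Like the shift-based macro, this is only meant for inputs 0, 1 and 2; for example, poly01(3) evaluates to 0.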

But if you want to avoid branching in order to maximize speed: there has been a min instruction since PTX ISA 1.0 (see Tab. 36 in the PTX ISA 3.1 manual). So the following CUDA code

__global__ void test(int *x, int *y)
{
    *y = *x <= 1 ? *x : 1;
}

compiles to the following machine code in my test (I just called nvcc from CUDA 5 without any arch options; the listing is a disassembly of the sm_10 code, not PTX source):

    code for sm_10
            Function : _Z4testPiS_
    /*0000*/     /*0x1000c8010423c780*/     MOV R0, g [0x4];
    /*0008*/     /*0xd00e000580c00780*/     GLD.U32 R1, global14 [R0];
    /*0010*/     /*0x1000cc010423c780*/     MOV R0, g [0x6];
    /*0018*/     /*0x30800205ac400780*/     IMIN.S32 R1, R1, c [0x1] [0x0];
    /*0020*/     /*0xd00e0005a0c00781*/     GST.U32 global14 [R0], R1;

So a min() implementation using the conditional operator ?: actually compiles to a single IMIN.S32 machine instruction without any branching. I'd therefore recommend this for any real-world application:

int f(int x) { return x <= 1 ? x : 1; }

But back to the question of using only non-branching operations:

Another way to get this result in C is to apply the logical NOT operator twice:

int f(int x) { return !!x; }

Or simply compare with zero:

int f(int x) { return x != 0; }

(The results of ! and != are guaranteed to be 0 or 1; compare Sec. 6.5.3.3 Par. 5 and Sec. 6.5.9 Par. 3 of the C99 standard, ISO/IEC 9899:1999. As far as I recall, this guarantee also holds in CUDA.)
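The three branch-free forms from this answer can be compared side by side. The following is a minimal plain C sketch, not CUDA device code, and the function names are hypothetical. It also shows where the forms stop agreeing: the min-based version passes negative values through unchanged, while the two comparison-based versions map every nonzero value to 1.

```c
#include <assert.h>

int clamp01_min(int x)     { return x <= 1 ? x : 1; }  /* min(x, 1) */
int clamp01_notnot(int x)  { return !!x; }             /* double logical NOT */
int clamp01_nonzero(int x) { return x != 0; }          /* compare with zero */

/* Returns 1 if all three variants implement 0->0, 1->1, 2->1. */
int variants_agree_on_012(void)
{
    for (int v = 0; v <= 2; ++v) {
        int expected = (v == 0) ? 0 : 1;
        if (clamp01_min(v) != expected ||
            clamp01_notnot(v) != expected ||
            clamp01_nonzero(v) != expected) {
            return 0;
        }
    }
    return 1;
}
```

For the matrix in the question, which only holds 0, 1 and 2, the three are interchangeable; the difference only matters if other values can appear.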
