Some mathematical operations in CUDA

Question

I have a 2D matrix containing 0, 1 and 2. I am writing a CUDA kernel where the number of threads equals the matrix size and each thread operates on one element of the matrix.

Now I need a mathematical operation that keeps 0 and 1 as they are but converts 2 to 1, that is, an operation without any if-else that performs the following conversion:

0 -> 0
1 -> 1
2 -> 1

Is there any way to do this conversion using mathematical operators? Any help would be greatly appreciated. Thank you.
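A host-side C sketch of the setup described in the question, assuming the matrix is stored row-major as one flat int array. Each loop iteration plays the role of one CUDA thread, and the loop index stands in for the usual global index blockIdx.x * blockDim.x + threadIdx.x. The names map012 and map_matrix are hypothetical; the branch-free expression is the shift-based one from the first answer.

```c
#include <stddef.h>

/* Hypothetical helper: branch-free mapping 0->0, 1->1, 2->1.
   Only valid for inputs 0, 1 and 2. */
static int map012(int v)
{
    return (v + (v >> 1)) & 1;
}

/* Hypothetical host-side stand-in for the kernel: one loop iteration
   per matrix element, i.e. one "thread" per element. */
void map_matrix(int *m, size_t rows, size_t cols)
{
    size_t n = rows * cols;
    for (size_t idx = 0; idx < n; ++idx) {
        /* row-major layout: idx = r * cols + c */
        m[idx] = map012(m[idx]);
    }
}
```

In a real kernel the loop body would become the per-thread work, guarded by idx < n.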

Answer 1

This is not a CUDA question.

int A;
// set A to 0, 1, or 2
int a = (A + (A>>1)) & 1;
// a is now 0 if A is 0, or 1 if A is 1 or 2
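Checking the three inputs by hand shows why this works: A >> 1 is 1 only when A is 2, which turns 2 into the odd number 3 before the lowest bit is taken (for non-negative A, & 1 is the same as mod 2):

```latex
\begin{array}{c|c|c|c}
A & \lfloor A/2 \rfloor & A + \lfloor A/2 \rfloor & (A + \lfloor A/2 \rfloor) \bmod 2 \\ \hline
0 & 0 & 0 & 0 \\
1 & 0 & 1 & 1 \\
2 & 1 & 3 & 1
\end{array}
```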

Or as a macro:

/* each use of x is parenthesized so that compound arguments expand correctly */
#define fix01(x) (((x) + ((x) >> 1)) & 1)

int a = fix01(A);
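Because a macro argument is substituted textually, it is evaluated twice here, and an argument such as a & b or a ? b : c would bind incorrectly unless every use of x is parenthesized. A minimal sketch of an inline-function alternative that avoids both problems; the name fix01_inline is hypothetical (chosen so it does not clash with the macro), and for use inside a kernel it would additionally need the __device__ qualifier.

```c
/* Same arithmetic as the fix01 macro, but as a function: the argument is
   evaluated exactly once and operator precedence cannot leak into it.
   Valid for inputs 0, 1 and 2 only. */
static inline int fix01_inline(int x)
{
    return (x + (x >> 1)) & 1;
}
```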

This also seems to work:

#define fix01(x) (((x) && 1) & 1)

I don't know whether using the boolean AND operator (&&) fits your definition of "mathematical operations".
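The two macros agree only on the three values from the question. The sketch below (function names hypothetical) shows that for an input of 3 the shift-based form returns 0 while the && form returns 1, so the first one relies on the input really being restricted to 0, 1 and 2. The trailing & 1 in the second form is redundant, since && already yields 0 or 1.

```c
/* Hypothetical function versions of the two macros from this answer. */
static int fix01_shift(int x) { return (x + (x >> 1)) & 1; }
static int fix01_and(int x)   { return (x && 1) & 1; } /* && already gives 0 or 1 */

/* Returns the smallest non-negative input below limit on which the two
   forms disagree, or -1 if they agree on all of them. */
static int first_disagreement(int limit)
{
    for (int x = 0; x < limit; ++x)
        if (fix01_shift(x) != fix01_and(x))
            return x;
    return -1;
}
```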

Answer 2

As the question asks for "mathematical" functions, I suggest the following second-order polynomial:

int f(int x) { return ((3-x)*x)/2; }
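One way to arrive at this polynomial: a quadratic through (0, 0) has no constant term, so write f(x) = ax^2 + bx and fit the other two points:

```latex
\begin{aligned}
f(1) &= a + b = 1, \\
f(2) &= 4a + 2b = 1 \;\Rightarrow\; 2a = -1, \quad a = -\tfrac{1}{2},\; b = \tfrac{3}{2}, \\
f(x) &= \frac{3x - x^2}{2} = \frac{(3 - x)\,x}{2}.
\end{aligned}
```

Since (3 - x) * x evaluates to 0, 2 and 2 for x = 0, 1, 2, the integer division by 2 is exact for these inputs.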

But if you want to avoid branching in order to maximize speed: there has been a min instruction since PTX ISA 1.0 (see Table 36 in the PTX ISA 3.1 manual). So the following CUDA code

__global__ void test(int *x, int *y)
{
    *y = *x <= 1 ? *x : 1;
}

compiles to the following machine code in my test (I just called nvcc from CUDA 5 without any arch options; the listing below is the disassembled sm_10 code, i.e. SASS rather than PTX):

    code for sm_10
            Function : _Z4testPiS_
    /*0000*/     /*0x1000c8010423c780*/     MOV R0, g [0x4];
    /*0008*/     /*0xd00e000580c00780*/     GLD.U32 R1, global14 [R0];
    /*0010*/     /*0x1000cc010423c780*/     MOV R0, g [0x6];
    /*0018*/     /*0x30800205ac400780*/     IMIN.S32 R1, R1, c [0x1] [0x0];
    /*0020*/     /*0xd00e0005a0c00781*/     GST.U32 global14 [R0], R1;

So a min() written with the conditional operator ?: actually compiles to a single IMIN.S32 instruction, without any branching. I'd therefore recommend this for any real-world application:

int f(int x) { return x <= 1 ? x : 1; }
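The same recommendation can be spelled as an explicit min, which is what the IMIN instruction computes. CUDA device code provides integer overloads of min(), so min(x, 1) should express this directly there; plain C has no integer min in its standard library, so this sketch defines hypothetical helpers min_int and clamp_to_one.

```c
/* Hypothetical helper: integer minimum (C's standard library only has
   floating-point fmin). */
static inline int min_int(int a, int b) { return a < b ? a : b; }

/* Maps 0->0, 1->1, 2->1; in general it clamps any int from above to 1. */
static inline int clamp_to_one(int x) { return min_int(x, 1); }
```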

But back to the question of using only non-branching operations:

Another way to get this result in C is to use two NOT operators:

int f(int x) { return !!x; }

Or simply compare with zero:

int f(int x) { return x != 0; }

(The results of ! and != are guaranteed to be 0 or 1; compare Sec. 6.5.3.3 Par. 5 and Sec. 6.5.9 Par. 3 of the C99 standard, ISO/IEC 9899:1999. AFAIR this guarantee also holds in CUDA.)
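A small host-side sketch that checks every formulation from both answers against the required table 0 -> 0, 1 -> 1, 2 -> 1. The function names are hypothetical, and only the three inputs from the question are tested.

```c
#include <stddef.h>

/* The formulations discussed on this page, as functions (names hypothetical). */
static int by_shift(int x)       { return (x + (x >> 1)) & 1; }
static int by_logical_and(int x) { return (x && 1) & 1; }
static int by_poly(int x)        { return ((3 - x) * x) / 2; }
static int by_ternary_min(int x) { return x <= 1 ? x : 1; }
static int by_double_not(int x)  { return !!x; }
static int by_compare(int x)     { return x != 0; }

typedef int (*mapper)(int);

/* Counts (formulation, input) pairs that differ from the expected table;
   0 means every formulation is correct on inputs 0, 1 and 2. */
static int count_mismatches(void)
{
    static const mapper fns[] = { by_shift, by_logical_and, by_poly,
                                  by_ternary_min, by_double_not, by_compare };
    static const int expected[3] = { 0, 1, 1 };
    int bad = 0;
    for (size_t i = 0; i < sizeof fns / sizeof fns[0]; ++i)
        for (int x = 0; x < 3; ++x)
            if (fns[i](x) != expected[x])
                ++bad;
    return bad;
}
```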

Answer 1: 3 votes, accepted, 2013-07-14 22:36:29
Answer 2: 1 vote, 2013-07-16 15:02:09