CUDA hello world example

I'm trying to understand a simple addition in the CUDA hello world example. I have two arrays:

char a[N] = "Hello \0\0\0\0\0\0";
int b[N] = {15, 10, 6, 0, -11, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};

The grid and block dimensions are 1 and 16. I don't really get how, when you perform this:

a[threadIdx.x] += b[threadIdx.x];

you get "Hello World!". This is a classic introductory example in CUDA, and the logic behind the parallelism is easy to understand, but this sum... I don't really get it. The full source code:

#include <stdio.h>
#include <stdlib.h>   // EXIT_SUCCESS
#include <unistd.h>   // sleep (POSIX)

const int N = 16; 
const int blocksize = 16; 

__global__ 
void hello(char *a, int *b) 
{
    a[threadIdx.x] += b[threadIdx.x];
}

int main()
{
    char a[N] = "Hello \0\0\0\0\0\0";
    int b[N] = {15, 10, 6, 0, -11, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};

    char *ad;
    int *bd;
    const int csize = N*sizeof(char);
    const int isize = N*sizeof(int);

    printf("%s", a);

    cudaMalloc( (void**)&ad, csize ); 
    cudaMalloc( (void**)&bd, isize ); 
    cudaMemcpy( ad, a, csize, cudaMemcpyHostToDevice ); 
    cudaMemcpy( bd, b, isize, cudaMemcpyHostToDevice ); 

    dim3 dimBlock( blocksize, 1 );
    dim3 dimGrid( 1, 1 );
    hello<<<dimGrid, dimBlock>>>(ad, bd);
    cudaMemcpy( a, ad, csize, cudaMemcpyDeviceToHost ); 
    cudaFree( ad );
    cudaFree( bd );

    printf("%s\n", a);
    sleep(1);
    return EXIT_SUCCESS;
}

Look at the example code once more:

printf("%s", a);

This prints "Hello ", the value you've assigned to a in the lines you've pasted. Then the kernel goes over both arrays and increments each element of a (char is an arithmetic type) by the corresponding element of b. So what you get is:

'H' + 15 = 'W'
'e' + 10 = 'o'
'l' + 6  = 'r'
'l' + 0  = 'l'
'o' - 11 = 'd'
' ' + 1  = '!'

After all the CUDA stuff, there's this simple statement:

printf("%s\n", a);

This prints the now-altered value of a to stdout: "World!". Put the two together and you get "Hello World!".

As for the use of threadIdx, that's the built-in variable that holds each thread's index within its block.

The function containing that increment statement you're unsure about is called like this:

hello<<<dimGrid, dimBlock>>>(ad, bd);

With dim3 dimGrid( 1, 1 ); and dim3 dimBlock( blocksize, 1 );. blocksize is const int blocksize = 16 (the same value as N).

In short, you're basically calling hello 16 times, on 16 different threads, and each thread uses its unique ID as the offset. Because the block size is equal to the array size, you'll never end up accessing out-of-bounds memory.

This is, of course, assuming you're using ASCII, in which case you can play around with the values of b to form whatever word you like:

Char  Dec  Oct  Hex | Char  Dec  Oct  Hex | Char  Dec  Oct  Hex | Char Dec  Oct   Hex
-------------------------------------------------------------------------------------
(nul)   0 0000 0x00 | (sp)   32 0040 0x20 | @      64 0100 0x40 | `      96 0140 0x60
(soh)   1 0001 0x01 | !      33 0041 0x21 | A      65 0101 0x41 | a      97 0141 0x61
(stx)   2 0002 0x02 | "      34 0042 0x22 | B      66 0102 0x42 | b      98 0142 0x62
(etx)   3 0003 0x03 | #      35 0043 0x23 | C      67 0103 0x43 | c      99 0143 0x63
(eot)   4 0004 0x04 | $      36 0044 0x24 | D      68 0104 0x44 | d     100 0144 0x64
(enq)   5 0005 0x05 | %      37 0045 0x25 | E      69 0105 0x45 | e     101 0145 0x65
(ack)   6 0006 0x06 | &      38 0046 0x26 | F      70 0106 0x46 | f     102 0146 0x66
(bel)   7 0007 0x07 | '      39 0047 0x27 | G      71 0107 0x47 | g     103 0147 0x67
(bs)    8 0010 0x08 | (      40 0050 0x28 | H      72 0110 0x48 | h     104 0150 0x68
(ht)    9 0011 0x09 | )      41 0051 0x29 | I      73 0111 0x49 | i     105 0151 0x69
(nl)   10 0012 0x0a | *      42 0052 0x2a | J      74 0112 0x4a | j     106 0152 0x6a
(vt)   11 0013 0x0b | +      43 0053 0x2b | K      75 0113 0x4b | k     107 0153 0x6b
(np)   12 0014 0x0c | ,      44 0054 0x2c | L      76 0114 0x4c | l     108 0154 0x6c
(cr)   13 0015 0x0d | -      45 0055 0x2d | M      77 0115 0x4d | m     109 0155 0x6d
(so)   14 0016 0x0e | .      46 0056 0x2e | N      78 0116 0x4e | n     110 0156 0x6e
(si)   15 0017 0x0f | /      47 0057 0x2f | O      79 0117 0x4f | o     111 0157 0x6f
(dle)  16 0020 0x10 | 0      48 0060 0x30 | P      80 0120 0x50 | p     112 0160 0x70
(dc1)  17 0021 0x11 | 1      49 0061 0x31 | Q      81 0121 0x51 | q     113 0161 0x71
(dc2)  18 0022 0x12 | 2      50 0062 0x32 | R      82 0122 0x52 | r     114 0162 0x72
(dc3)  19 0023 0x13 | 3      51 0063 0x33 | S      83 0123 0x53 | s     115 0163 0x73
(dc4)  20 0024 0x14 | 4      52 0064 0x34 | T      84 0124 0x54 | t     116 0164 0x74
(nak)  21 0025 0x15 | 5      53 0065 0x35 | U      85 0125 0x55 | u     117 0165 0x75
(syn)  22 0026 0x16 | 6      54 0066 0x36 | V      86 0126 0x56 | v     118 0166 0x76
(etb)  23 0027 0x17 | 7      55 0067 0x37 | W      87 0127 0x57 | w     119 0167 0x77
(can)  24 0030 0x18 | 8      56 0070 0x38 | X      88 0130 0x58 | x     120 0170 0x78
(em)   25 0031 0x19 | 9      57 0071 0x39 | Y      89 0131 0x59 | y     121 0171 0x79
(sub)  26 0032 0x1a | :      58 0072 0x3a | Z      90 0132 0x5a | z     122 0172 0x7a
(esc)  27 0033 0x1b | ;      59 0073 0x3b | [      91 0133 0x5b | {     123 0173 0x7b
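(fs)   28 0034 0x1c | <      60 0074 0x3c | \      92 0134 0x5c | |     124 0174 0x7c
(gs)   29 0035 0x1d | =      61 0075 0x3d | ]      93 0135 0x5d | }     125 0175 0x7d
(rs)   30 0036 0x1e | >      62 0076 0x3e | ^      94 0136 0x5e | ~     126 0176 0x7e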
(us)   31 0037 0x1f | ?      63 0077 0x3f | _      95 0137 0x5f | (del) 127 0177 0x7f

If you're using an EBCDIC character encoding, look up the equivalent table for whatever variant you're dealing with and take it from there.
