
Usage of << for exponentiation in C or CUDA

What is the meaning of the statement

// create arrays of 1M elements
const int num_elements = 1<<20;

in the code below? Is it specific to CUDA, or can it be used in standard C as well?

When I printf'ed num_elements, I got num_elements == 1048576.

That turns out to be 2^20. So is the << operator a shorthand for exponentiation in C?

// This example demonstrates parallel floating point vector
// addition with a simple __global__ function.

#include <stdlib.h>
#include <stdio.h>


// this kernel computes the vector sum c = a + b
// each thread performs one pair-wise addition
__global__ void vector_add(const float *a,
                           const float *b,
                           float *c,
                           const size_t n)
{
  // compute the global element index this thread should process
  unsigned int i = threadIdx.x + blockDim.x * blockIdx.x;

  // avoid accessing out of bounds elements
  if(i < n)
  {
    // sum elements
    c[i] = a[i] + b[i];
  }
}


int main(void)
{
  // create arrays of 1M elements
  const int num_elements = 1<<20;

  // compute the size of the arrays in bytes
  const int num_bytes = num_elements * sizeof(float);

  // points to host & device arrays
  float *device_array_a = 0;
  float *device_array_b = 0;
  float *device_array_c = 0;
  float *host_array_a   = 0;
  float *host_array_b   = 0;
  float *host_array_c   = 0;

  // malloc the host arrays
  host_array_a = (float*)malloc(num_bytes);
  host_array_b = (float*)malloc(num_bytes);
  host_array_c = (float*)malloc(num_bytes);

  // cudaMalloc the device arrays
  cudaMalloc((void**)&device_array_a, num_bytes);
  cudaMalloc((void**)&device_array_b, num_bytes);
  cudaMalloc((void**)&device_array_c, num_bytes);

  // if any memory allocation failed, report an error message
  if(host_array_a == 0 || host_array_b == 0 || host_array_c == 0 ||
     device_array_a == 0 || device_array_b == 0 || device_array_c == 0)
  {
    printf("couldn't allocate memory\n");
    return 1;
  }

  // initialize host_array_a & host_array_b
  for(int i = 0; i < num_elements; ++i)
  {
    // make array a a linear ramp
    host_array_a[i] = (float)i;

    // make array b random
    host_array_b[i] = (float)rand() / RAND_MAX;
  }

  // copy arrays a & b to the device memory space
  cudaMemcpy(device_array_a, host_array_a, num_bytes, cudaMemcpyHostToDevice);
  cudaMemcpy(device_array_b, host_array_b, num_bytes, cudaMemcpyHostToDevice);

  // compute c = a + b on the device
  const size_t block_size = 256;
  size_t grid_size = num_elements / block_size;

  // deal with a possible partial final block
  if(num_elements % block_size) ++grid_size;

  // launch the kernel
  vector_add<<<grid_size, block_size>>>(device_array_a, device_array_b, device_array_c, num_elements);

  // copy the result back to the host memory space
  cudaMemcpy(host_array_c, device_array_c, num_bytes, cudaMemcpyDeviceToHost);

  // print out the first 10 results
  for(int i = 0; i < 10; ++i)
  {
    printf("result %d: %1.1f + %7.1f = %7.1f\n", i, host_array_a[i], host_array_b[i], host_array_c[i]);
  }


  // deallocate memory
  free(host_array_a);
  free(host_array_b);
  free(host_array_c);

  cudaFree(device_array_a);
  cudaFree(device_array_b);
  cudaFree(device_array_c);
}

No, the << operator is the bit shift operator. It takes the bits of a number, such as 00101, and shifts them to the left n places, which has the effect of multiplying the number by a power of two. So x << y is x * 2^y, as long as the result still fits in the type. This is a result of the way numbers are stored internally in computers, which is binary.

For example, the number 1, when stored as a 32-bit integer in 2's complement (as it is on typical platforms), is:

00000000000000000000000000000001

When you do

1 << 20

You are taking all the 1's in that binary representation and moving them over 20 places to the left:

00000000000100000000000000000000

Which is 2^20. This also works for sign-magnitude representation, 1's complement, etc., because non-negative values have the same bit pattern in all of them.

Another example: if you take the representation of 5:

00000000000000000000000000000101

And do 5 << 1, you get

00000000000000000000000000001010

Which is 10, or 5 * 2^1.

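Conversely, the >> operator divides by a power of 2 by moving the bits to the right n places, discarding the bits that fall off the right end. For negative signed values the result of >> is implementation-defined in C, so this is only guaranteed for non-negative numbers.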

It's a bit shift. In binary, taking a 1 and moving it 20 places to the left is equivalent to multiplying it by 2^20.

edit: Yes, it's standard C, and a very good way of making it clear to the reader that the value is a single 1 in bit position 20, more so than writing int a = 1048576;

The (standard) C left shift operator << works by moving the bits (binary digits) of the value on its left side to the left by as many “spaces” as indicated by the value on its right side (filling in zeros on the right), i.e. 1 << 20 results in the binary number with 1 followed by 20 zeros. Since binary is base 2, each shift to the left doubles the value (multiplies by the base), i.e. it's the same as multiplying by powers of 2.

This property of binary numbers can be exploited to multiply and divide positive integers by powers of 2, which can be faster than using more general math functions. (Similarly, in elementary school math one can exploit the same property of decimal numbers when working with powers of 10… =)
