简体繁体中英

Maximum number of CUDA blocks?

原文 2019-04-20 02:30:28 2 1 cuda

I want to implement an algorithm in CUDA that takes an input of size N and uses N^2 threads to execute it (this is the way the particular algorithm words). I've been asked to make a program that can handle up to N = 2^10. I think for my system a given thread block can have up to 512 threads, but for N = 2^10, having N^2 threads would mean having N^2 / 512 = 2^20 / 512 blocks. I read at this link ( http://www.ce.jhu.edu/dalrymple/classes/602/Class10.pdf ) that you the number of blocks "can be as large as 65,535 (or larger 2^31 - 1)".

My questions are:

1) How do I find the actual maximum number of blocks? I'm not sure what the quote ^^ meant when it said "65,535 (or larger 2^31 - 1)", because those are obviously very different numbers.

2) Is it possible to run an algorithm that requires 2^20 / 512 threads?

3) If the number of threads that I need (2^20 / 512) is greater than what CUDA can provide, what happens? Does it just fill all the available threads, and then re-assign those threads to the additional waiting tasks once they're done computing?

4) If I want to use the maximum number of threads in each block, should I just set the number of threads to 512 like <<<number, 512>>> , or is there an advantage to using a dim3 value?

If you can provide any insight into any of these ^^ questions, I'd appreciate it.

1 answers

How do I find the actual maximum number of blocks? I'm not sure what the quote ^^ meant when it said "65,535 (or larger 2^31 - 1)", because those are obviously very different numbers.

Read the relevant documentation , or build and run the devicequery utility. But in either case, the limit is much larger than 2048 (which is what 2^20 / 512 equals). Note also that the block size limit on all currently supported hardware is 1024 threads per block, not 512, so you might need as few as 1024 blocks.

Is it possible to run an algorithm that requires 2^20 / 512 threads[sic]?

Yes

If the number of threads[sic] that I need .... is greater than what CUDA can provide, what happens?

Nothing. A runtime error is emitted.

Does it just fill all the available threads, and then re-assign those threads to the additional waiting tasks once they're done computing?

No. You would have to explicitly implement such a scheme yourself.

If I want to use the maximum number of threads in each block, should I just set the number of threads to 512 like <<<number, 512>>> , or is there an advantage to using a dim3 value?

There is no difference.

GPU/CUDA: Maximum number of blocks of a grid and Maximum number of resident blocks per multiprocessor

Maximum blocks per grid:CUDA

Is there a maximum number of streams in CUDA?

CUDA: Maximum number of blocks in a grid != CU_DEVICE_ATTRIBUTE_MAX_GRID_DIM_X?

Maximum blocks number on a GTX TITAN

cuda max number in Blocks and allocation manage

CUDA optimise number of blocks for grid stride loop

measure cuda execution time with respect to number of blocks

why increasing the number of blocks in cuda increase the time?

Most efficient number of blocks to launch in CUDA?

暂无

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

Related Question GPU/CUDA: Maximum number of blocks of a grid and Maximum number of resident blocks per multiprocessor Maximum blocks per grid:CUDA Is there a maximum number of streams in CUDA? CUDA: Maximum number of blocks in a grid != CU_DEVICE_ATTRIBUTE_MAX_GRID_DIM_X? Maximum blocks number on a GTX TITAN cuda max number in Blocks and allocation manage CUDA optimise number of blocks for grid stride loop measure cuda execution time with respect to number of blocks why increasing the number of blocks in cuda increase the time? Most efficient number of blocks to launch in CUDA?

Related Tags

Maximum number of CUDA blocks?

Question

1 answers

solution1 1

solution1
1