CUDA order of block execution in single kernel launch

I'm launching 256 threads in total. When I launch them as a single block, everything works fine. But when I launch them as 2x2 blocks of 8x8 threads each, the kernel loops forever. The real problem is that my kernel code waits for partial results from other blocks. After running several tests, I observed that the blocks are launched in a random order and seem to be executed sequentially.

Do CUDA blocks run in parallel if they're launched from the same kernel? The GPU shouldn't be the limitation, since I'm launching only 256 threads and a GTX 580 can handle them (everything works fine in a single-block launch of 16x16 threads). Is there a way to know the order of execution, or to specify it?

Yes, blocks run in parallel. How many blocks run at the same time depends on the resources of your GPU, but the important thing is that the launch order of blocks is undefined and cannot be specified. Because of that, a block that waits for results from another block may never see them if that other block has not been scheduled yet. Read more here: chapter 2.2, last three paragraphs.
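If you only want to observe the order in which blocks start on your card, something like the following might help. It is a minimal sketch, not a definitive tool: the kernel name `recordBlockOrder` and the helper `observeBlockOrder` are made up for this example, error checking is left out, and the 2x2 grid of 8x8 threads is simply copied from the question. Each block draws a ticket from a global counter with `atomicAdd` when its first thread runs.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Each block draws a ticket from a global counter when it starts running.
// (Kernel name is hypothetical, chosen for this example.)
__global__ void recordBlockOrder(unsigned int *counter, unsigned int *ticket)
{
    // Only one thread per block draws the ticket.
    if (threadIdx.x == 0 && threadIdx.y == 0) {
        unsigned int linearBlock = blockIdx.y * gridDim.x + blockIdx.x;
        ticket[linearBlock] = atomicAdd(counter, 1u);
    }
}

// Hypothetical helper: launches the kernel and returns one ticket per block.
// Error checking is omitted to keep the sketch short.
std::vector<unsigned int> observeBlockOrder(dim3 grid, dim3 block)
{
    const size_t numBlocks = static_cast<size_t>(grid.x) * grid.y;
    unsigned int *dCounter = nullptr;
    unsigned int *dTicket = nullptr;
    cudaMalloc(&dCounter, sizeof(unsigned int));
    cudaMalloc(&dTicket, numBlocks * sizeof(unsigned int));
    cudaMemset(dCounter, 0, sizeof(unsigned int));

    recordBlockOrder<<<grid, block>>>(dCounter, dTicket);
    cudaDeviceSynchronize();

    std::vector<unsigned int> tickets(numBlocks);
    cudaMemcpy(tickets.data(), dTicket, numBlocks * sizeof(unsigned int),
               cudaMemcpyDeviceToHost);
    cudaFree(dCounter);
    cudaFree(dTicket);
    return tickets;
}

int main()
{
    // Same launch shape as in the question: 2x2 blocks of 8x8 threads.
    std::vector<unsigned int> tickets = observeBlockOrder(dim3(2, 2), dim3(8, 8));
    for (size_t b = 0; b < tickets.size(); ++b)
        printf("block %zu drew ticket %u\n", b, tickets[b]);
    return 0;
}
```

The tickets can differ from run to run and from card to card, so this is only good for observation. It does not give you a way to fix the order, and the kernel logic should not depend on what it prints.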
