
Concurrent kernel execution and OpenCL device partition

Recently I needed to do some experiments that involve running multiple different kernels on AMD hardware. I have several questions before I start coding, so I would really appreciate your help.

First, I am not sure whether AMD hardware supports concurrent kernel execution on a single device. The OpenCL spec says a command queue can be created as in-order or out-of-order, but I don't think "out-of-order" means "concurrent execution". Does anyone have information about this? My hardware is an AMD APU A8 3870k. If this processor does not support it, do any other AMD products?

Second, I know there is an extension, "device fission", that can be used to partition one device into two. It currently works only on the CPU. But in the OpenCL spec I also saw "clCreateSubDevices", which is likewise used to partition a device. So my question is: is there any difference between these two techniques? My understanding is that device fission can only be used on the CPU, while clCreateSubDevices can be used on both the CPU and the GPU. Is that correct?

Thanks for any reply!

Truly concurrent kernel execution is not an essential feature, and it causes a lot of trouble for driver developers. As far as I know, AMD does not support it without splitting the device into sub-devices. As you mentioned, "out-of-order" does not mean concurrent; it only means the commands in the queue may execute out of order.

But what is the point of running both kernels in parallel at half speed instead of sequentially at full speed? You will probably lose overall performance if you do it that way.

I recommend using more GPU devices (or GPU + CPU) if you run out of resources on one of the GPUs. Optimizing the kernels could be a good option too. But splitting is never a good option for real scenarios, only for academic purposes or testing.
