Performance of thrust vs. cublas

I have a std::vector of matrices of different sizes and I want to calculate the square of every matrix. I have two solutions:

1/ Flatten all my matrices and store them on the device as one large flat array (float *), together with the begin and end indices of each matrix in that array, then use cublas, for example, to do the squaring.

2/ Store the matrices in a thrust::device_vector<float *> and use thrust::for_each to square them.

Clearly the second solution gives more readable code, but does it impact performance?

I think this is (now) just a repeat of a question you already asked.

Assuming the elementwise operation you want to do is something simple, like squaring each element, there should be little difference in performance or efficiency between the two cases.

This is because such an operation will be memory-bound, meaning its performance will be limited by (GPU) memory bandwidth. Both realizations will therefore have approximately the same limiter, and approximately the same performance.
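To make the memory-bound argument concrete, here is a rough back-of-the-envelope sketch. It assumes each float is read once and written once (8 bytes of traffic for a single multiply), and the function names bytes_moved and min_seconds are made up for illustration. The bandwidth figure is whatever your GPU actually sustains; nothing here is measured.

```cpp
#include <cassert>
#include <cstddef>

// Bytes of memory traffic for an elementwise square of n floats,
// assuming every element is read once and written once.
double bytes_moved(std::size_t n) {
    return 2.0 * static_cast<double>(n) * sizeof(float);
}

// Lower bound on the runtime in seconds for a given sustained memory
// bandwidth (bytes per second). The single multiply per element is
// ignored, since it is cheap compared with moving 8 bytes.
double min_seconds(std::size_t n, double bandwidth_bytes_per_s) {
    assert(bandwidth_bytes_per_s > 0.0);
    return bytes_moved(n) / bandwidth_bytes_per_s;
}
```

Whichever library launches the kernel, this bound stays the same, which is why the two approaches should end up close to each other.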

Note that in both of your proposals, the data will ultimately need to be effectively "flattened" in the same way (thrust operations cannot be constructed in a typical or simple fashion to operate on a thrust::device_vector<float *>).
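As a minimal host-side sketch of that flattening, assuming row-major storage: the Matrix and FlatMatrices types and the flatten and square_elements functions below are hypothetical names, not part of thrust or CUBLAS. The copy to the device is deliberately left out; the flat values buffer is what you would transfer, with the offsets kept to locate each matrix.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side matrix: row-major, rows * cols elements.
struct Matrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<float> data;
};

// Flat layout: all elements stored back to back. Matrix i occupies
// the half-open range [offsets[i], offsets[i + 1]) of values.
struct FlatMatrices {
    std::vector<float> values;
    std::vector<std::size_t> offsets;
};

FlatMatrices flatten(const std::vector<Matrix>& matrices) {
    FlatMatrices flat;
    flat.offsets.reserve(matrices.size() + 1);
    flat.offsets.push_back(0);
    for (const Matrix& m : matrices) {
        assert(m.data.size() == m.rows * m.cols);
        flat.values.insert(flat.values.end(), m.data.begin(), m.data.end());
        flat.offsets.push_back(flat.values.size());
    }
    return flat;
}

// Host reference for the elementwise square: a single pass over the
// flat buffer, with no need to know where one matrix ends.
void square_elements(std::vector<float>& values) {
    for (float& v : values) {
        v *= v;
    }
}
```

For an elementwise operation the offsets are only needed to recover the individual matrices afterwards, so a single pass over the flat buffer does the whole job.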

If you already have a mix of thrust and CUBLAS, for example, then you could probably use whichever approach suits you. If, on the other hand, your module uses only CUBLAS, and you could realize your operation using either CUBLAS or thrust, I'm not sure I would inject thrust just for this one operation. But that's just a matter of opinion.
