
Performance of Thrust vs. cuBLAS

I have a std::vector of matrices of different sizes, and I want to calculate the square of every matrix. I have two candidate solutions:

1/ Flatten all my matrices and store them on the device as one large flat array (float *), together with the begin and end indices of each matrix in that array, then use cuBLAS, for example, to do the squaring.

2/ Store the matrices in a thrust::device_vector<float *> and use thrust::for_each to square them.

Clearly the second solution gives more readable code, but does it impact performance?

I think this is (now) just a repeat of a question you already asked.

Assuming the elementwise operation you want to do is something simple like squaring of each element, there should be little difference in performance or efficiency between the two cases.

This is because such an operation will be memory-bound, meaning its performance will be limited by (GPU) memory bandwidth. Therefore both realizations will have approximately the same limiter, and approximately the same performance.
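To make the memory-bound argument concrete, here is a rough back-of-the-envelope sketch in C++. It assumes every float is read once and written once, and that the kernel actually reaches the bandwidth you pass in. The function name `estimate_square_time_ms` is hypothetical, and the bandwidth value is something you would have to measure or look up for your own GPU.

```cpp
#include <cassert>
#include <cstddef>

// Rough lower bound on the runtime of an elementwise square over n floats,
// assuming the operation is purely memory-bound: each element is read once
// and written once, and the single multiply per element is hidden behind
// the memory traffic.
// bandwidth_gb_per_s is an ASSUMED achievable bandwidth (1 GB = 1e9 bytes).
double estimate_square_time_ms(std::size_t n, double bandwidth_gb_per_s) {
    assert(bandwidth_gb_per_s > 0.0);
    const double bytes_moved = 2.0 * static_cast<double>(n) * sizeof(float);
    const double seconds = bytes_moved / (bandwidth_gb_per_s * 1e9);
    return seconds * 1e3;  // convert seconds to milliseconds
}
```

For example, with 100 million floats and an assumed 500 GB/s, the traffic is 2 * 1e8 * 4 bytes = 0.8 GB, which works out to about 1.6 ms. Nothing in this estimate depends on which library launches the kernel, which is why both approaches should land close together.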

Note that in both of your proposals, the data will ultimately need to be "flattened" in the same way. Thrust operations cannot be constructed in a typical or simple fashion to operate on a thrust::device_vector<float *>, because an algorithm applied to that vector iterates over the pointers, not over the floats they point to.
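The flattened layout can be sketched on the host in plain C++. This is only an illustration under assumptions: the `FlatMatrices` struct and the `flatten`, `matrix_size` and `square_in_place` helpers are hypothetical names, each matrix is assumed to be stored as a row-major std::vector<float>, and std::transform stands in for the device-side pass. On the GPU the same shape would likely be a thrust::device_vector<float> for the data with a thrust::transform over it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container: one contiguous buffer plus the start offset of
// each matrix. offsets has one more entry than there are matrices, so
// matrix i occupies the half-open range [offsets[i], offsets[i + 1]).
struct FlatMatrices {
    std::vector<float> data;
    std::vector<std::size_t> offsets;
};

// Copy matrices of different sizes into a single flat buffer and record
// where each one begins and ends.
FlatMatrices flatten(const std::vector<std::vector<float>>& matrices) {
    FlatMatrices flat;
    flat.offsets.reserve(matrices.size() + 1);
    flat.offsets.push_back(0);
    for (const auto& m : matrices) {
        flat.data.insert(flat.data.end(), m.begin(), m.end());
        flat.offsets.push_back(flat.data.size());
    }
    return flat;
}

// Number of elements in matrix i.
std::size_t matrix_size(const FlatMatrices& flat, std::size_t i) {
    assert(i + 1 < flat.offsets.size());
    return flat.offsets[i + 1] - flat.offsets[i];
}

// Elementwise square over the whole buffer in one pass. Matrix boundaries
// are irrelevant to an elementwise operation, so offsets stay untouched.
void square_in_place(FlatMatrices& flat) {
    std::transform(flat.data.begin(), flat.data.end(), flat.data.begin(),
                   [](float x) { return x * x; });
}
```

Because the operation is elementwise, the offsets are only needed afterwards, to find where each squared matrix lives in the buffer. A true matrix product (A times A) would be a different case, since it needs each matrix's dimensions.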

If you already have a mix of thrust and CUBLAS, for example, then you could probably use whichever approach suited you. If, on the other hand, your module used only CUBLAS, and you could realize your operation using either CUBLAS or thrust, I'm not sure I would inject thrust just for this one operation. But that's just a matter of opinion.
