
performance of thrust vs. cublas

Original 2015-10-06 08:11:25 3 1 c++/ cuda/ thrust

I have a std::vector of matrices of different sizes, and I want to compute the square of every matrix. I have two candidate solutions:

1/ Flatten all my matrices and store them on the device as one large flat array (float *), together with the start and end indices of each matrix in that array, and use cuBLAS, for example, to do the squaring.

2/ Store the matrices in a thrust::device_vector<float *> and use thrust::for_each to square them.

Clearly the second solution gives more readable code, but does it impact performance?

1 answer

I think this is (now) just a repeat of a question you already asked.

Assuming the elementwise operation you want to do is something simple like squaring of each element, there should be little difference in performance or efficiency between the two cases.

This is because such an operation will be memory-bound, meaning its performance will be limited by (GPU) memory bandwidth. Therefore both realizations will have approximately the same limiter, and approximately the same performance.
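As a rough back-of-the-envelope check of the memory-bound claim (a sketch, not a measurement): squaring one float in place reads 4 bytes, writes 4 bytes, and performs a single multiply. Here N is the total element count across all matrices and B is the sustained device memory bandwidth. Both symbols and the example figures are assumptions chosen only for illustration.

$$
t \approx \frac{8N}{B}, \qquad \text{e.g. } N = 10^{8},\ B = 200\ \text{GB/s} \;\Rightarrow\; t \approx \frac{8 \times 10^{8}\ \text{B}}{2 \times 10^{11}\ \text{B/s}} = 4\ \text{ms}
$$

One multiply per 8 bytes of traffic is far less arithmetic than a GPU can perform per byte moved, so the run time is set by B regardless of which library issues the loads and stores.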

Note that in both of your proposals, the data will ultimately need to be effectively "flattened" in the same way (thrust operations cannot be constructed in a typical or simple fashion to operate on a thrust::device_vector<float *>).
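A minimal sketch of what "flattened" could look like, written in plain host-side C++ so that it is self-contained. The names FlatMatrices, flatten, matrix_size and square_elements are hypothetical, not from the question. On the GPU, the same layout would presumably live in something like a thrust::device_vector<float>, with the element-wise pass done by thrust::transform rather than std::transform.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container: all matrices stored back to back in one buffer.
struct FlatMatrices {
    std::vector<float> data;           // every element of every matrix
    std::vector<std::size_t> offsets;  // offsets[i] = start of matrix i,
                                       // offsets.back() = data.size()
};

// Concatenate matrices (each a row-major std::vector<float>) into one buffer.
FlatMatrices flatten(const std::vector<std::vector<float>>& matrices) {
    FlatMatrices flat;
    flat.offsets.reserve(matrices.size() + 1);
    flat.offsets.push_back(0);
    for (const auto& m : matrices) {
        flat.data.insert(flat.data.end(), m.begin(), m.end());
        flat.offsets.push_back(flat.data.size());
    }
    return flat;
}

// Number of elements in matrix i, recovered from the offset table.
std::size_t matrix_size(const FlatMatrices& flat, std::size_t i) {
    assert(i + 1 < flat.offsets.size());
    return flat.offsets[i + 1] - flat.offsets[i];
}

// Element-wise square in a single pass over the whole buffer.
// Matrix boundaries do not matter for an element-wise operation.
void square_elements(FlatMatrices& flat) {
    std::transform(flat.data.begin(), flat.data.end(), flat.data.begin(),
                   [](float x) { return x * x; });
}
```

Because the operation is element-wise, the offset table is only needed to find individual matrices again afterwards. The squaring itself treats the buffer as one long array, which is why both proposals end up doing the same memory traffic.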

If you already have a mix of thrust and CUBLAS, for example, then you could probably use whichever approach suited you. If, on the other hand, your module used only CUBLAS, and you could realize your operation using either CUBLAS or thrust, I'm not sure I would inject thrust just for this one operation. But that's just a matter of opinion.




solution1 3 ACCEPTED 2015-10-06 14:24:54