Calculating the performance of CUFFT

Question

I am running CUFFT on chunks (N*N/p) distributed across multiple GPUs, and I have a question about calculating the performance. First, a bit about how I am doing it:

1. Send N*N/p chunks to each GPU
2. Batched 1-D FFT for each row in p GPUs
3. Get N*N/p chunks back to host and perform transpose on the entire dataset
4. Ditto Step 1
5. Ditto Step 2

Gflops = ( 1e-9 * 5 * N * N *lg(N*N) ) / execution time

and execution time is calculated as:

execution time = Sum(memcpyHtoD + kernel + memcpyDtoH times for row and col FFT for each GPU)

Is this the correct way to evaluate CUFFT performance on multiple GPUs? Is there any other way I could represent the performance of the FFT?

Thanks.

Answer 1

If you are doing a complex transform, the operation count is correct (it should be 2.5 N log2(N) for a real-valued transform), but the GFLOP formula is incorrect. In a parallel, multiprocessor operation, the usual calculation of throughput is

operation count / wall clock time
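
As a minimal sketch of that ratio in Python (the function names `fft2d_flops` and `gflops` are made up for illustration, and the example timing is hypothetical), the nominal operation count from the question is divided by a measured wall clock time. The real-valued case applies the 2.5 factor to the same N*N points, which is an assumption about how the 1-D count carries over to the 2-D case.

```python
import math


def fft2d_flops(n: int, complex_input: bool = True) -> float:
    """Nominal operation count for an n x n 2-D FFT.

    Uses 5 * M * log2(M) with M = n * n for a complex transform,
    and 2.5 * M * log2(M) for a real-valued one.
    """
    points = n * n
    per_point = 5.0 if complex_input else 2.5
    return per_point * points * math.log2(points)


def gflops(n: int, wall_seconds: float, complex_input: bool = True) -> float:
    """Throughput in GFLOP/s: operation count divided by wall clock time."""
    return 1e-9 * fft2d_flops(n, complex_input) / wall_seconds


if __name__ == "__main__":
    # Hypothetical measurement: a 1024 x 1024 complex transform in 10 ms.
    print(gflops(1024, 0.010))
```
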

In your case, presuming the GPUs are operating in parallel, either measure the wall clock time (i.e. how long the whole operation took) for the execution time, or use this:

execution time = max(memcpyHtoD + kernel + memcpyDtoH times for row and col FFT for each GPU)

As it stands, your calculation represents the serial execution time. Allowing for the overheads from the multi-GPU scheme, I would expect that the calculated performance numbers you are getting will be lower than the equivalent transform done on a single GPU.
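
To make the difference between the two formulas concrete, here is a small Python sketch using invented per-GPU timings in milliseconds (the numbers and the names `gpu_total`, `serial_time` and `parallel_time` are hypothetical, not measurements). Each GPU's total covers memcpyHtoD + kernel + memcpyDtoH for both the row and the column pass.

```python
# Hypothetical per-GPU timings in ms: (memcpyHtoD, kernel, memcpyDtoH)
# for the row pass and the column pass. These are not real measurements.
timings = [
    {"row": (2.0, 1.0, 2.0), "col": (2.0, 1.5, 2.0)},  # GPU 0
    {"row": (2.5, 1.0, 2.0), "col": (2.0, 1.0, 2.5)},  # GPU 1
]


def gpu_total(t: dict) -> float:
    """Copy-in + kernel + copy-out time of one GPU over both passes."""
    return sum(t["row"]) + sum(t["col"])


def serial_time(per_gpu: list) -> float:
    """Sum over GPUs: the time if the GPUs ran one after another."""
    return sum(gpu_total(t) for t in per_gpu)


def parallel_time(per_gpu: list) -> float:
    """Max over GPUs: the slowest GPU bounds the time when they overlap."""
    return max(gpu_total(t) for t in per_gpu)


if __name__ == "__main__":
    print("sum (serial):", serial_time(timings), "ms")
    print("max (parallel):", parallel_time(timings), "ms")
```

Neither figure includes the host-side transpose between the two passes, so a directly measured wall clock time for the whole operation would likely be somewhat larger than the max-based estimate.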

Answer 1 (2, accepted, 2012-02-18 06:18:12)
