torch.max is slower on the GPU than on the CPU when a dimension is specified

Question

import numpy as np
import torch

t1_h = torch.tensor(np.arange(100000), dtype=torch.float32)                # on the CPU
cuda0 = torch.device('cuda:0')
t1_d = torch.tensor(np.arange(100000), dtype=torch.float32, device=cuda0)  # on the GPU

%timeit -n 10000 max_h = torch.max(t1_h, 0)
%timeit -n 10000 max_d = torch.max(t1_d, 0)

10000 loops, best of 3: 144 µs per loop

10000 loops, best of 3: 985 µs per loop

As you can see above, the GPU takes longer than the CPU. But if I don't specify the dimension along which to compute the maximum, the GPU is faster.

%timeit -n 10000 max_h = torch.max(t1_h)
%timeit -n 10000 max_d = torch.max(t1_d)

10000 loops, best of 3: 111 µs per loop

10000 loops, best of 3: 41.8 µs per loop
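Note that the two calls are not the same operation. A small sketch of the return types: without a dimension, torch.max returns a single 0-d tensor, while with a dimension it returns a named tuple (values, indices), so the index of each maximum is computed as well. This difference alone does not necessarily explain the GPU slowdown; it only shows that different work is being timed.

```python
import torch

t = torch.arange(5, dtype=torch.float32)

# Without dim: a single 0-d tensor holding the maximum value.
m = torch.max(t)
print(m)  # tensor(4.)

# With dim: a named tuple (values, indices), so the positions of the
# maxima are computed too.
res = torch.max(t, 0)
print(res.values, res.indices)  # tensor(4.) tensor(4)
```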

I also tried argmax instead of max, and it behaves as expected (the GPU is faster than the CPU).

%timeit -n 10000 cs_h = torch.argmax(t1_h, 0)
%timeit -n 10000 cs_d = torch.argmax(t1_d, 0)

10000 loops, best of 3: 108 µs per loop

10000 loops, best of 3: 18.1 µs per loop
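When reproducing comparisons like the ones above, keep in mind that CUDA kernels are launched asynchronously, so a timer can stop before the GPU has finished. A minimal sketch of a timing helper that synchronizes before reading the clock follows; `time_op` is a hypothetical name, and this is a general benchmarking precaution rather than something the question or answer used. It falls back to the CPU when no GPU is available.

```python
import time

import torch


def time_op(fn, device, n_iter=100):
    """Hypothetical helper: mean wall-clock time of fn() in seconds.

    On a CUDA device, torch.cuda.synchronize() is called before each clock
    read so the measurement includes kernel execution, not just the launch.
    """
    is_cuda = device.type == "cuda"
    fn()  # warm-up run (the first CUDA call can include one-time setup)
    if is_cuda:
        torch.cuda.synchronize(device)
    start = time.perf_counter()
    for _ in range(n_iter):
        fn()
    if is_cuda:
        torch.cuda.synchronize(device)
    return (time.perf_counter() - start) / n_iter


device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
t = torch.arange(100000, dtype=torch.float32, device=device)
print(f"torch.max(t, 0): {time_op(lambda: torch.max(t, 0), device) * 1e6:.1f} us")
print(f"torch.max(t):    {time_op(lambda: torch.max(t), device) * 1e6:.1f} us")
print(f"torch.argmax(t, 0): {time_op(lambda: torch.argmax(t, 0), device) * 1e6:.1f} us")
```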

Is there a reason why torch.max is slow on the GPU when a dimension is specified?

Answer 1

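I ran into this myself and opened an issue on PyTorch. It looks like it will be fixed soon (maybe in version 1.5 or 1.6?), but in the meantime the suggested workaround is to use: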

ii = a.argmax(0)                                  # indices of the maxima along dim 0
maxval = a.gather(0, ii.unsqueeze(0)).squeeze(0)  # values at those indices
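The workaround above can be wrapped so it works along any dimension and returns the same (values, indices) pair that torch.max(a, dim) does. This is a sketch; `max_via_argmax` is a hypothetical name, and the check below only compares results on a small CPU tensor, not speed.

```python
import torch


def max_via_argmax(a, dim):
    """Hypothetical wrapper around the argmax + gather workaround.

    Returns (values, indices) with the same shapes as torch.max(a, dim).
    """
    idx = a.argmax(dim)                                    # positions of the maxima; dim is removed
    vals = a.gather(dim, idx.unsqueeze(dim)).squeeze(dim)  # read the values at those positions
    return vals, idx


a = torch.tensor([[1.0, 5.0, 2.0],
                  [4.0, 0.0, 3.0]])
vals, idx = max_via_argmax(a, 0)
print(vals)  # tensor([4., 5., 3.])
print(idx)   # tensor([1, 0, 1])
```

argmax returns int64 indices, which is the dtype gather expects, so no conversion is needed. On a CUDA tensor the same function applies unchanged.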

1 2020-04-20 06:42:00