PyTorch RNN is more efficient with `batch_first=False`?
In machine translation, we always need to slice out the first timestep (the SOS token) from the annotation and the prediction.

When using batch_first=False, slicing out the first timestep still keeps the tensor contiguous.
import torch
batch_size = 128
seq_len = 12
embedding = 50
# Making a dummy output that is `batch_first=False`
batch_not_first = torch.randn((seq_len, batch_size, embedding))
batch_not_first = batch_not_first[1:].view(-1, embedding)  # slicing out the first timestep
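A quick sanity check (added here for illustration; not part of the original snippet) confirms this: slicing along dim 0 only moves the storage offset, so PyTorch still reports the tensor as contiguous.

x = torch.randn(seq_len, batch_size, embedding)
print(x[1:].is_contiguous())  # True: slicing dim 0 keeps the memory layout contiguous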
However, if we use batch_first=True, the tensor is no longer contiguous after slicing. We need to make it contiguous before we can apply operations such as view.
batch_first = torch.randn((batch_size,seq_len,embedding))
batch_first[:,1:].view(-1, embedding) # slicing out the first time step
Output:
"""
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-8-a9bd590a1679> in <module>
----> 1 batch_first[:,1:].view(-1, embedding) # slicing out the first time step
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
"""
Does that mean batch_first=False is better, at least in the context of machine translation, since it saves us the contiguous() step? Are there any cases where batch_first=True works better?
There doesn't seem to be a considerable difference between batch_first=True and batch_first=False. Please see the script below:
import time

import torch


def time_measure(batch_first: bool):
    torch.cuda.synchronize()
    layer = torch.nn.RNN(10, 20, batch_first=batch_first).cuda()
    if batch_first:
        inputs = torch.randn(100000, 7, 10).cuda()
    else:
        inputs = torch.randn(7, 100000, 10).cuda()

    start = time.perf_counter()
    for chunk in torch.chunk(inputs, 100000 // 64, dim=0 if batch_first else 1):
        _, last = layer(chunk)
    torch.cuda.synchronize()  # wait for queued GPU work so the timing is accurate
    return time.perf_counter() - start
print(f"Time taken for batch_first=False: {time_measure(False)}")
print(f"Time taken for batch_first=True: {time_measure(True)}")
On my device (GTX 1050 Ti), with PyTorch 1.6.0 and CUDA 11.0, here are the results:
Time taken for batch_first=False: 0.3275816479999776
Time taken for batch_first=True: 0.3159054920001836
(and it varies either way, so nothing conclusive).
batch_first=True is simpler when you want to use other PyTorch layers that require the batch as the 0th dimension (which is the case for almost all torch.nn layers like torch.nn.Linear).
In this case you would have to permute the returned tensor anyway if batch_first=False was specified.
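For instance (a minimal sketch with made-up shapes, not from the original answer), an RNN with batch_first=False returns output of shape (seq_len, batch, hidden), so a layer that treats dim 0 as the batch needs a permute first:

rnn = torch.nn.RNN(10, 20, batch_first=False)
x = torch.randn(7, 64, 10)        # (seq_len, batch, input_size)
out, _ = rnn(x)                   # out: (7, 64, 20)
out = out.permute(1, 0, 2)        # -> (64, 7, 20), batch first
flat = torch.nn.Flatten()(out)    # Flatten keeps dim 0 (the batch): (64, 140)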
For your use case, though, batch_first=False should be better: the tensor stays contiguous the whole time, so no copy of the data has to be made. Slicing with [1:] instead of [:, 1:] also looks cleaner.
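A small demonstration of the no-copy claim (an illustration added here, with arbitrary shapes): a dim-0 slice is a view into the same storage, while the batch_first=True slice must be copied to become contiguous.

a = torch.randn(12, 128, 50)   # batch_first=False layout
b = a[1:]                      # a view: shares storage with `a`, no copy
b.zero_()
print(a[1].abs().sum())        # tensor(0.) -- writing to `b` changed `a`

c = torch.randn(128, 12, 50)   # batch_first=True layout
d = c[:, 1:].contiguous()      # an independent copy of the sliced data
d.zero_()
print(c[:, 1].abs().sum())     # nonzero -- `c` is untouched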