
How does one pad a tensor of 3 dimensions in PyTorch?

I was trying to use the built-in padding function, but it wasn't padding things for me for some reason. This is my reproducible code:

import torch

def padding_batched_embedding_seq():
    ## 3 sequences with embedding of size 5
    a = torch.ones(1, 4, 5) # seq len 4 (so 4 tokens)
    b = torch.ones(1, 3, 5) # seq len 3 (so 3 tokens)
    c = torch.ones(1, 2, 5) # seq len 2 (so 2 tokens)
    ##
    sequences = [a, b, c]
    batch = torch.nn.utils.rnn.pad_sequence(sequences)

if __name__ == '__main__':
    padding_batched_embedding_seq()

Error message:

Traceback (most recent call last):
  File "padding.py", line 51, in <module>
    padding_batched_embedding_seq()
  File "padding.py", line 40, in padding_batched_embedding_seq
    batch = torch.nn.utils.rnn.pad_sequence(sequences)
  File "/Users/rene/miniconda3/envs/automl/lib/python3.7/site-packages/torch/nn/utils/rnn.py", line 376, in pad_sequence
    out_tensor[:length, i, ...] = tensor
RuntimeError: The expanded size of the tensor (4) must match the existing size (3) at non-singleton dimension 1.  Target sizes: [1, 4, 5].  Tensor sizes: [3, 5]

Any ideas?


Cross-posted: https://discuss.pytorch.org/t/how-does-one-padd-a-tensor-of-3-dimensions/51097

You should have torch.ones(2, 5) instead, or torch.ones(2, ...) where ... is the same dimensions for every sample. The error RuntimeError: The expanded size of the tensor (4) must match the existing size (3) at non-singleton dimension 1. Target sizes: [1, 4, 5]. Tensor sizes: [3, 5] means pad_sequence expects all dimensions other than the first (dim == 0) to match across samples, because the first dimension is the variable sequence length and the remaining dimensions describe each input item, which must be the same for all sequences.
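Concretely, here is a minimal sketch of the fixed reproducer, dropping the leading singleton batch dimension so every sequence is (seq_len, embedding_size):

import torch
from torch.nn.utils.rnn import pad_sequence

def padding_batched_embedding_seq():
    ## 3 sequences with embedding of size 5, no leading batch dim
    a = torch.ones(4, 5) # seq len 4
    b = torch.ones(3, 5) # seq len 3
    c = torch.ones(2, 5) # seq len 2
    batch = pad_sequence([a, b, c])
    print(batch.size()) # torch.Size([4, 3, 5]): (max_seq_len, batch_size, embedding_size)

if __name__ == '__main__':
    padding_batched_embedding_seq()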

The example from the docs at https://pytorch.org/docs/stable/_modules/torch/nn/utils/rnn.html is:

>>> from torch.nn.utils.rnn import pad_sequence
>>> a = torch.ones(25, 300)
>>> b = torch.ones(22, 300)
>>> c = torch.ones(15, 300)
>>> pad_sequence([a, b, c]).size()

With output: torch.Size([25, 3, 300])

The shape is (max_seq_len, batch_size, input_size) because batch_first=False by default; I prefer batch_first=True, which gives torch.Size([3, 25, 300]) instead.
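For example, passing batch_first=True to the same doc snippet yields the batch-major shape:

import torch
from torch.nn.utils.rnn import pad_sequence

a = torch.ones(25, 300)
b = torch.ones(22, 300)
c = torch.ones(15, 300)
print(pad_sequence([a, b, c], batch_first=True).size()) # torch.Size([3, 25, 300])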

Padding just means filling with zeros until the sequence matches the max sequence length. As input to an RNN you may prefer a packed sequence, which contains no zero (padding) inputs.
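A minimal sketch of that with pack_sequence (the hidden_size=8 here is an arbitrary illustrative choice; by default pack_sequence expects the sequences sorted by decreasing length):

import torch
from torch.nn.utils.rnn import pack_sequence

a = torch.ones(4, 5)
b = torch.ones(3, 5)
c = torch.ones(2, 5)
packed = pack_sequence([a, b, c]) # sorted by decreasing length: 4, 3, 2
print(packed.data.size())         # torch.Size([9, 5]): 4+3+2 timesteps, no padding zeros
rnn = torch.nn.RNN(input_size=5, hidden_size=8)
output, hidden = rnn(packed)      # the RNN consumes the packed batch directly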

So in your example, if each input item has more dimensions, it would look like:

a = torch.ones(4, 5, 10) # sequence of length 4; each item is a 5x10 2d input
b = torch.ones(3, 5, 10) # sequence of length 3
c = torch.ones(2, 5, 10) # sequence of length 2
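Padding those works the same way, since only the trailing dimensions must match; a quick check:

import torch
from torch.nn.utils.rnn import pad_sequence

a = torch.ones(4, 5, 10)
b = torch.ones(3, 5, 10)
c = torch.ones(2, 5, 10)
print(pad_sequence([a, b, c], batch_first=True).size()) # torch.Size([3, 4, 5, 10])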
