
PyTorch: How to feed the output of a CNN into the input of an RNN?

I am new to CNNs, RNNs, and deep learning. I am trying to build an architecture that combines a CNN and an RNN. The input image size is [20, 3, 48, 48] and the CNN output size is [20, 64, 48, 48]. I now want the CNN output to be the RNN input, but as far as I know the input of an RNN must be 3-dimensional, of the form [seq_len, batch, input_size]. How can I turn the 4-dimensional [20, 64, 48, 48] tensor into a 3-dimensional tensor for the RNN input?
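To make the shapes concrete, this is a minimal sketch of the conversion I have in mind, using the sizes above (20 as the batch and each image as a single time step; the reshape itself is just a guess on my part):

```python
import torch

# Stand-in for the CNN output: a batch of 20 feature maps of size (64, 48, 48)
cnn_out = torch.randn(20, 64, 48, 48)

# Flatten each feature map into one feature vector, then add a time
# dimension of length 1: (seq_len=1, batch=20, input_size=64*48*48)
rnn_in = cnn_out.view(20, -1).unsqueeze(0)
print(rnn_in.shape)  # torch.Size([1, 20, 147456])
```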

And another question: how do I initialize the first hidden state with

torch.zeros()

I don't know exactly what arguments I should pass to this function. The only thing I know is the shape:

[layer_dim, batch, hidden_dim]
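For reference, this is a minimal sketch of what I understand so far about the hidden-state shape; the sizes here (input_size=64, hidden_dim=32) are purely illustrative and not from my actual model:

```python
import torch
import torch.nn as nn

layer_dim, batch, hidden_dim = 1, 20, 32  # illustrative sizes only

rnn = nn.RNN(input_size=64, hidden_size=hidden_dim, num_layers=layer_dim)
x = torch.randn(1, batch, 64)  # (seq_len, batch, input_size)

# Initial hidden state: one zero vector per layer per batch element
h0 = torch.zeros(layer_dim, batch, hidden_dim)
out, hn = rnn(x, h0)
print(out.shape)  # torch.Size([1, 20, 32])
```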

Thank you.

I assume that 20 here is the size of a batch. In that case, set batch = 20.

seq_len is the number of time steps in each stream. Since one image is input at each time step, seq_len = 1.

Now, the 20 images of size (64, 48, 48) have to be converted to that format.

Since the size of each input is (64, 48, 48), input_size = 64 * 48 * 48.

import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = nn.LSTM(input_size=64*48*48, hidden_size=1).to(device)

# Generating input - 20 images of size (64, 48, 48)
cnn_out = torch.randn((20, 64, 48, 48)).requires_grad_(True).to(device)

# To pass it to the LSTM, input must be of the form (seq_len, batch, input_size)
cnn_out = cnn_out.view(1, 20, 64*48*48)

model(cnn_out)

This will give you the result.

By following @Arun's solution, I can finally pass the image tensor through the RNN layer. But the problem afterwards is that somehow PyTorch only accepts a first hidden state of shape [1, 1, 1], and I don't know why. And now my RNN output is [1, 20, 1]. I thought my output would be [1, 20, 147456], so that I could reshape it back to the image input shape [20, 64, 48, 48].

class Rnn(nn.Module):
    def __init__(self):
        super(Rnn, self).__init__()
        # hidden_size=1, so the output feature dimension is 1
        self.rnn = nn.RNN(64*48*48, 1, 1, batch_first=True, nonlinearity='relu')

    def forward(self, x):
        batch_size = x.size(0)
        hidden = self.init_hidden(batch_size)
        # with batch_first=True this is read as (batch=1, seq_len=20, input_size)
        images = x.view(1, 20, 64*48*48)
        out, hidden = self.rnn(images, hidden)  # out: [1, 20, 1]
        # fails: [1, 20, 1] has only 20 elements, far fewer than 20*64*96*96
        out = torch.reshape(out, (20, 64, 96, 96))
        return out, hidden

    def init_hidden(self, batch_size):
        # (num_layers, batch, hidden_size) = (1, 1, 1)
        hidden = torch.zeros(1, 1, 1).to(device)
        return hidden

Your question is very interesting. The output of a CNN is 4-dimensional, but the input of an RNN must be 3-dimensional.

Obviously, you know the meaning of the dimensions. The problem is simply a shape operation.
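To make the shape operation concrete, here is a minimal sketch with deliberately shrunken sizes (a pretend CNN output of (4, 8, 6, 6) instead of (20, 64, 48, 48), since a hidden_size of 64*48*48 would make the RNN impractically large). The key point is that the RNN's output feature dimension equals hidden_size, which is why hidden_size=1 gives an output of [1, 20, 1]; to reshape the output back to the image shape, hidden_size must equal the flattened feature size:

```python
import torch
import torch.nn as nn

# Shrunken illustrative sizes: pretend the CNN emits (4, 8, 6, 6)
batch, channels, h, w = 4, 8, 6, 6
feat = channels * h * w                  # 288, playing the role of 64*48*48

# hidden_size == feat, so the output can be reshaped back to image shape
rnn = nn.RNN(feat, feat, num_layers=1, batch_first=True)

x = torch.randn(batch, channels, h, w).view(batch, 1, feat)  # (batch, seq_len=1, feat)
h0 = torch.zeros(1, batch, feat)         # (num_layers, batch, hidden_size)
out, hn = rnn(x, h0)                     # out: (batch, seq_len, hidden_size)

images = out.reshape(batch, channels, h, w)
print(images.shape)  # torch.Size([4, 8, 6, 6])
```

With batch_first=True only the input and output swap to batch-first; the hidden state keeps the (num_layers, batch, hidden_size) layout, which is also why the posted code only accepted [1, 1, 1] when the batch dimension of its input was 1.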
