LSTM-CNN to classify sequences of images

I have an assignment that I am stuck on, having gone down the rabbit hole of learning PyTorch, LSTMs and CNNs. Using the well-known MNIST dataset, I build sequences of 4 digits, and each sequence falls into one of 7 labels.

e.g.:
1111 label 1 (constant trend)
1234 label 2 (increasing trend)
4321 label 3 (decreasing trend)
...
7382 label 7 (decreasing trend - increasing trend - decreasing trend)

After loading, my tensor has the shape (3, 4, 28, 28): 3 is the batch size, 4 is the number of channels (the 4 images in a sequence), and 28 x 28 is the width and height of an MNIST image.

I'm stuck on how to pass this into a PyTorch LSTM and CNN, since basically every Google search leads to articles where only a single image is passed in.

I was thinking of reshaping it into one long array of pixel values: all rows of the first image (28 values each) laid end to end, followed by the second, third and fourth images in the same way. That would give 4 * 28 * 28 = 3136 values per sequence.
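As a rough sketch of what that reshaping looks like next to a per-image alternative (random data stands in for the real images here, which is an assumption, and the variable names are only illustrative):

```python
import torch

# Random stand-in for one batch: 3 sequences of 4 images of 28 x 28 pixels.
batch = torch.rand(3, 4, 28, 28)

# Option 1: one long vector per sequence, 4 * 28 * 28 = 3136 values.
flat = batch.view(3, -1)
print(flat.shape)       # torch.Size([3, 3136])

# Option 2: keep the 4 images as separate sequence steps and
# flatten only each image, 28 * 28 = 784 values per step.
per_image = batch.view(3, 4, -1)
print(per_image.shape)  # torch.Size([3, 4, 784])
```

Option 1 removes the sequence dimension entirely, while option 2 keeps one step per image, which is the layout an LSTM with batch_first=True expects.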

Is this the right way to tackle the problem, or should I rethink it? I'm rather new to all of this and am looking for guidance on how to move forward. I've read loads of articles and watched YouTube videos, but they all seem to cover only the basics or variations on the same topic.

I have written some code but running it gives errors.

import numpy as np
import torch
import torch.nn as nn
from torch import optim, softmax
from sklearn.model_selection import train_test_split

# dataset: sequences of 4 MNIST images each
# number of labels: 7

# Data
x_train, x_test, y_train, y_test = train_test_split(dataset.data, dataset.data_label, test_size=0.15,
                                                    random_state=42)
#model
class Mylstm(nn.Module):
    def __init__(self, input_size, hidden_size, n_layers, n_classes):
        super(Mylstm, self).__init__()
        self.input_size = input_size
        self.n_layers = n_layers
        self.hidden_size = hidden_size
        self.lstm = nn.LSTM(input_size, hidden_size, n_layers, batch_first=True)
        # readout layer
        self.fc = nn.Linear(hidden_size, n_classes)

    def forward(self, x):
        # Initialize hidden state with zeros
        h0 = torch.zeros(self.n_layers, x.size(0), self.hidden_size).requires_grad_()
        # initialize the cell state:
        c0 = torch.zeros(self.n_layers, x.size(0), self.hidden_size).requires_grad_()
        out, (h_n, h_c) = self.lstm(x, (h0.detach(), c0.detach()))
        x = h_n[-1, :, 1]  
        x = self.fc(x)
        x = softmax(x, dim=1)
        return x

#Hyperparameters
input_size = 28
hidden_size = 256
sequence_length = 28
n_layers = 2
n_classes = 7
learning_rate = 0.001
model = Mylstm(input_size, hidden_size, n_layers, n_classes)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)


#training
bs = 0
num_epochs = 5
batch_size=3

if np.mod(x_train.shape[0], batch_size) == 0.0:
    iter = int(x_train.shape[0] / batch_size)
else:
    iter = int(x_train.shape[0] / batch_size) + 1
bs = 0
for i in range(iter):
    sequences = x_test[bs:bs + batch_size, :]
    labels = y_test[bs:bs + batch_size]
    test_images = dataset.load_images(sequences)
    bs += batch_size

for epoch in range(num_epochs):
    for i in range(iter):
        sequences = x_train[bs:bs + batch_size, :]
        labels = y_train[bs:bs + batch_size]
        input_images = dataset.load_images(sequences)
        bs += batch_size
        images=(torch.from_numpy(input_images)).view(batch_size,4,-1)
        labels=torch.from_numpy(labels)
        optimizer.zero_grad()
        output = model(images)
        # calculate Loss
        loss = criterion(output, labels)
        loss.backward()
        optimizer.step()

The error I'm currently getting is:

RuntimeError: input.size(-1) must be equal to input_size. Expected 28, got 784

Change your input size from 28 to 784 (784 = 28 * 28).
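A minimal sketch of that change, using random data in place of the real images (an assumption, since the question's dataset object is not shown). The other hyperparameters are copied from the question:

```python
import torch
import torch.nn as nn

# Random stand-in for one batch: 3 sequences of 4 MNIST-sized images.
batch = torch.rand(3, 4, 28, 28)

# Same reshape as in the question: each 28 x 28 image becomes a
# 784-long vector, and the 4 images form the sequence dimension.
images = batch.view(3, 4, -1)  # shape (3, 4, 784)

# input_size has to match the last dimension of the input tensor.
lstm = nn.LSTM(input_size=28 * 28, hidden_size=256, num_layers=2, batch_first=True)
out, (h_n, c_n) = lstm(images)

print(out.shape)  # torch.Size([3, 4, 256])
print(h_n.shape)  # torch.Size([2, 3, 256])
```

With input_size=28 the same call raises the RuntimeError quoted in the question, because the last dimension of images is 784.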

The input_size argument is the number of features in one element of the sequence. Here each element is one MNIST image, so it is the number of pixels in the image: width * height.
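Going slightly beyond the fix above, here is a hedged sketch of how the whole model might look once input_size is 784. It is not the asker's exact code: the class name SequenceLSTM is made up, random tensors replace the real data, and the labels are assumed to be class indices 0..6 (nn.CrossEntropyLoss expects indices starting at 0, so labels 1..7 would need shifting). It also differs from the question in two places: it uses the full last-layer hidden state h_n[-1] rather than h_n[-1, :, 1], which picks a single hidden unit, and it returns raw logits because nn.CrossEntropyLoss applies log-softmax internally.

```python
import torch
import torch.nn as nn


class SequenceLSTM(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, input_size=784, hidden_size=256, n_layers=2, n_classes=7):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, n_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, n_classes)

    def forward(self, x):
        # x: (batch, 4, 784). When no initial state is passed,
        # nn.LSTM starts from zero hidden and cell states.
        _, (h_n, _) = self.lstm(x)
        # h_n[-1] is the final hidden state of the top layer: (batch, hidden_size).
        # Return raw logits; the loss function handles the softmax.
        return self.fc(h_n[-1])


model = SequenceLSTM()
criterion = nn.CrossEntropyLoss()

images = torch.rand(3, 4, 28, 28).view(3, 4, -1)  # random stand-in batch
labels = torch.tensor([0, 1, 6])                  # made-up class indices in 0..6

logits = model(images)
loss = criterion(logits, labels)
loss.backward()

print(logits.shape)  # torch.Size([3, 7])
```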
