
LSTM for reversing integer sequence

I'm just trying to train an LSTM to reverse an integer sequence. My approach is a modified version of this tutorial, in which he just echoes the input sequence. It goes like this:

  1. Generate a random sequence S of length R (possible values range from 0 to 99)
  2. Break the sequence above into sub-sequences of length L (moving window)
  3. Each sub-sequence has its reverse as the truth label

So, this will generate (R - L + 1) sub-sequences, which gives an input matrix of shape (R - L + 1) x L. For example, using:

S = 1 2 3 4 5 ... 25 (1 to 25)
R = 25
L = 5 

We end up with 21 sub-sequences:

s1 = 1 2 3 4 5, y1 = 5 4 3 2 1
s2 = 2 3 4 5 6, y2 = 6 5 4 3 2
...
s21 = 21 22 23 24 25, y21 = 25 24 23 22 21
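
(Just to make the windowing concrete, here is a tiny sketch of that step in plain Python for the S, R, L example above; the full one-hot-encoded version is in the code further down.)

# Moving window of length L over S, each window paired with its reverse
S = list(range(1, 26))   # 1 .. 25, so R = 25
L = 5
pairs = [(S[i:i+L], list(reversed(S[i:i+L]))) for i in range(len(S) - L + 1)]
print(len(pairs))        # 21 sub-sequences
print(pairs[0])          # ([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])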

This input matrix is then one-hot-encoded and fed to Keras. Then I repeat the process for another sequence. The problem is that it does not converge; the accuracy is very low. What am I doing wrong?

In the code below I use R = 500 and L = 5, which gives 496 sub-sequences, with batch_size = 16 (so we have 31 updates per 'training session'):

Here's the code:

from keras.models import Sequential
from keras.layers import Dense
from keras.layers import TimeDistributed
from keras.layers import LSTM
from random import randint
from keras.utils.np_utils import to_categorical
import numpy as np

def one_hot_encode(sequence, n_unique=100):
    encoding = list()
    for value in sequence:
        vector = [0 for _ in range(n_unique)]
        vector[value] = 1
        encoding.append(vector)
    return np.array(encoding)

def one_hot_decode(encoded_seq):
    return [np.argmax(vector) for vector in encoded_seq]

def get_data(rows = 500, length = 5, n_unique=100):
    s = [randint(0, n_unique-1) for i in range(rows)]
    x = []
    y = []

    for i in range(0, rows-length + 1, 1):
        x.append(one_hot_encode(s[i:i+length], n_unique))
        y.append(one_hot_encode(list(reversed(s[i:i+length])), n_unique))

    return np.array(x), np.array(y)

N = 50000
LEN = 5
#ROWS = LEN*LEN - LEN + 1
TIMESTEPS = LEN
ROWS = 10000
FEATS = 10 #randint
BATCH_SIZE = 588

# fit model
model = Sequential()
model.add(LSTM(100, batch_input_shape=(BATCH_SIZE, TIMESTEPS, FEATS), return_sequences=True, stateful=True))
model.add(TimeDistributed(Dense(FEATS, activation='softmax')))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])

print(model.summary())

# train LSTM
for epoch in range(N):
    # generate new random sequence
    X,y = get_data(500, LEN, FEATS)
    # fit model for one epoch on this sequence
    model.fit(X, y, epochs=1, batch_size=BATCH_SIZE, verbose=2, shuffle=False)
    model.reset_states()

# evaluate LSTM 
X,y = get_data(500, LEN, FEATS)
yhat = model.predict(X, batch_size=BATCH_SIZE, verbose=0)

# decode all pairs
for i in range(len(X)):
    print('Expected:', one_hot_decode(y[i]), 'Predicted', one_hot_decode(yhat[i]))

Thanks!

Edit: It seems like the last numbers of the sequence are being picked up:

Expected: [7, 3, 7, 7, 6] Predicted [3, 9, 7, 7, 6]
Expected: [6, 7, 3, 7, 7] Predicted [4, 6, 3, 7, 7]
Expected: [6, 6, 7, 3, 7] Predicted [4, 3, 7, 3, 7]
Expected: [1, 6, 6, 7, 3] Predicted [3, 3, 6, 7, 3]
Expected: [8, 1, 6, 6, 7] Predicted [4, 3, 6, 6, 7]
Expected: [8, 8, 1, 6, 6] Predicted [3, 3, 1, 6, 6]
Expected: [9, 8, 8, 1, 6] Predicted [3, 9, 8, 1, 6]
Expected: [5, 9, 8, 8, 1] Predicted [3, 3, 8, 8, 1]
Expected: [9, 5, 9, 8, 8] Predicted [7, 7, 9, 8, 8]
Expected: [0, 9, 5, 9, 8] Predicted [7, 9, 5, 9, 8]
Expected: [7, 0, 9, 5, 9] Predicted [5, 7, 9, 5, 9]
Expected: [1, 7, 0, 9, 5] Predicted [7, 9, 0, 9, 5]
Expected: [9, 1, 7, 0, 9] Predicted [5, 9, 7, 0, 9]
Expected: [4, 9, 1, 7, 0] Predicted [6, 3, 1, 7, 0]
Expected: [4, 4, 9, 1, 7] Predicted [4, 3, 9, 1, 7]
Expected: [0, 4, 4, 9, 1] Predicted [3, 9, 4, 9, 1]
Expected: [1, 0, 4, 4, 9] Predicted [5, 5, 4, 4, 9]
Expected: [3, 1, 0, 4, 4] Predicted [3, 3, 0, 4, 4]
Expected: [0, 3, 1, 0, 4] Predicted [3, 3, 1, 0, 4]
Expected: [2, 0, 3, 1, 0] Predicted [6, 3, 3, 1, 0]

The first thing that may be causing problems for your model is using stateful=True.

This option is only necessary when you want to divide one sequence into many parts, so that the sequences in the second batch are the continuation of the ones in the first batch. This is useful when your sequence is long enough to cause memory issues, so you divide it.

And it requires you to "erase memory" (called "reset states") manually after you pass the last batch of a sequence.
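
For contrast, here is a minimal sketch of the situation stateful=True is actually meant for: one long sequence fed as consecutive chunks, with the state carried across calls and reset by hand afterwards (sizes and variable names below are only illustrative, not taken from your code):

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, TimeDistributed, Dense

# One long sequence of 500 steps, split into consecutive chunks of 50.
# The LSTM state survives between train_on_batch calls because stateful=True.
BATCH, CHUNK, FEATS = 1, 50, 10
model = Sequential()
model.add(LSTM(100, batch_input_shape=(BATCH, CHUNK, FEATS),
               return_sequences=True, stateful=True))
model.add(TimeDistributed(Dense(FEATS, activation='softmax')))
model.compile(loss='categorical_crossentropy', optimizer='adam')

long_x = np.random.rand(BATCH, 500, FEATS)                         # dummy inputs
long_y = np.eye(FEATS)[np.random.randint(0, FEATS, (BATCH, 500))]  # dummy one-hot targets
for start in range(0, 500, CHUNK):                                 # chunks fed in order
    model.train_on_batch(long_x[:, start:start+CHUNK], long_y[:, start:start+CHUNK])
model.reset_states()  # "erase memory" before starting the next long sequence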


Now, LSTM layers are not good for that task, because they work in the following manner (see the small demo after this list):

  • Start with a clear "state" (which can be roughly interpreted as a clear memory). It's a whole new sequence coming in;
  • Take the first step/element in the sequence, calculate the first result and update memory;
  • Take the second step in the sequence, calculate the second result (now with help from their own memory and the previous results) and update memory;
  • And so on until the last step.
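
You can check this left-to-right behaviour with a toy, randomly initialized LSTM (the shapes below are only illustrative): the output at a given step does not change when you modify a later input.

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM

# A plain LSTM's output at step t depends only on steps 0..t.
model = Sequential()
model.add(LSTM(8, return_sequences=True, input_shape=(5, 10)))

a = np.random.rand(1, 5, 10).astype('float32')
b = a.copy()
b[0, -1, :] = 0.0                         # change only the LAST timestep

ya = model.predict(a)
yb = model.predict(b)
print(np.allclose(ya[0, :4], yb[0, :4]))  # True: steps 0..3 never saw the change
print(np.allclose(ya[0, 4], yb[0, 4]))    # (almost surely) False: only step 4 reacts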

You can see a better explanation here. That explanation focuses more on the exact details, but it has nice pictures, such as this one:

[LSTM step diagram]

Imagine a sequence with 3 elements. In this picture, X(t-1) is the first element. H(t-1) is the first result. X(t) and H(t) are the second input and output. X(t+1) and H(t+1) are the last input and output. They're processed in sequence.

So, the memory/state of the layer simply doesn't exist at the first step. As it receives the first number, it can't have the slightest idea of what the last number is, because it has never seen the last number. (Maybe if the sequences were somehow understandable, if the numbers had a relation between themselves, then there would be a better chance for it to output logical results, but these sequences are just random.)

Now, as you approach the last number, then the layer has already built its memory of the sequence, and there is a chance that it knows what to do (because it has already seen the first numbers).

That explains your results:

  • Up to the first half of the sequence, it's trying to output numbers that it has never seen (and that don't have any logical relation between themselves).
  • From the central number towards the end, all the previous numbers are seen and can be predicted.

The Bidirectional layer wrapper:

If it's important for the first steps to be influenced by the last ones, you will probably need the Bidirectional layer wrapper. It makes your LSTM process the input in both directions and doubles the number of output features. If you pass 100 cells, it will output (Batch, Steps, 200). It's pretty much like having two LSTM layers, one of them reading the inputs backwards.

model.add(Bidirectional(LSTM(100, return_sequences=True), input_shape=(TIMESTEPS, FEATS)))
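
Putting it together, a minimal sketch of how the model could look with that wrapper, reusing TIMESTEPS and FEATS from your code (no stateful, no fixed batch size; one possible setup rather than a guaranteed fix):

from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed, Bidirectional

TIMESTEPS, FEATS = 5, 10

model = Sequential()
# The wrapper runs the LSTM in both directions, so it outputs (batch, TIMESTEPS, 200)
model.add(Bidirectional(LSTM(100, return_sequences=True),
                        input_shape=(TIMESTEPS, FEATS)))
model.add(TimeDistributed(Dense(FEATS, activation='softmax')))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])

# X, y built exactly as in your get_data(), e.g. X, y = get_data(500, TIMESTEPS, FEATS)
# model.fit(X, y, epochs=..., batch_size=16)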
