I am training a model in Keras on the IMDB dataset. With this model, which uses LSTM layers, the accuracy is about 50%:
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

model = Sequential()
model.add(Embedding(max_features, 32))
model.add(LSTM(32, return_sequences=True))
model.add(LSTM(32, return_sequences=True))
model.add(LSTM(32))
model.add(Dense(1, activation='sigmoid'))
Accuracy:
loss: 0.6933 - acc: 0.5007 - val_loss: 0.6932 - val_acc: 0.4947
I have also tried a single LSTM layer, but it gives similar accuracy.
However, if I don't use an LSTM layer, the accuracy reaches around 82%:
from keras import models, layers, regularizers

model = models.Sequential()
model.add(layers.Dense(16, kernel_regularizer=regularizers.l1(0.001), activation='relu', input_shape=(10000,)))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(16, kernel_regularizer=regularizers.l1(0.001), activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(1, activation='sigmoid'))
Accuracy:
loss: 0.6738 - acc: 0.8214 - val_loss: 0.6250 - val_acc: 0.8320
This is how I compile and fit the model:
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
model.fit(partial_x_train, partial_y_train, epochs=Numepochs, batch_size=Batchsize, validation_data=(x_val, y_val))
How can this be explained? I thought LSTMs work great for sequential text data.
Don't forget that LSTMs are used for processing sequences such as time series or text data. In a sequence, the order of elements is very important; if you reorder the elements, the whole meaning of the sequence might change completely.
Now, the problem in your case is that the preprocessing step you have used is not the proper one for an LSTM model. You are encoding each sentence as a vector in which each element represents the presence or absence of a particular word. Therefore, you are completely ignoring the order in which the words appear in the sentence, which is exactly what an LSTM layer is good at modeling. There is also another issue in your LSTM model, given the preprocessing scheme you have used: the Embedding layer accepts word indices as input, not a vector of zeros and ones (i.e. the output of your preprocessing stage).
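To see how much information that presence/absence encoding throws away, here is a small sketch (the toy vocabulary and sentences are hypothetical, just for illustration):

```python
# Toy vocabulary: word -> index (hypothetical, for illustration only)
vocab = {"the": 0, "movie": 1, "was": 2, "not": 3, "good": 4, "bad": 5}

def multi_hot(words, vocab_size=6):
    """Encode a sentence as presence/absence of each word (order is lost)."""
    vec = [0] * vocab_size
    for w in words:
        vec[vocab[w]] = 1
    return vec

a = "the movie was not bad".split()
b = "the movie was bad not".split()  # different order, arguably different meaning

# The multi-hot vectors are identical, so no model can tell the two apart:
assert multi_hot(a) == multi_hot(b)

# The integer index sequences an Embedding + LSTM expects do preserve order:
seq_a = [vocab[w] for w in a]  # [0, 1, 2, 3, 5]
seq_b = [vocab[w] for w in b]  # [0, 1, 2, 5, 3]
assert seq_a != seq_b
```

This is why the Dense model can still do well (it learns from which words occur) while the LSTM, fed order-free vectors, has nothing sequential to model.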
Since the IMDB data is already stored as sequences of word indices, to resolve this issue you only need to pad/truncate the sequences to a specified length, so that batch processing becomes possible. For example:
from keras.datasets import imdb
from keras.preprocessing.sequence import pad_sequences

vocab_size = 10000  # only consider the 10000 most frequent words
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=vocab_size)
x_train = pad_sequences(x_train, maxlen=500)  # truncate or pad sequences so they all have length 500
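As a quick sanity check on what that call does, here is a minimal pure-Python sketch of pad_sequences' default behavior (by default it both pads and truncates at the *start* of each sequence):

```python
def pad_like_keras(seqs, maxlen, value=0):
    """Mimic keras pad_sequences defaults: 'pre' padding and 'pre' truncation."""
    out = []
    for s in seqs:
        if len(s) >= maxlen:
            out.append(s[-maxlen:])                      # keep the last maxlen elements
        else:
            out.append([value] * (maxlen - len(s)) + s)  # pad with zeros at the front
    return out

padded = pad_like_keras([[1, 2, 3], [4, 5, 6, 7, 8]], maxlen=4)
assert padded == [[0, 1, 2, 3], [5, 6, 7, 8]]
# Every row now has the same length, so the batch can be stacked into one array.
```

Note that index 0 is reserved as the padding value, which is why IMDB word indices start higher.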
Now, x_train would have a shape of (25000, 500): 25000 sequences of length 500, each encoded as integer word indices. You can now use it for training by passing it to the fit method. I guess you can reach at least 80% training accuracy with an Embedding layer and a single LSTM layer. Don't forget to use a validation scheme to monitor overfitting (one simple option is to set the validation_split argument when calling the fit method).
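Putting it together, a minimal sketch of such a model (using TensorFlow's bundled Keras; the layer sizes are just the ones from the question, not tuned):

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

vocab_size = 10000  # matches num_words used when loading the IMDB data
maxlen = 500        # matches the pad_sequences length above

model = Sequential([
    Embedding(vocab_size, 32),       # word indices -> 32-dimensional vectors
    LSTM(32),                        # a single recurrent layer over the sequence
    Dense(1, activation='sigmoid'),  # binary sentiment output
])
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])

# Sanity check: a batch of two padded index sequences maps to two probabilities.
out = model(np.zeros((2, maxlen), dtype="int32"))
assert out.shape == (2, 1)

# Training would then look like, e.g.:
# model.fit(x_train, y_train, epochs=10, batch_size=128, validation_split=0.2)
```

The key difference from the question's setup is the input: padded integer sequences rather than 10000-dimensional multi-hot vectors, so the Embedding layer and LSTM actually see word order.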