
How to use CNN and LSTM for NLP with BERT embeddings?

I have the IMDB dataset translated into BERT embeddings of dimension 768, so I have 40000 samples of 768 features for training and 10000 samples of the same size for validation. I have tried dense layers and reached about 85% accuracy without much trouble. But when it comes to using Conv layers and LSTM layers, I run into dimension problems. Another concept I am missing is the correct order of those layers: I assume they should go right at the entrance of the model, since those layers are capable of capturing time dependencies, and with BERT embeddings those dependencies have already been learnt. I am using a batch size of 200 reviews. Thanks in advance for any kind of clarification.

I believe that, since I am already using BERT embeddings, I do not need an input layer of the Embedding type, but I am not sure of this either.

DISCLAIMER: After some experiments, I think one needs neither an LSTM layer nor a CNN. Classification should be done with dense layers alone, because the embeddings should already carry all the contextual information. At the time I was not wise enough (I still am not) and was probably just trying different approaches.

My input training set:

np.array(x_train).shape
(40000, 768)

And the model I'm using:

import numpy as np

import keras
from keras import models
from keras.models import Sequential
from keras import layers
from keras.layers import Embedding, Bidirectional, Dense, LSTM, Conv1D, MaxPooling1D, Flatten, Reshape, TimeDistributed
from keras import optimizers

reduce_lr = keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.50, patience=2, verbose=1, mode='auto', cooldown=0, min_lr=0.00001)

early = keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0, patience=5, verbose=1, mode='auto')

model1 = Sequential()

# CONVS + POOLING LAYERS or RECURRENT LAYERS

model1.add(Dense(526, activation='relu', input_shape=(768,)))  # each sample is a 768-dim BERT vector
model1.add(Dense(128, activation='relu'))
model1.add(Dense(64, activation='relu'))


model1.add(Dense(2, activation='softmax'))
model1.summary()

adam = optimizers.Adam(learning_rate=0.01, beta_1=0.9, beta_2=0.999, amsgrad=False)

model1.compile(loss='sparse_categorical_crossentropy',
              optimizer=adam,
              metrics=['accuracy'])

history = model1.fit(np.array(x_train), np.array(y_train),
                     epochs=20,
                     batch_size=200,
                     validation_data=(np.array(x_val), np.array(y_val)),
                     callbacks=[reduce_lr, early])

Conv1D and LSTM require 3D input of shape (batch, timesteps, features).

np.array(x_train).shape
(40000, 768)

The correct shape should be (40000, 768, 1), i.e. 768 timesteps with one feature each.

So, reshape your array.

x_train = x_train.reshape(-1, 768, 1)
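
The validation set presumably needs the same reshape before it is passed to fit (assuming x_val has shape (10000, 768)):

x_val = x_val.reshape(-1, 768, 1)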

model1 = Sequential()

model1.add(Conv1D(128, 3, activation='relu', input_shape=(768, 1)))
model1.add(Conv1D(256, 3, activation='relu'))
# flatten the conv features before the dense classifier
model1.add(Flatten())

model1.add(Dense(2, activation='softmax'))
model1.summary()
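
If you want the LSTM variant the title asks about, a minimal sketch on the same reshaped (768, 1) input could look like the following; the layer width of 64 is illustrative, not tuned:

model2 = Sequential()

# the LSTM scans the 768 "timesteps" of one feature each
# and returns its final hidden state
model2.add(LSTM(64, input_shape=(768, 1)))
model2.add(Dense(2, activation='softmax'))
model2.summary()

Either model can then be compiled and fitted exactly as in the question (Adam plus sparse_categorical_crossentropy), provided x_train and x_val have been reshaped first.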
