I create a tf.data dataset from tokenized text that is converted to sequences and then to numpy arrays:
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.layers import Embedding, LSTM, Dropout, Dense, Activation

tokenizer = Tokenizer()
tokenizer.fit_on_texts(bible_text)#Builds the word index
sequences = tokenizer.texts_to_sequences(bible_text)
##-->[[5, 1, 914, 32, 1352, 1, 214, 2, 1, 111],
## [2, 1, 111, 31, 252, 2091, 2, 1874, 2, 547, 31, 38, 1, 196, 3, 1, 899, 2, 1, 298, 3, 32, 878, 38, 1, 196, 3, 1, 266],
## [2, 32, 33, 79, 54, 16, 369, 2, 54, 31, 369], [2, 32, 215, 1, 369, 6, 17, 31, 156, 2, 32, 955, 1, 369, 34, 1, 547], ...]
sequences=pad_sequences(sequences, padding='post')
##-->[[ 5 1 914 32 1352 1 214 2 1 111 0 0 0 0
## 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 0 0 0 0 0 0]
##...]
word_index=tokenizer.word_index
##for k,v in sorted(word_index.items(), key=operator.itemgetter(1))[:10]:
## print (k,v)
##--> the 1
##and 2
##of 3
##to 4
##in 5
##that 6
##shall 7
##he 8
##lord 9
##his 10
##
##[...]
vocab_size = len(tokenizer.word_index) + 1
Then I build the input and target sequences:
input_sequences, target_sequences = sequences[:,:-1], sequences[:,1:]
seq_length=input_sequences.shape[1] ##-->89
num_verses=input_sequences.shape[0]
input_sequences=np.array(input_sequences)
target_sequences=np.array(target_sequences)
and finally the dataset:
dataset= tf.data.Dataset.from_tensor_slices((input_sequences, target_sequences))
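For reference, the element shapes can be checked straight from the dataset (a minimal sketch; seq_length is the value computed above):

# Sketch: peek at one element to confirm both input and target
# sequences come out as integer vectors of length seq_length.
print(dataset.element_spec)
for x, y in dataset.take(1):
    print(x.shape, y.shape)  # expected: (seq_length,) (seq_length,)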
Nothing seems particularly wrong with the dataset setup so far. Then I define the model:
EPOCHS=2
BATCH_SIZE=256
VAL_FRAC=0.2
LSTM_UNITS=1024
DENSE_UNITS=vocab_size
EMBEDDING_DIM=256
BUFFER_SIZE=10000
len_val=int(num_verses*VAL_FRAC)
#build validation dataset
validation_dataset = dataset.take(len_val)
validation_dataset = (
    validation_dataset
    .shuffle(BUFFER_SIZE)
    .padded_batch(BATCH_SIZE, drop_remainder=True)
    .prefetch(tf.data.experimental.AUTOTUNE))
#build training dataset
train_dataset = dataset.skip(len_val)
train_dataset = (
    train_dataset
    .shuffle(BUFFER_SIZE)
    .padded_batch(BATCH_SIZE, drop_remainder=True)
    .prefetch(tf.data.experimental.AUTOTUNE))
#build the model: 2 stacked LSTM
print('Build model...')
model = tf.keras.Sequential()
model.add(Embedding(vocab_size, EMBEDDING_DIM))
model.add(LSTM(LSTM_UNITS, return_sequences=True, input_shape=(seq_length, vocab_size)))
model.add(Dropout(0.2))
model.add(LSTM(512, return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(DENSE_UNITS))
model.add(Activation('softmax'))
loss = tf.losses.SparseCategoricalCrossentropy(from_logits=False)
model.compile(optimizer='adam',
              loss=loss,
              metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])
model.summary()
I get the following error; it occurs in the fit method:
ValueError: Shape mismatch: The shape of labels (received (16640,)) should equal the shape of logits except for the last dimension (received (256, 3067)).
Any idea what could be wrong?
EDIT
If I change the loss to categorical_crossentropy I get:
/usr/local/lib/python3.6/dist-packages/keras/backend.py:4839 categorical_crossentropy
target.shape.assert_is_compatible_with(output.shape)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/tensor_shape.py:1161 assert_is_compatible_with
raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (256, 65) and (256, 3067) are incompatible
Your preprocessing steps seem fine. Assuming you want to generate a sequence as your output (your targets are sequences), try adjusting your model as follows:
model = tf.keras.Sequential()
model.add(tf.keras.layers.Embedding(vocab_size, EMBEDDING_DIM))
model.add(tf.keras.layers.LSTM(LSTM_UNITS, return_sequences=True))
model.add(tf.keras.layers.Dropout(0.2))
model.add(tf.keras.layers.LSTM(512, return_sequences=True))
model.add(tf.keras.layers.Dropout(0.2))
model.add(tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(DENSE_UNITS, activation='softmax')))
Note that your last LSTM layer now returns the sequences again. The time-distributed layer simply applies a fully connected layer with a softmax activation to each time step, calculating a probability for each word in the vocabulary. The number of units in that fully connected layer equals the vocabulary size, so every word gets a predicted probability at each time step.
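For completeness, here is a minimal sketch of how the adjusted model could be compiled and sanity-checked before training (it reuses vocab_size, EPOCHS, train_dataset, and validation_dataset from your question; the printed shapes are assumptions based on your setup):

# Sketch: with return_sequences=True and the TimeDistributed softmax head,
# the logits are (batch, seq_length, vocab_size), so sparse categorical
# cross-entropy can be used directly with the (batch, seq_length) integer targets.
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])

# Compare the model output shape with the target shape on one batch.
for x, y in train_dataset.take(1):
    print(model(x).shape, y.shape)  # e.g. (256, seq_length, vocab_size) vs (256, seq_length)

model.fit(train_dataset, validation_data=validation_dataset, epochs=EPOCHS)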