
Tensorflow fit ValueError: Shape mismatch: The shape of labels (received (16640,)) should equal the shape of logits except for the last dimension

I create a tf.data dataset from tokenized text that is converted to sequences and then to numpy arrays:

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

tokenizer = Tokenizer()
tokenizer.fit_on_texts(bible_text)  #builds the word index
sequences = tokenizer.texts_to_sequences(bible_text)

##-->[[5, 1, 914, 32, 1352, 1, 214, 2, 1, 111],
## [2, 1, 111, 31, 252, 2091, 2, 1874, 2, 547, 31, 38, 1, 196, 3, 1, 899, 2, 1, 298, 3, 32, 878, 38, 1, 196, 3, 1, 266],
## [2, 32, 33, 79, 54, 16, 369, 2, 54, 31, 369], [2, 32, 215, 1, 369, 6, 17, 31, 156, 2, 32, 955, 1, 369, 34, 1, 547], ...]

sequences=pad_sequences(sequences, padding='post')

##-->[[   5    1  914   32 1352    1  214    2    1  111    0    0    0    0
##     0    0    0    0    0    0    0    0    0    0    0    0    0    0
##     0    0    0    0    0    0    0    0    0    0    0    0    0    0
##     0    0    0    0    0    0    0    0    0    0    0    0    0    0
##     0    0    0    0    0    0    0    0    0    0    0    0    0    0
##     0    0    0    0    0    0    0    0    0    0    0    0    0    0
##     0    0    0    0    0    0]
##...]

word_index=tokenizer.word_index 

##for k,v in sorted(word_index.items(), key=operator.itemgetter(1))[:10]:
##   print (k,v)

##--> the 1
##and 2
##of 3
##to 4
##in 5
##that 6
##shall 7
##he 8
##lord 9
##his 10
##
##[...]

vocab_size = len(tokenizer.word_index) + 1

Building the input and target sequences:

input_sequences, target_sequences = sequences[:,:-1], sequences[:,1:]
seq_length=input_sequences.shape[1] ##-->89
num_verses=input_sequences.shape[0]

input_sequences=np.array(input_sequences)
target_sequences=np.array(target_sequences)

and then the dataset:

dataset= tf.data.Dataset.from_tensor_slices((input_sequences, target_sequences))

Nothing seems particularly wrong with this dataset setup. I define the model here:

EPOCHS=2
BATCH_SIZE=256
VAL_FRAC=0.2  
LSTM_UNITS=1024
DENSE_UNITS=vocab_size
EMBEDDING_DIM=256
BUFFER_SIZE=10000

len_val=int(num_verses*VAL_FRAC)

#build validation dataset
validation_dataset = dataset.take(len_val)
validation_dataset = (
    validation_dataset
    .shuffle(BUFFER_SIZE)
    .padded_batch(BATCH_SIZE, drop_remainder=True)
    .prefetch(tf.data.experimental.AUTOTUNE))

#build training dataset
train_dataset = dataset.skip(len_val)
train_dataset = (
    train_dataset
    .shuffle(BUFFER_SIZE)
    .padded_batch(BATCH_SIZE, drop_remainder=True)
    .prefetch(tf.data.experimental.AUTOTUNE))
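
As a quick sanity check (just a sketch; the exact sizes depend on the padding above), one batch from the training pipeline can be inspected to confirm the shapes the model will receive:

#inspect one batch: inputs and targets should both be (BATCH_SIZE, sequence_length)
for batch_inputs, batch_targets in train_dataset.take(1):
    print(batch_inputs.shape)   #e.g. (256, 89)
    print(batch_targets.shape)  #e.g. (256, 89)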

#build the model: 2 stacked LSTM
from tensorflow.keras.layers import Embedding, LSTM, Dropout, Dense, Activation

print('Build model...')
model = tf.keras.Sequential()
model.add(Embedding(vocab_size, EMBEDDING_DIM))
model.add(LSTM(LSTM_UNITS, return_sequences=True, input_shape=(seq_length, vocab_size)))
model.add(Dropout(0.2))
model.add(LSTM(512, return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(DENSE_UNITS))
model.add(Activation('softmax'))

loss=tf.losses.SparseCategoricalCrossentropy(from_logits=False)

model.compile(optimizer='adam',
              loss=loss,
              metrics=[
                  tf.keras.metrics.SparseCategoricalAccuracy()]
              )

model.summary()

I get the following error - it occurs in the fit method:

ValueError: Shape mismatch: The shape of labels (received (16640,)) should equal the shape of logits except for the last dimension (received (256, 3067)).

Any idea what could be wrong?

EDIT

If I change the loss to categorical_crossentropy, I get

   /usr/local/lib/python3.6/dist-packages/keras/backend.py:4839 categorical_crossentropy
        target.shape.assert_is_compatible_with(output.shape)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/tensor_shape.py:1161 assert_is_compatible_with
        raise ValueError("Shapes %s and %s are incompatible" % (self, other))

    ValueError: Shapes (256, 65) and (256, 3067) are incompatible

Your preprocessing steps seem fine. The problem is in the model: your targets are full sequences (one token id per time step), but because the last LSTM uses return_sequences=False, the model emits only a single prediction per example; the sparse loss then flattens the batch of targets (256 × 65 = 16640 labels) and cannot match them against logits of shape (256, 3067). Assuming you want to generate a sequence as your output (your targets are sequences), try adjusting your model as follows:

model = tf.keras.Sequential()
model.add(tf.keras.layers.Embedding(vocab_size, EMBEDDING_DIM))
model.add(tf.keras.layers.LSTM(LSTM_UNITS, return_sequences=True))
model.add(tf.keras.layers.Dropout(0.2))
model.add(tf.keras.layers.LSTM(512, return_sequences=True))
model.add(tf.keras.layers.Dropout(0.2))
model.add(tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(DENSE_UNITS, activation='softmax')))

Note that your last LSTM layer now returns the sequences again. The time-distributed layer simply applies a fully connected layer with a softmax activation to each time step to calculate the probability of each word in the vocabulary. The number of units in this fully connected layer equals the vocabulary size, so that every word gets a fair chance of being predicted.
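
For completeness, here is a minimal sketch (reusing the hyperparameters and the train_dataset defined above, and the sparse loss, since the targets are integer ids) that compiles the adjusted model and checks that its output now lines up with the targets:

#sketch: SparseCategoricalCrossentropy expects integer targets of shape (batch, timesteps)
#and predictions of shape (batch, timesteps, vocab_size)
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])

for batch_inputs, batch_targets in train_dataset.take(1):
    predictions = model(batch_inputs)
    print(batch_targets.shape)  #(BATCH_SIZE, timesteps)
    print(predictions.shape)    #(BATCH_SIZE, timesteps, vocab_size)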

