
Error embedding input shape: expected embedding_1_input to have shape (25,) but got array with shape (1,)

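I'm not sure why I keep getting this error. I've checked the length of my tokenized and encoded text data, and it matches the input length I chose. Here is the code: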

from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

import numpy as np

training_samples = 6603 
max_words = 10000  # We will only consider the top 10,000 words in the dataset

tokenizer = Tokenizer(num_words=max_words)
tokenizer.fit_on_texts(X_train) 
sequences = tokenizer.texts_to_sequences(X_train) 
word_index = tokenizer.word_index
print('Found %s unique tokens.' % len(word_index))
print('Shape of data tensor:', X_train.shape)

max_length = 25
padded_s = pad_sequences(sequences, maxlen=max_length, padding='post')
print(padded_s)

print(padded_s.shape)

y_train = np.array(y_train)
y_test = np.array(y_test)

This prints:

Found 10759 unique tokens.
Shape of data tensor: (5942,)
[[  17  119  154 ...    0    0    0]
 [  31  116   40 ...    0    0    0]
 [1925 1711   15 ...  184    0    0]
 ...
 [   6 1915  375 ...    0    0    0]
 [ 693  190   24 ...    0    0    0]
 [   1  570    2 ...    0    0    0]]
(5942, 25)

As you can see above, it's 25, not 1!
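Every row of padded_s has length 25 because pad_sequences pads or truncates each sequence to maxlen. The sketch below is a hypothetical pure-Python `pad_post` that mimics `pad_sequences(sequences, maxlen=max_length, padding='post')`; it is not the Keras implementation. It also shows a side effect worth knowing: Keras's default `truncating='pre'` still applies, so long texts lose tokens from the start, not the end.

```python
def pad_post(sequences, maxlen, value=0):
    """Hypothetical sketch of pad_sequences(seqs, maxlen=maxlen, padding='post').

    Keeps Keras's default truncating='pre': a sequence longer than maxlen
    keeps only its last maxlen tokens.
    """
    padded = []
    for seq in sequences:
        seq = list(seq)
        if len(seq) > maxlen:
            seq = seq[len(seq) - maxlen:]  # truncating='pre' drops the head
        padded.append(seq + [value] * (maxlen - len(seq)))  # padding='post'
    return padded


# A short sequence and a 30-token one, both forced to length 25.
rows = pad_post([[17, 119, 154], list(range(1, 31))], maxlen=25)
print([len(r) for r in rows])  # [25, 25]
print(rows[0][:5])             # [17, 119, 154, 0, 0]
print(rows[1][:3])             # [6, 7, 8]  (tokens 1..5 were dropped)
```

If the beginning of each text matters more than the end, passing `truncating='post'` to `pad_sequences` keeps the first 25 tokens instead.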

import os

glove_dir = '/Users/xxx/Downloads/glove.6B'
embeddings_index = {}
# Each line of the GloVe file is a word followed by its 100 coefficients
with open(os.path.join(glove_dir, 'glove.6B.100d.txt'), encoding='utf-8') as f:
    for line in f:
        values = line.split()
        word = values[0]
        coefs = np.asarray(values[1:], dtype='float32')
        embeddings_index[word] = coefs
print('Found %s word vectors.' % len(embeddings_index))

embedding_dim = 100

embedding_matrix = np.zeros((max_words, embedding_dim))

for word, i in word_index.items():
    embedding_vector = embeddings_index.get(word)
    if i < max_words:
        if embedding_vector is not None:
            # Words not found in embedding index will be all-zeros.
            embedding_matrix[i] = embedding_vector

from keras.models import Sequential
from keras.layers import Embedding, Flatten, Dense
model = Sequential()
model.add(Embedding(max_words, embedding_dim, weights=[embedding_matrix], input_length=max_length, trainable=False))
model.add(Flatten())
model.add(Dense(32, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.summary()

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['acc'])
history = model.fit(padded_s, y_train,
                    epochs=10,
                    batch_size=32,
                    validation_data=(X_val, y_val))
model.save_weights('pre_trained_glove_model.h5')
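Because `input_length=max_length` fixes the sequence axis, the Flatten layer always produces `max_length * embedding_dim` features, so the model cannot accept samples of any other length. A rough check of the numbers `model.summary()` should correspond to, assuming the layers above (Embedding has no bias; each Dense layer has a weight matrix plus one bias per unit):

```python
max_words, embedding_dim, max_length = 10000, 100, 25

embedding_params = max_words * embedding_dim  # frozen GloVe matrix (trainable=False)
flatten_units = max_length * embedding_dim    # (25, 100) flattened per sample
dense1_params = flatten_units * 32 + 32       # Dense(32): weights + biases
dense2_params = 32 * 1 + 1                    # Dense(1): weights + bias

print(embedding_params)                # 1000000
print(flatten_units)                   # 2500
print(dense1_params)                   # 80032
print(dense2_params)                   # 33
print(dense1_params + dense2_params)   # 80065 trainable parameters
```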

This raises the error:

ValueError: Error when checking input: expected embedding_1_input to have shape (25,) but got array with shape (1,)

Any help would be greatly appreciated. Thanks very much!

Fix: I had forgotten to convert my validation set to sequences and pad them.
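The message reports `(1,)` because `X_val` is still a 1-D array of raw strings. As far as I can tell, Keras 2's input check treats a 1-D array of shape `(n,)` as `(n, 1)`, i.e. one feature per sample, and compares that against the `(25,)` the Embedding layer expects. The sketch below reproduces only that shape logic with NumPy; `per_sample_shape`, the sample texts, and the all-zero `padded_val` are hypothetical stand-ins, not Keras code.

```python
import numpy as np


def per_sample_shape(x):
    """Shape of one sample as the input check would see it.

    Assumption: mirrors Keras 2 expanding a 1-D array (n,) to (n, 1).
    """
    x = np.asarray(x)
    if x.ndim == 1:
        x = np.expand_dims(x, 1)
    return x.shape[1:]


# Hypothetical raw validation texts standing in for X_val.
X_val = np.array(['great movie', 'not worth watching', 'loved it'])
print(per_sample_shape(X_val))       # (1,)  -> matches the error message

# Stand-in for X_val after texts_to_sequences + pad_sequences(maxlen=25);
# the real token ids would come from the tokenizer fitted on X_train.
padded_val = np.zeros((len(X_val), 25), dtype='int32')
print(per_sample_shape(padded_val))  # (25,) -> what the Embedding layer expects
```

In the question's code, the fix is to call `tokenizer.texts_to_sequences(X_val)`, pad the result with the same `pad_sequences(..., maxlen=max_length, padding='post')` call used for the training data, and pass that array (not the raw `X_val`) in `validation_data`. The tokenizer should not be refitted on `X_val`; otherwise its word indices would no longer line up with `embedding_matrix`.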


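Disclaimer: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

© 2020-2024 STACKOOM.COM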