Keras Training Fails After 2nd Epoch
EDIT: Here is the generator code:
def generate_batch(self, n_positive=50, negative_ratio=1.0, classification=False):
    """
    Generate batches of samples for training.
    :param n_positive: number of positive training examples
    :param negative_ratio: ratio of positive:negative training examples
    :param classification: determines type of loss function and network architecture
    :return: generator that produces batches of training inputs/labels
    """
    # TODO: use `frequency` to reinforce positive labels
    # TODO: allow n_positive to use entire data set
    pairs = self.index()
    batch_size = n_positive * (1 + negative_ratio)
    # Adjust the negative label based on the task: binary cross entropy
    # expects 0, mean squared error on cosine similarity expects -1
    if classification:
        neg_label = 0
    else:
        neg_label = -1
    # This creates a generator
    idx = 0  # TODO: make `max_recipe_length` config-driven once `structured_document` in Redshift is hstack'd
    while True:
        # batch = np.zeros((batch_size, 3))
        batch = []
        # Randomly choose positive examples
        for idx, (recipe, document) in enumerate(random.sample(pairs, n_positive)):
            encoded = self.encode_pair(recipe, document)
            # TODO: refactor from append
            batch.append([encoded[0], encoded[1], 1])
            # logger.info('([encoded[0], encoded[1], 1]) %s', ([encoded[0], encoded[1], 1]))
            # batch[idx, :] = ([encoded[0], encoded[1], 1])
        # Increment idx by 1
        idx += 1
        # Add negative examples until we reach the batch size
        while idx < batch_size:
            # TODO: [?] optimize how negative sample inputs are constructed
            random_index_1 = random.randrange(len(self.ingredients_index))
            random_index_2 = random.randrange(len(self.ingredients_index))
            random_recipe, random_document = self.pairs[random_index_1][0], self.pairs[random_index_2][1]
            # Check to make sure this is not a positive example
            if (random_recipe, random_document) not in self.pairs:
                # Add to batch and increment index
                encoded = self.encode_pair(random_recipe, random_document)
                # TODO: refactor from append
                batch.append([encoded[0], encoded[1], neg_label])
                # batch[idx, :] = ([encoded[0], encoded[1], neg_label])
                idx += 1
        # Make sure to shuffle order
        np.random.shuffle(batch)
        batch = np.array(batch)
        ingredients = np.array(batch[:, 0].tolist())
        documents = np.array(batch[:, 1].tolist())
        labels = np.array(batch[:, 2].tolist())
        yield {'ingredients': ingredients, 'documents': documents}, labels
batch = t.generate_batch(n_positive, negative_ratio=negative_ratio)
model = model(embedding_size, document_size, vocabulary_size=vocabulary_size)
h = model.fit_generator(
    batch,
    epochs=20,
    steps_per_epoch=int(training_size / (n_positive * (negative_ratio + 1))),
    verbose=2
)
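As a quick sanity check (my own sketch, not code from the question), you can pull a single batch from the generator and inspect the array shapes before handing it to fit_generator; t, n_positive, and negative_ratio are the names used above:

# Draw one batch and inspect its shapes before training.
gen = t.generate_batch(n_positive=50, negative_ratio=1.0)
inputs, labels = next(gen)
print(inputs['ingredients'].shape)  # expect (100, 46)
print(inputs['documents'].shape)    # expect (100, 46)
print(labels.shape)                 # expect (100,)
# An input shape like (100,) with dtype=object means a ragged row slipped
# through encode_pair.

The batch size of 100 here follows from n_positive * (1 + negative_ratio) = 50 * 2.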
I have the following embedding network architecture, which does a great job of learning my corpus at small scales (< 10k training examples), but when I increase my training set size, I get shape errors from .fit_generator(...):
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
ingredients (InputLayer) (None, 46) 0
__________________________________________________________________________________________________
documents (InputLayer) (None, 46) 0
__________________________________________________________________________________________________
ingredients_embedding (Embeddin (None, 46, 10) 100000 ingredients[0][0]
__________________________________________________________________________________________________
documents_embedding (Embedding) (None, 46, 10) 100000 documents[0][0]
__________________________________________________________________________________________________
lambda_1 (Lambda) (None, 10) 0 ingredients_embedding[0][0]
__________________________________________________________________________________________________
lambda_2 (Lambda) (None, 10) 0 documents_embedding[0][0]
__________________________________________________________________________________________________
dot_product (Dot) (None, 1) 0 lambda_1[0][0]
lambda_2[0][0]
__________________________________________________________________________________________________
reshape_1 (Reshape) (None, 1) 0 dot_product[0][0]
==================================================================================================
Total params: 200,000
Trainable params: 200,000
Non-trainable params: 0
This is generated from the following model code:
def model(embedding_size, document_size, vocabulary_size=10000, classification=False):
    ingredients = Input(
        name='ingredients',
        shape=(document_size,)
    )
    documents = Input(
        name='documents',
        shape=(document_size,)
    )
    ingredients_embedding = Embedding(name='ingredients_embedding',
                                      input_dim=vocabulary_size,
                                      output_dim=embedding_size)(ingredients)
    document_embedding = Embedding(name='documents_embedding',
                                   input_dim=vocabulary_size,
                                   output_dim=embedding_size)(documents)
    # Sum over the sentence dimension
    ingredients_embedding = Lambda(lambda x: K.sum(x, axis=-2))(ingredients_embedding)
    # Sum over the sentence dimension
    document_embedding = Lambda(lambda x: K.sum(x, axis=-2))(document_embedding)
    merged = Dot(name='dot_product', normalize=True, axes=-1)([ingredients_embedding, document_embedding])
    merged = Reshape(target_shape=(1,))(merged)
    # If classification, add an extra layer and use binary cross entropy as the loss
    if classification:
        merged = Dense(1, activation='sigmoid')(merged)
        m = Model(inputs=[ingredients, documents], outputs=merged)
        m.compile(optimizer='Adam', loss='binary_crossentropy', metrics=['accuracy'])
    # Otherwise the loss function is mean squared error
    else:
        m = Model(inputs=[ingredients, documents], outputs=merged)
        m.compile(optimizer='Adam', loss='mse')
    m.summary()
    save_model(m)
    return m
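For reference, the summary above corresponds to calling this function with embedding_size=10, document_size=46, and vocabulary_size=10000 (each Embedding layer then has 10,000 × 10 = 100,000 parameters). The actual call site is not shown in the question, so this reconstruction is an assumption:

# Reconstructed from the model summary above; the exact arguments are an assumption.
m = model(embedding_size=10, document_size=46, vocabulary_size=10000)

With normalize=True, the Dot layer yields cosine similarity in [-1, 1], which matches the generator's neg_label = -1 in the mse case.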
I can train this model on 10k training examples, but when I increase the training set size to 100k records, I get the following error after the 2nd epoch every time:
Epoch 1/20
 - 8s - loss: 0.3181
Epoch 2/20
 - 6s - loss: 0.1086
Epoch 3/20
Traceback (most recent call last):
  File "run.py", line 38, in <module>
    verbose=2
  File "/usr/local/lib/python3.7/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/keras/engine/training.py", line 1418, in fit_generator
    initial_epoch=initial_epoch)
  File "/usr/local/lib/python3.7/site-packages/keras/engine/training_generator.py", line 217, in fit_generator
    class_weight=class_weight)
  File "/usr/local/lib/python3.7/site-packages/keras/engine/training.py", line 1211, in train_on_batch
    class_weight=class_weight)
  File "/usr/local/lib/python3.7/site-packages/keras/engine/training.py", line 751, in _standardize_user_data
    exception_prefix='input')
  File "/usr/local/lib/python3.7/site-packages/keras/engine/training_utils.py", line 138, in standardize_input_data
    str(data_shape))
ValueError: Error when checking input: expected documents to have shape (46,) but got array with shape (1,)
Apparently, after some number of iterations the input data has the wrong shape. I suspect it happens here:

encoded = self.encode_pair(recipe, document)

What is the code of encode_pair? Is it guaranteed that encoded[0] is always of size 46?
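One way to enforce that guarantee (my own sketch, not the author's code) is to pad or truncate every encoded row to document_size before it enters the batch, using Keras' standard pad_sequences utility; the zero/'post' padding policy here is an assumption:

from keras.preprocessing.sequence import pad_sequences

# Force both encoded sequences to a fixed width so that a single short
# record (e.g. length 43) cannot produce a ragged batch.
def fix_length(encoded, document_size=46):
    return pad_sequences([encoded[0], encoded[1]],
                         maxlen=document_size, padding='post',
                         truncating='post')

Calling fixed = fix_length(self.encode_pair(recipe, document)) inside generate_batch (or simply asserting len(encoded[0]) == 46 there) would mask or surface the bad record immediately.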
The issue was an edge case in the data my generator was yielding. A single record had a length of 43 as opposed to 46, and that threw off the entire training. I'm still confused by the ValueError message, though: it reads but got array with shape (1,) when in reality it should read but got array with shape (43,).
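The (1,) is reproducible. When one nested list is shorter than the rest, np.array cannot build a rectangular (batch, 46) array; older NumPy versions silently fall back to a 1-D array of dtype=object (NumPy 1.24+ raises an error instead), and Keras' input standardization expands 1-D inputs to shape (batch, 1), so the error reports the per-sample shape (1,) rather than (43,). A minimal sketch of the effect:

import numpy as np

# One short row makes the nested list ragged.
rows = [[0] * 46 for _ in range(99)] + [[0] * 43]
documents = np.array(rows)  # older NumPy: 1-D object array; NumPy >= 1.24 raises

print(documents.shape)  # (100,) instead of (100, 46)
print(documents.dtype)  # object
# Keras then sees one "feature" per sample, hence "got array with shape (1,)".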