Keras Training Fails After 2nd Epoch

EDIT Here is the generator code:

def generate_batch(self, n_positive=50, negative_ratio=1.0, classification=False):
    # TODO: use `frequency` to reinforce positive labels
    # TODO: allow n_positive to use entire data set
    """
    Generate batches of samples for training

    :param n_positive: number of positive training examples
    :param negative_ratio: ratio of positive:negative training examples
    :param classification: determines type of loss function and network architecture
    :return: generator that produces batches of training inputs/labels
    """


    pairs = self.index()
    batch_size = n_positive * (1 + negative_ratio)

    # Adjust label based on task
    if classification:
        neg_label = 0
    else:
        neg_label = -1

    # This creates a generator
    idx = 0 # TODO: make `max_recipe_length` config-driven once `structured_document` in Redshift is hstack'd
    while True:
        # batch = np.zeros((batch_size, 3))
        batch = []
        # randomly choose positive examples
        for idx, (recipe, document) in enumerate(random.sample(pairs, n_positive)):
            encoded = self.encode_pair(recipe, document)
            # TODO: refactor from append
            batch.append([encoded[0], encoded[1], 1])
            # logger.info('([encoded[0], encoded[1], 1]) %s', ([encoded[0], encoded[1], 1]))
            # batch[idx, :] = ([encoded[0], encoded[1], 1])

        # idx now equals the number of positive examples added
        idx += 1

        # Add negative examples until reach batch size
        while idx < batch_size:
            # TODO: [?] optimize how negative sample inputs are constructed
            random_index_1, random_index_2 = random.randrange(len(self.ingredients_index)), \
                                             random.randrange(len(self.ingredients_index))
            random_recipe, random_document = self.pairs[random_index_1][0], self.pairs[random_index_2][1]

            # Check to make sure this is not a positive example
            if (random_recipe, random_document) not in self.pairs:
                # Add to batch and increment index
                encoded = self.encode_pair(random_recipe, random_document)
                # TODO: refactor from append
                batch.append([encoded[0], encoded[1], neg_label])
                # batch[idx, :] = ([encoded[0], encoded[1], neg_label])
                idx += 1

        # Make sure to shuffle order
        np.random.shuffle(batch)
        batch = np.array(batch)

        ingredients, documents, labels = np.array(batch[:, 0].tolist()), \
                                         np.array(batch[:, 1].tolist()), \
                                         np.array(batch[:, 2].tolist())


        yield {'ingredients': ingredients, 'documents': documents}, labels


batch = t.generate_batch(n_positive, negative_ratio=negative_ratio)
model = model(embedding_size, document_size, vocabulary_size=vocabulary_size)
h = model.fit_generator(
    batch,
    epochs=20,
    steps_per_epoch=int(training_size/(n_positive*(negative_ratio+1))),
    verbose=2
)
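Before pointing at Keras, it can help to pull a few batches straight off the generator and check their shapes (a minimal sanity-check sketch; it assumes `t`, `n_positive`, and `negative_ratio` are in scope as above, and that the model expects 46-wide inputs):

sanity_gen = t.generate_batch(n_positive, negative_ratio=negative_ratio)
for _ in range(5):
    inputs, labels = next(sanity_gen)
    for name, arr in inputs.items():
        # A ragged batch shows up as a 1-D object array rather than a
        # 2-D (batch_size, 46) integer array
        assert arr.ndim == 2 and arr.shape[1] == 46, \
            '%s has shape %s, dtype %s' % (name, arr.shape, arr.dtype)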

I have the following embedding network architecture, which does a great job of learning my corpus at small scale (< 10k training examples), but when I increase my training set size, I get shape errors from .fit_generator(...):

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
ingredients (InputLayer)        (None, 46)           0
__________________________________________________________________________________________________
documents (InputLayer)          (None, 46)           0
__________________________________________________________________________________________________
ingredients_embedding (Embeddin (None, 46, 10)       100000      ingredients[0][0]
__________________________________________________________________________________________________
documents_embedding (Embedding) (None, 46, 10)       100000      documents[0][0]
__________________________________________________________________________________________________
lambda_1 (Lambda)               (None, 10)           0           ingredients_embedding[0][0]
__________________________________________________________________________________________________
lambda_2 (Lambda)               (None, 10)           0           documents_embedding[0][0]
__________________________________________________________________________________________________
dot_product (Dot)               (None, 1)            0           lambda_1[0][0]
                                                                 lambda_2[0][0]
__________________________________________________________________________________________________
reshape_1 (Reshape)             (None, 1)            0           dot_product[0][0]
==================================================================================================
Total params: 200,000
Trainable params: 200,000
Non-trainable params: 0

The summary above is generated from the following model code:

def model(embedding_size, document_size, vocabulary_size=10000, classification=False):
    ingredients = Input(
        name='ingredients',
        shape=(document_size,)
    )
    documents = Input(
        name='documents',
        shape=(document_size,)
    )

    ingredients_embedding = Embedding(name='ingredients_embedding',
                                      input_dim=vocabulary_size,
                                      output_dim=embedding_size)(ingredients)

    document_embedding = Embedding(name='documents_embedding',
                                   input_dim=vocabulary_size,
                                   output_dim=embedding_size)(documents)

    # sum over the sentence dimension
    ingredients_embedding = Lambda(lambda x: K.sum(x, axis=-2))(ingredients_embedding)
    # sum over the sentence dimension
    document_embedding = Lambda(lambda x: K.sum(x, axis=-2))(document_embedding)

    merged = Dot(name='dot_product', normalize=True, axes=-1)([ingredients_embedding, document_embedding])

    merged = Reshape(target_shape=(1,))(merged)

    # If classification, add extra layer and loss function is binary cross entropy
    if classification:
        merged = Dense(1, activation='sigmoid')(merged)
        m = Model(inputs=[ingredients, documents], outputs=merged)
        m.compile(optimizer='Adam', loss='binary_crossentropy', metrics=['accuracy'])

    # Otherwise loss function is mean squared error
    else:
        m = Model(inputs=[ingredients, documents], outputs=merged)
        m.compile(optimizer='Adam', loss='mse')

    m.summary()

    save_model(m)
    return m
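For reference, the summary above corresponds to calling this function with values along these lines (a sketch inferred from the summary: 46-wide inputs and 100,000 parameters per embedding layer imply a 10,000-word vocabulary and 10-dimensional embeddings):

m = model(embedding_size=10, document_size=46, vocabulary_size=10000)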

I can train this model on 10k training examples, but when I increase the training set size to 100k records, I get the following error after the 2nd epoch every time.

Epoch 1/20
 - 8s - loss: 0.3181
Epoch 2/20
 - 6s - loss: 0.1086
Epoch 3/20
Traceback (most recent call last):
  File "run.py", line 38, in <module>
    verbose=2
  File "/usr/local/lib/python3.7/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/keras/engine/training.py", line 1418, in fit_generator
    initial_epoch=initial_epoch)
  File "/usr/local/lib/python3.7/site-packages/keras/engine/training_generator.py", line 217, in fit_generator
    class_weight=class_weight)
  File "/usr/local/lib/python3.7/site-packages/keras/engine/training.py", line 1211, in train_on_batch
    class_weight=class_weight)
  File "/usr/local/lib/python3.7/site-packages/keras/engine/training.py", line 751, in _standardize_user_data
    exception_prefix='input')
  File "/usr/local/lib/python3.7/site-packages/keras/engine/training_utils.py", line 138, in standardize_input_data
    str(data_shape))
ValueError: Error when checking input: expected documents to have shape (46,) but got array with shape (1,)

Apparently, after some number of iterations the input data has the wrong shape. I suspect it happens here:

 encoded = self.encode_pair(recipe, document)

What is the code of encode_pair? Is it guaranteed that encoded[0] is always of size 46?
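One way to check is to assert that invariant right where the pair is encoded (a sketch; `self.document_size` is a hypothetical attribute holding the expected width, 46 here):

encoded = self.encode_pair(recipe, document)
# Fail fast if encode_pair ever deviates from the fixed width the
# model's (None, 46) input layers expect; `self.document_size` is a
# hypothetical attribute holding that width
assert len(encoded[0]) == self.document_size, \
    'recipe encoded to length %d, expected %d' % (len(encoded[0]), self.document_size)
assert len(encoded[1]) == self.document_size, \
    'document encoded to length %d, expected %d' % (len(encoded[1]), self.document_size)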

The issue was an edge case in the data my generator was yielding. A single record had a length of 43 instead of 46, and that threw off the entire training run. I'm still confused by the ValueError message, though: it reads "but got array with shape (1,)" when it really should read "but got array with shape (43,)".
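The (1,) in the message is most likely a side effect of how the ragged batch gets converted: once one row has length 43, np.array(batch[:, 1].tolist()) can no longer form a 2-D (batch_size, 46) array and falls back to a 1-D object array, and Keras's input standardization expands any 1-D array with np.expand_dims(x, 1), so every sample is reported as shape (1,). Padding each encoded sequence to a fixed width guards against this class of edge case (a sketch using keras.preprocessing.sequence.pad_sequences; maxlen=46 is the document width from the model summary):

from keras.preprocessing.sequence import pad_sequences

# Pad (or truncate) every row to exactly 46 tokens so a single short
# record cannot collapse the batch into a ragged object array
ingredients = pad_sequences(batch[:, 0].tolist(), maxlen=46, padding='post')
documents = pad_sequences(batch[:, 1].tolist(), maxlen=46, padding='post')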
