
Keras: using mask_zero with padded sequences versus single sequence non padded training

I'm building an LSTM model in Keras to classify entities from sentences. I'm experimenting with both zero-padded sequences together with the mask_zero parameter, and a generator that trains the model on one sentence (or batches of same-length sentences) at a time so I don't need to pad them with zeros.

If I define my model as such:

from keras.models import Sequential
from keras.layers import Embedding, Bidirectional, LSTM, Dropout, TimeDistributed, Dense

model = Sequential()
# Index 0 is reserved for padding, hence input_dim=vocab_size+1 and mask_zero=True
model.add(Embedding(input_dim=vocab_size+1, output_dim=200, mask_zero=True,
                    weights=[pretrained_weights], trainable=True))
model.add(Bidirectional(LSTM(units=100, return_sequences=True, recurrent_dropout=0.1)))
model.add(Dropout(0.2))
model.add(Bidirectional(LSTM(units=100, return_sequences=True, recurrent_dropout=0.1)))
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(target_size, activation='softmax')))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

Can I expect the padded sequences with the mask_zero parameter to perform similarly to feeding the model non-padded sequences one sentence at a time? Essentially:

model.fit(padded_x, padded_y, batch_size=128, epochs=n_epochs,
          validation_split=0.1, verbose=1)

or

def iter_sentences():
    while True:
        for i in range(len(train_x)):
            yield np.array([train_x[i]]), to_categorical([train_y[i]], num_classes=target_size)

model.fit_generator(iter_sentences(), steps_per_epoch=less_steps, epochs=way_more_epochs, verbose=1)
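
(For reference, padded_x and padded_y in the first approach could be built along these lines; this is a rough sketch, not necessarily the exact preprocessing, assuming post-padding with zeros and one-hot labels:)

from keras.preprocessing.sequence import pad_sequences
from keras.utils import to_categorical

# Sketch only: post-pad token ids and label ids with 0, then one-hot the labels.
# Note that padded label positions end up as class 0 here.
max_len = max(len(s) for s in train_x)
padded_x = pad_sequences(train_x, maxlen=max_len, padding='post', value=0)
padded_y = pad_sequences(train_y, maxlen=max_len, padding='post', value=0)
padded_y = to_categorical(padded_y, num_classes=target_size)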

I'm just not sure if there is a general preference for one method over the other, or what exact effect the mask_zero parameter has on the model.

Note: There are slight parameter differences in the model initialization based on which training method I'm using; I've left those out for brevity.

The biggest differences will be performance and training stability; otherwise, padding and then masking is the same as processing a single sentence at a time.

  1. Performance: training one data point at a time might not exploit any parallelism available on the hardware. Often, we adjust the batch size to get the best performance from the machine during training and prediction (see the sketch after this list for the middle ground the question mentions: batches of same-length sentences).
  2. Training stability: when you set the batch size to 1 you are no longer performing mini-batch training. The training routine applies an update after every data point, which can be detrimental for momentum-based optimizers such as Adam. Accumulating gradients over a batch instead tends to provide more stable convergence, especially if the data is noisy.
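
A rough sketch (not part of the original answer) of that middle ground: group same-length sentences into mini-batches so no padding is needed while still exploiting parallelism. It reuses train_x, train_y and target_size from the question:

from collections import defaultdict
import numpy as np
from keras.utils import to_categorical

def iter_same_length_batches(batch_size=128):
    # Group sentence indices by length so each batch needs no padding.
    buckets = defaultdict(list)
    for i, seq in enumerate(train_x):
        buckets[len(seq)].append(i)
    while True:
        for indices in buckets.values():
            for start in range(0, len(indices), batch_size):
                idx = indices[start:start + batch_size]
                x = np.array([train_x[i] for i in idx])
                y = to_categorical([train_y[i] for i in idx], num_classes=target_size)
                yield x, y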

So to answer the question, no, you can't expect them to perform similarly.
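
As an aside (not from the original answer), a minimal sketch to check the inference-time equivalence of masked padding, assuming the model above and a made-up sentence of token ids containing no zeros:

import numpy as np

sent = np.array([[12, 7, 3, 45]])                         # hypothetical token ids, shape (1, 4)
padded = np.pad(sent, ((0, 0), (0, 6)), mode='constant')  # post-pad with zeros to shape (1, 10)

out_single = model.predict(sent)                          # (1, 4, target_size)
out_padded = model.predict(padded)                        # (1, 10, target_size)

# With mask_zero=True the outputs at the real timesteps should agree up to numerical tolerance.
print(np.allclose(out_single, out_padded[:, :sent.shape[1], :], atol=1e-5))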
