Keras: using mask_zero with padded sequences versus single-sequence non-padded training
I'm building an LSTM model in Keras to classify entities in sentences. I'm experimenting with two approaches: zero-padded sequences combined with the mask_zero parameter, or a generator that trains the model on one sentence (or batches of same-length sentences) at a time so that no padding is needed.
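To make the first approach concrete, here is a minimal sketch of what zero padding does (in practice you would use Keras's `pad_sequences` helper rather than this hand-rolled version). Note that word indices start at 1 so that 0 stays reserved for padding, which is what `mask_zero` later tells Keras to skip; the sentences below are hypothetical token-index lists:

```python
import numpy as np

# Hypothetical tokenized sentences of different lengths; index 0 is reserved
# for padding so mask_zero can later identify the padded positions.
sentences = [[4, 10, 2], [7, 1], [3, 8, 6, 5, 9]]
max_len = max(len(s) for s in sentences)

# Right-pad every sentence with zeros up to the length of the longest one.
padded_x = np.zeros((len(sentences), max_len), dtype=int)
for row, seq in enumerate(sentences):
    padded_x[row, :len(seq)] = seq
```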
If I define my model like this:
model = Sequential()
model.add(Embedding(input_dim=vocab_size+1, output_dim=200, mask_zero=True,
                    weights=[pretrained_weights], trainable=True))
model.add(Bidirectional(LSTM(units=100, return_sequences=True, recurrent_dropout=0.1)))
model.add(Dropout(0.2))
model.add(Bidirectional(LSTM(units=100, return_sequences=True, recurrent_dropout=0.1)))
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(target_size, activation='softmax')))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
Can I expect the padded sequences with the mask_zero parameter to perform similarly to feeding the model non-padded sequences one sentence at a time? Essentially:
model.fit(padded_x, padded_y, batch_size=128, epochs=n_epochs,
          validation_split=0.1, verbose=1)
or
def iter_sentences():
    while True:
        for i in range(len(train_x)):
            yield np.array([train_x[i]]), to_categorical([train_y[i]], num_classes=target_size)

model.fit_generator(iter_sentences(), steps_per_epoch=less_steps, epochs=way_more_epochs, verbose=1)
I'm just not sure whether one method is generally preferred over the other, or what exact effect the mask_zero parameter has on the model.
Note: there are slight differences in the model initialization depending on which training method I use; I've left those out for brevity.
The biggest differences will be performance and training stability; otherwise, padding and then masking is equivalent to processing one sentence at a time.
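Conceptually, `mask_zero=True` makes the Embedding layer emit a boolean mask that is True wherever the input index is non-zero; mask-aware downstream layers (like LSTM) skip the masked timesteps, and they are excluded from the loss. A rough sketch of that idea, with hypothetical per-timestep loss values:

```python
import numpy as np

# Two padded sentences; zeros are padding positions.
batch = np.array([[4, 10, 2, 0, 0],
                  [7, 1, 0, 0, 0]])

# This is the mask the Embedding layer derives: True for real tokens only.
mask = batch != 0

# Hypothetical per-timestep losses; the large 9.9 values sit on padding
# positions and contribute nothing once the mask is applied.
losses = np.array([[0.2, 0.1, 0.4, 9.9, 9.9],
                   [0.3, 0.5, 9.9, 9.9, 9.9]])
masked_mean = losses[mask].mean()  # averaged over real tokens only
```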
So to answer the question: no, you can't expect them to perform similarly.
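As a middle ground between the two approaches, the batching of same-length sentences that the question mentions can be sketched by bucketing the training set by length, so each batch shares one length and needs no padding at all (`train_x` here is a hypothetical list of tokenized sentences):

```python
from collections import defaultdict

def length_buckets(sequences):
    """Group sequences by length so each bucket can form a padding-free batch."""
    buckets = defaultdict(list)
    for seq in sequences:
        buckets[len(seq)].append(seq)
    return buckets

# Hypothetical tokenized sentences: three of length 2, one of length 3.
train_x = [[1, 2], [3, 4], [5, 6, 7], [8, 9]]
buckets = length_buckets(train_x)
```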