tf.data.Dataset performance with tf.keras

Question

I am experimenting with multilabel text classification using tf.keras (TensorFlow version 1.9.0). My dataset consists of 185485 training and 46372 validation examples.

For the first attempt (on CPU), I pre-padded the data and fed it to the model:

from tensorflow import keras
....
X_train = keras.preprocessing.sequence.pad_sequences(X_train, maxlen=2000)

inp = keras.Input(shape=(X_train.shape[1], ))
x = keras.layers.Embedding(len(tk.word_index) + 1, 256, mask_zero=True)(inp)
x = keras.layers.LSTM(128, return_sequences=False)(x)
x = keras.layers.Dropout(0.1)(x)
x = keras.layers.Dense(y_train.shape[1], activation="sigmoid")(x)
model = keras.Model(inputs=inp, outputs=x)
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=128, epochs=300, validation_split=0.2)

The model takes ~130 minutes per epoch to train.

I then tried to do the same with Dataset. Since my data is large, it does not fit within the 2GB limit for Dataset.from_tensor_slices(), so I use Dataset.from_generator() instead:

import tensorflow as tf
from sklearn.model_selection import train_test_split

X_train = keras.preprocessing.sequence.pad_sequences(X_train, maxlen=2000)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=777, stratify=y_train)

def gen(data, labels):
    for x, y in zip(data, labels):
        yield x, y

train_dataset = tf.data.Dataset.from_generator(
    lambda: gen(X_train, y_train),
    output_types=(tf.int32, tf.int32),
    output_shapes=([2000], [y_train.shape[1]]),
)

train_dataset = train_dataset.batch(128)
train_dataset = train_dataset.repeat()

val_dataset = ...

....
model.fit(train_dataset, epochs=300,
          steps_per_epoch=len(X_train)//128,
          validation_data=val_dataset,
          validation_steps=len(X_val)//128)

I expected the performance to be roughly the same, but it is not: one epoch took ~280 minutes to train. What am I missing? How do I achieve the same performance using the Dataset input?
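To put the two epoch times on a common footing, the figures in the question can be converted to steps per epoch and seconds per step. This is only a back-of-the-envelope sketch: it assumes 185485 is len(X_train) after the split, and it ignores the time spent on validation, which the post does not break out.

```python
# Numbers taken from the question; variable names are illustrative.
TRAIN_EXAMPLES = 185485
VAL_EXAMPLES = 46372
BATCH_SIZE = 128

# Same integer division as in the model.fit() call.
steps_per_epoch = TRAIN_EXAMPLES // BATCH_SIZE
validation_steps = VAL_EXAMPLES // BATCH_SIZE

array_epoch_minutes = 130    # pre-padded NumPy input
dataset_epoch_minutes = 280  # Dataset.from_generator() input

# Rough cost per training step with the Dataset input (validation ignored).
seconds_per_step = dataset_epoch_minutes * 60 / steps_per_epoch
slowdown = dataset_epoch_minutes / array_epoch_minutes

print(steps_per_epoch, validation_steps)
print(round(seconds_per_step, 2), round(slowdown, 2))
```

The Dataset run is a bit more than twice as slow per epoch, so the extra cost is on the order of the whole original step time rather than a small fixed overhead.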

Answer 1

Try unroll=True in the LSTM layer. For details, check the LSTM layer documentation on the Keras website.
Or use CuDNNLSTM on a GPU for a further speedup.
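A minimal sketch of the unroll=True suggestion, reusing the layer stack from the question. The sizes (MAXLEN, VOCAB_SIZE, NUM_LABELS and the layer widths) are made-up small values for illustration, not the ones from the post. The Keras documentation notes that unrolling is more memory-intensive and only suitable for short sequences, so it may not be practical with maxlen=2000.

```python
import numpy as np
from tensorflow import keras

# Illustrative sizes only (assumptions, far smaller than in the question).
MAXLEN = 20
VOCAB_SIZE = 1000
NUM_LABELS = 5

inp = keras.Input(shape=(MAXLEN,))
x = keras.layers.Embedding(VOCAB_SIZE, 32, mask_zero=True)(inp)
# unroll=True unrolls the recurrence over the MAXLEN time steps
# instead of using a symbolic loop; it needs a fixed sequence length.
x = keras.layers.LSTM(16, return_sequences=False, unroll=True)(x)
x = keras.layers.Dropout(0.1)(x)
out = keras.layers.Dense(NUM_LABELS, activation="sigmoid")(x)

model = keras.Model(inputs=inp, outputs=out)
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

# Random token ids in [1, VOCAB_SIZE) stand in for padded text sequences.
batch = np.random.randint(1, VOCAB_SIZE, size=(4, MAXLEN))
preds = model.predict(batch)
print(preds.shape)
```

Whether this actually helps would have to be measured on the real data; the sketch only shows where the flag goes.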

Answer 1: -2, 2018-07-14 17:44:10
