
Tensorflow 2.0 Keras is training 4x slower than 2.0 Estimator

We recently switched to Keras for TF 2.0, but when we compared it to the DNNClassifier Estimator on 2.0, we experienced around 4x slower speeds with Keras. But I cannot for the life of me figure out why this is happening. The rest of the code for both is identical, using an input_fn() that returns the same tf.data.Dataset, and using identical feature_columns. I've been struggling with this problem for days now. Any help would be greatly appreciated. Thank you.

Estimator code:

estimator = tf.estimator.DNNClassifier(
        feature_columns = feature_columns,
        hidden_units = [64,64],
        activation_fn = tf.nn.relu,
        optimizer = 'Adagrad',
        dropout = 0.4,
        n_classes = len(vocab),
        model_dir = model_dir,
        batch_norm = False)

estimator.train(input_fn=train_input_fn, steps=400)
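The question never shows train_input_fn() itself. For context, here is a minimal sketch of the kind of input function both APIs are being fed, assuming a dict of feature tensors and integer labels; the feature name and data below are placeholders, not from the original post:

def train_input_fn():
    # Placeholder data: one string feature column and integer class labels.
    features = {"code": tf.constant(["a", "b", "c", "d"])}
    labels = tf.constant([0, 1, 2, 1], dtype=tf.int64)
    dataset = tf.data.Dataset.from_tensor_slices((features, labels))
    return dataset.shuffle(4).repeat().batch(2)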

Keras code:

feature_layer = tf.keras.layers.DenseFeatures(feature_columns)

model = tf.keras.Sequential([
        feature_layer,
        layers.Dense(64, input_shape = (len(vocab),), activation = tf.nn.relu),
        layers.Dropout(0.4),
        layers.Dense(64, activation = tf.nn.relu),
        layers.Dropout(0.4),
        layers.Dense(len(vocab), activation = 'softmax')])

model.compile(
        loss = 'sparse_categorical_crossentropy',
        optimizer = 'Adagrad',
        distribute = None)

model.fit(x = train_input_fn(),
          epochs = 1,
          steps_per_epoch = 400,
          shuffle = True)

UPDATE: To test further, I wrote a custom subclassed Model (see: Get Started For Experts), which runs faster than Keras but slower than Estimators. If Estimator trains in 100 secs, the custom model takes approximately 180 secs, and Keras approximately 350 secs. An interesting note is that Estimator runs slower with Adam() than Adagrad(), while Keras seems to run faster. With Adam(), Keras takes less than twice as long as DNNClassifier. Assuming I didn't mess up the custom code, I'm beginning to think that DNNClassifier just has a lot of backend optimizations/efficiencies that make it run faster than Keras.

Custom code:

class MyModel(Model):
  def __init__(self):
    super(MyModel, self).__init__()
    self.features = layers.DenseFeatures(feature_columns, trainable=False)
    self.dense = layers.Dense(64, activation = 'relu')
    self.dropout = layers.Dropout(0.4)
    self.dense2 = layers.Dense(64, activation = 'relu')
    self.dropout2 = layers.Dropout(0.4)
    self.softmax = layers.Dense(len(vocab_of_codes), activation = 'softmax')

  def call(self, x):
    x = self.features(x)
    x = self.dense(x)
    x = self.dropout(x)
    x = self.dense2(x)
    x = self.dropout2(x)
    return self.softmax(x)

model = MyModel()
loss_object = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adagrad()

@tf.function
def train_step(features, label):
  with tf.GradientTape() as tape:
    predictions = model(features)
    loss = loss_object(label, predictions)
  gradients = tape.gradient(loss, model.trainable_variables)
  optimizer.apply_gradients(zip(gradients, model.trainable_variables))

itera = iter(train_input_fn())
for i in range(400):
  features, labels = next(itera)
  train_step(features, labels)

UPDATE: It seems it may be the dataset. When I print a row of the dataset within the train_input_fn(), in Estimators, it prints out the non-eager Tensor definition. In Keras, it prints out the eager values. Going through the Keras backend code, when it receives a tf.data.Dataset as input, it handles it eagerly (and ONLY eagerly), which is why it was crashing whenever I used tf.function on the train_input_fn(). Basically, my guess is DNNClassifier is training faster than Keras because it runs more dataset code in graph mode. Will post any updates/findings.
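A small check along those lines (my own sketch, not code from the post): pulling a batch from the same dataset outside a graph yields an EagerTensor with concrete values, while inside a tf.function the same iteration is traced and only the symbolic Tensor definition is visible, which is roughly how Estimator consumes it:

dataset = train_input_fn()

features, labels = next(iter(dataset))
print(labels)   # EagerTensor: concrete values are printed

@tf.function
def peek(ds):
    for _, batch_labels in ds.take(1):
        # Runs once at trace time: prints the symbolic Tensor definition.
        print(batch_labels)

peek(dataset)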

I believe it is slower because it is not being executed on the graph. In order to execute on the graph in TF2 you'll need a function decorated with the tf.function decorator. Check out this section for ideas on how to restructure your code.
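As a rough illustration of that suggestion (a sketch assuming the question's model, loss_object, optimizer, and train_input_fn are in scope, not the answerer's actual code), the dataset iteration itself can also be moved inside a tf.function so both the train step and the input pipeline run in graph mode:

@tf.function
def train(dataset):
    # The for loop over the dataset is traced into the graph, not run eagerly.
    for features, labels in dataset:
        with tf.GradientTape() as tape:
            predictions = model(features)
            loss = loss_object(labels, predictions)
        gradients = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(gradients, model.trainable_variables))

train(train_input_fn().take(400))   # 400 batches, matching steps=400 above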

For those who (like me) find this question and use Keras's Embedding layers:

Even if a GPU is present, when eager execution is enabled, Embedding layers are always placed on the CPU, causing a massive slow-down.

See https://github.com/tensorflow/tensorflow/issues/44194, which also contains a workaround.
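A quick way to see whether this applies to your own model (my own sketch, not the workaround from the linked issue) is to check which device the embedding weights were actually placed on; `model` here means your own Keras model containing Embedding layers:

for layer in model.layers:
    if isinstance(layer, tf.keras.layers.Embedding):
        # On a GPU machine this should report a GPU device; a CPU device here
        # means the lookup is pinned to the CPU and will slow training down.
        print(layer.name, layer.embeddings.device)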
