
Speeding up TF/Keras LSTM text generation on GPU?

The official tensorflow example for text generation ( https://github.com/tensorflow/docs/blob/master/site/en/tutorials/text/text_generation.ipynb ) runs in a loop as defined below. The text generation feels slow and, according to NVTOP, uses only a fraction of the available GPU resources (15-20%).


Any suggestions on how to speed up text generation? A quick look at cProfile shows that 90% of the time is spent on the single line predictions = model(input_eval) , so I don't think there are many gains to be had elsewhere.

Also, the Tensorflow/Keras documentation https://www.tensorflow.org/api_docs/python/tf/keras/Model#predict recommends calling the model directly, just as is done below...

this method is designed for performance in large scale inputs. For small amount of inputs that fit in one batch, directly using call is recommended for faster execution, e.g., model(x), or model(x, training=False)
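The quoted advice can be illustrated with a minimal sketch (a hypothetical tiny model, not the tutorial's): for a single small batch, calling the model directly produces the same values as Model.predict while skipping predict's batching and callback machinery.

```python
import numpy as np
import tensorflow as tf

# Hypothetical tiny model, only to compare the two call styles.
model = tf.keras.Sequential([tf.keras.layers.Dense(8)])
x = tf.ones([1, 4])

direct = model(x, training=False).numpy()  # direct call: fast path for one batch
batched = model.predict(x, verbose=0)      # predict: designed for large-scale inputs

print(np.allclose(direct, batched))  # True - same result, less overhead per call
```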

Any suggestions on how to speed up text generation? Would it be possible to make better use of the GPU by generating multiple lines at the same time?

def generate_text(model, start_string):
  # Evaluation step (generating text using the learned model)

  # Number of characters to generate
  num_generate = 1000

  # Converting our start string to numbers (vectorizing)
  input_eval = [char2idx[s] for s in start_string]
  input_eval = tf.expand_dims(input_eval, 0)

  # Empty string to store our results
  text_generated = []

  # Low temperatures result in more predictable text.
  # Higher temperatures result in more surprising text.
  # Experiment to find the best setting.
  temperature = 1.0

  # Here batch size == 1
  model.reset_states()
  for i in range(num_generate):
      predictions = model(input_eval)
      # remove the batch dimension
      predictions = tf.squeeze(predictions, 0)

      # using a categorical distribution to predict the character returned by the model
      predictions = predictions / temperature
      predicted_id = tf.random.categorical(predictions, num_samples=1)[-1,0].numpy()

      # We pass the predicted character as the next input to the model
      # along with the previous hidden state
      input_eval = tf.expand_dims([predicted_id], 0)

      text_generated.append(idx2char[predicted_id])

  return (start_string + ''.join(text_generated))

To speed up the processing, I have two suggestions:

  1. As you have GPU support, you may want to set unroll=True on the GRU layer. As per the Keras GRU documentation , setting unroll=True reduces some computation at the cost of some extra memory. As your GPU utilization is quite low, you may want to try unroll=True . With this setting, you may notice up to a 2x speed boost (depending on the circumstances). However, you should avoid unrolling if the input sequence is too long.

  2. I noticed that the text-generation architecture you linked uses a GRU layer before a Dense layer, and the GRU is given the parameter return_sequences=True . This causes the GRU layer to pass unnecessary output values to the following Dense layer and requires more computation. Generally, return_sequences=True should only be set if the following layer of the model is also an RNN layer. Therefore, try setting return_sequences=False . This may also improve performance.

Finally, model(x, training=False) really works. I believe that by addressing these three issues, you may notice a significant performance improvement.
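A minimal sketch of where the two suggestions would go, using a tiny stand-in for the tutorial's model; vocab_size , embedding_dim and rnn_units here are hypothetical values, not the tutorial's. Note that unroll=True requires a fixed sequence length, which generation provides since it feeds one character per step.

```python
import tensorflow as tf

vocab_size, embedding_dim, rnn_units = 65, 32, 64  # hypothetical sizes

model = tf.keras.Sequential([
    # Fixed batch of 1 and sequence length of 1: generation feeds one
    # character at a time, which is what unroll=True needs.
    tf.keras.Input(batch_shape=(1, 1)),
    tf.keras.layers.Embedding(vocab_size, embedding_dim),
    # return_sequences=False: only the last step's output reaches Dense.
    tf.keras.layers.GRU(rnn_units, stateful=True,
                        return_sequences=False, unroll=True),
    tf.keras.layers.Dense(vocab_size),
])

logits = model(tf.constant([[1]]))  # one character in, one logits row out
print(logits.shape)  # (1, 65)
```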

Not sure you can speed up generation. You are doing num_generate forward calls on your model for an input with a batch size of 1. During training you can operate on the whole sequence and compute a loss over it, but during prediction each new character depends on the previously generated ones, so the generation loop doesn't run in parallel.

If you want to see higher GPU utilization, you could call the model on a batch of inputs seeded with different starting characters - this relates to your question about 'generating multiple lines at the same time'.
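A sketch of that idea, assuming a stateful character model like the tutorial's (built with a fixed batch size equal to the number of seeds, and return_sequences=True so logits come back per timestep). char2idx and idx2char are the tutorial's lookup tables; generate_batch is a hypothetical name.

```python
import tensorflow as tf

def generate_batch(model, start_chars, char2idx, idx2char,
                   num_generate=100, temperature=1.0):
    # One start character per batch row -> several lines per forward pass.
    input_eval = tf.constant([[char2idx[c]] for c in start_chars])  # (batch, 1)
    generated = [[] for _ in start_chars]
    model.reset_states()
    for _ in range(num_generate):
        predictions = model(input_eval)[:, -1, :]  # last-step logits, (batch, vocab)
        predicted_ids = tf.random.categorical(
            predictions / temperature, num_samples=1)  # (batch, 1)
        input_eval = predicted_ids  # feed each sampled character back in
        for row, idx in enumerate(predicted_ids[:, 0].numpy()):
            generated[row].append(idx2char[idx])
    return [s + ''.join(g) for s, g in zip(start_chars, generated)]
```

The GPU now does the same number of forward passes but each one carries a whole batch, so utilization rises roughly with the number of seed characters.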

You could also try using the same starting character and tinkering with the hidden state fed into the model, e.g., seeing what a randomly sampled state for the batch produces, or extracting hidden-state vectors for that starting character from training examples and populating the batched hidden state with those, so your model goes in different directions from the same initial character.
