Mixture usage of CPU and GPU in Keras

I am building a neural network in Keras that includes multiple LSTM, Permute and Dense layers.

It seems LSTM is GPU-unfriendly, so I did some research and used:

with tf.device('/cpu:0'):
    out = LSTM(cells)(inp)

But based on my understanding of with, a with statement is essentially a try...finally block that ensures clean-up code is executed. I don't know whether the following CPU/GPU mixture code works or not. Will it accelerate training?

with tf.device('/cpu:0'):
    out = LSTM(cells)(inp)
with tf.device('/gpu:0'):
    out = Permute(some_shape)(out)
with tf.device('/cpu:0'):
    out = LSTM(cells)(out)
with tf.device('/gpu:0'):
    out = Dense(output_size)(out)

As you may read here, tf.device is a context manager which switches the default device to the one passed as its argument within the block it creates. So this code should run everything placed under '/cpu:0' on the CPU and the rest on the GPU.
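
If it helps, here is a minimal sketch of what the complete mixed-device model might look like with the functional API, assuming Keras 2 with the TensorFlow 1.x backend; the input shape and layer sizes below are placeholder values, not ones from your question:

import tensorflow as tf
from keras.layers import Input, LSTM, Permute, Dense
from keras.models import Model

inp = Input(shape=(10, 8))                    # (timesteps, features) - placeholder shape
with tf.device('/cpu:0'):
    x = LSTM(32, return_sequences=True)(inp)  # LSTM on the CPU; return_sequences so Permute gets a 3D tensor
with tf.device('/gpu:0'):
    x = Permute((2, 1))(x)                    # Permute on the GPU
with tf.device('/cpu:0'):
    x = LSTM(16)(x)                           # second LSTM on the CPU
with tf.device('/gpu:0'):
    out = Dense(4)(x)                         # Dense on the GPU

model = Model(inputs=inp, outputs=out)
model.compile(optimizer='adam', loss='mse')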

Whether it will speed up your training is really hard to answer, because it depends on the machine you use, but I don't expect the computation to be faster, since each change of device causes data to be copied between GPU RAM and machine RAM. This could even slow down your computation.
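
If you want to check where each op actually ends up, one option (a sketch, assuming the TensorFlow 1.x backend) is to enable device-placement logging before building the model:

import tensorflow as tf
from keras import backend as K

# Log the device each op is assigned to; run this before building the model
# so Keras picks up this session.
config = tf.ConfigProto(log_device_placement=True)
K.set_session(tf.Session(config=config))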

I created a model with 2 LSTM layers and 1 Dense layer and trained it on my GPU (NVidia GTX 10150Ti). Here are my observations.

  1. Use CuDNNLSTM: https://keras.io/layers/recurrent/#cudnnlstm
  2. Use a batch size that allows more GPU parallelism; with a very small batch size (2-10) the GPU's cores are not fully utilized, so I used a batch size of 100.
  3. If I train my network on the GPU and try to use it for predictions on the CPU, it compiles and runs, but the predictions are weird. In my case I have the luxury of using a GPU for prediction as well.
  4. For a multi-layer LSTM you need to use return_sequences=True, as in the snippet below.

Here is a sample snippet:

import keras

# `neurons`, `nbatch_size` and `reshapedX` are defined elsewhere;
# reshapedX has shape (samples, timesteps, features).
model = keras.Sequential()
model.add(keras.layers.CuDNNLSTM(neurons,
                                 batch_input_shape=(nbatch_size, reshapedX.shape[1], reshapedX.shape[2]),
                                 return_sequences=True,
                                 stateful=True))
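
For reference, a sketch of how the 2 x CuDNNLSTM + 1 Dense stack mentioned above might be assembled; the layer sizes, batch size and input shape here are made-up placeholders, not the values actually used:

import keras

neurons, nbatch_size, timesteps, features = 64, 100, 20, 8   # placeholder values

model = keras.Sequential()
# First LSTM returns sequences so the second LSTM receives 3D input.
model.add(keras.layers.CuDNNLSTM(neurons, return_sequences=True, stateful=True,
                                 batch_input_shape=(nbatch_size, timesteps, features)))
# Last LSTM returns only the final state, which feeds the Dense output.
model.add(keras.layers.CuDNNLSTM(neurons, stateful=True))
model.add(keras.layers.Dense(1))
model.compile(optimizer='adam', loss='mse')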

TojoHere's answer is the one that needs to be upvoted! This trick made my LSTM training almost 10 times faster. Thanks a lot!


 