Prediction error when running Keras multi_gpu_model

Question

I am having a problem running a Keras model on a Google Cloud Platform instance.
The model is as follows:

n_timesteps, n_features, n_outputs = train_x.shape[1], train_x.shape[2], train_y.shape[1]

train_y = train_y.reshape((train_y.shape[0], train_y.shape[1], 1))

verbose, epochs, batch_size = 1, 1, 64  # low number of epochs just for testing purpose
with tf.device('/cpu:0'):
    m = Sequential()
    m.add(CuDNNLSTM(20, input_shape=(n_timesteps, n_features)))
    m.add(LeakyReLU(alpha=0.1))
    m.add(RepeatVector(n_outputs))
    m.add(CuDNNLSTM(20, return_sequences=True))
    m.add(LeakyReLU(alpha=0.1))
    m.add(TimeDistributed(Dense(20)))
    m.add(LeakyReLU(alpha=0.1))
    m.add(TimeDistributed(Dense(1)))

self.model = multi_gpu_model(m, gpus=8)
self.model.compile(loss='mse', optimizer='adam')

self.model.fit(train_x, train_y, epochs=epochs, batch_size=batch_size, verbose=verbose)

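As you can see from the code above, I am running the model on a machine with 8 GPUs (Nvidia Tesla K80).
Training runs fine, without any errors. However, prediction fails and returns the following error: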

W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at cudnn_rnn_ops.cc:1336 : Unknown: CUDNN_STATUS_BAD_PARAM in tensorflow/stream_executor/cuda/cuda_dnn.cc(1285): 'cudnnSetTensorNdDescriptor(tensor_desc.get(), data_type, sizeof(dims) / sizeof(dims[0]), dims, strides)'

Here is the code that runs the prediction:

self.model.predict(input_x)

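What I noticed is that if I remove the multi-GPU data-parallel code, the code runs fine on a single GPU.
More precisely, if I comment out this line, the code works without problems: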

self.model = multi_gpu_model(m, gpus=8)

What am I missing?

Virtual environment info

cudatoolkit - 10.0.130
cudnn - 7.6.4
keras - 2.2.4
keras-applications - 1.0.8
keras-base - 2.2.4
keras-gpu - 2.2.4
python - 3.6

Update

train_x.shape = (1441, 288, 1)
train_y.shape = (1441, 288, 1)
input_x.shape = (1, 288, 1)

After Olivier Dehaene's reply, I tried his suggestion and it worked.
I changed the shape of input_x to (8, 288, 1).
To do that, I also changed the shapes of train_x and train_y.
Here is a recap:

train_x.shape = (8065, 288, 1)
train_y.shape = (8065, 288, 1)
input_x.shape = (8, 288, 1)

But now I get the same error during the training phase, at this line:

self.model.fit(train_x, train_y, epochs=epochs, batch_size=batch_size, verbose=verbose)

Answer 1

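According to the documentation for tf.keras.utils.multi_gpu_model, it works as follows:

Divide the model's input(s) into multiple sub-batches.

Apply a model copy on each sub-batch. Every model copy is executed on a dedicated GPU.

Concatenate the results (on CPU) into one big batch.

You are triggering the error because, for at least one of the model copies, the input to the CuDNNLSTM layer is empty. The split requires that every batch passed to the model satisfies batch_size // n_gpus > 0, that is, each batch must contain at least as many samples as there are GPUs. Your input_x holds a single sample, so with 8 GPUs most of the copies receive nothing.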
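To make the split concrete, here is a minimal sketch in plain Python of how one batch might be divided among the GPUs. The helper name sub_batch_sizes is hypothetical, and the rule it uses (every copy gets batch_size // n_gpus samples, the last copy also takes the remainder) is my reading of the slicing logic in Keras 2.2.4, so treat it as an assumption rather than the library's exact code.

```python
def sub_batch_sizes(batch_size, n_gpus):
    """Return the number of samples each model copy would receive.

    Hypothetical helper: assumes every copy gets batch_size // n_gpus
    samples and the last copy additionally takes the remainder.
    """
    step = batch_size // n_gpus
    sizes = [step] * (n_gpus - 1)
    sizes.append(batch_size - step * (n_gpus - 1))
    return sizes


# One sample on 8 GPUs: seven copies get an empty input.
print(sub_batch_sizes(1, 8))
# A full batch of 64 on 8 GPUs: every copy gets 8 samples.
print(sub_batch_sizes(64, 8))
```

Under this assumption, any batch with fewer samples than GPUs leaves at least one copy with an empty tensor, which is the situation the cuDNN descriptor call rejects.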

Try this code:

input_x = np.random.randn(8, n_timesteps, n_features)
model.predict(input_x)
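The same reasoning likely explains the training error in the update: with 8065 samples and batch_size=64, the final batch holds 8065 % 64 = 1 sample, which again cannot be spread over 8 GPUs. The original 1441 samples left a final batch of 33, which is why training passed before. Below is a rough sketch of two workarounds using NumPy. Both helpers, pad_to_multiple and trim_to_full_batches, are hypothetical names introduced here for illustration and are not part of Keras.

```python
import numpy as np


def pad_to_multiple(x, multiple):
    """Repeat the last sample until len(x) is a multiple of `multiple`.

    Returns the padded array and the original sample count, so the
    extra predictions can be sliced off afterwards.
    """
    n = len(x)
    pad = (-n) % multiple
    if pad == 0:
        return x, n
    filler = np.repeat(x[-1:], pad, axis=0)
    return np.concatenate([x, filler], axis=0), n


def trim_to_full_batches(x, y, batch_size):
    """Drop trailing samples so that every training batch is full."""
    keep = (len(x) // batch_size) * batch_size
    return x[:keep], y[:keep]


# Prediction: pad the single sample up to 8, one per GPU.
input_x = np.zeros((1, 288, 1))
padded, n = pad_to_multiple(input_x, 8)
print(padded.shape)
# preds = model.predict(padded)[:n]  # keep only the real predictions

# Training: 8065 samples -> 8064, so no batch of size 1 remains.
train_x = np.zeros((8065, 288, 1))
train_y = np.zeros((8065, 288, 1))
train_x, train_y = trim_to_full_batches(train_x, train_y, 64)
print(train_x.shape)
```

Padding only helps if the batch_size used by predict is itself a multiple of the GPU count (the default of 32 is, for 8 GPUs); otherwise a short final batch can still appear.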

Answer 1 (3, 2019-11-14 09:55:28)