ML Engine batch prediction running on the wrong Python version

Question

So I have a TensorFlow model in Python 3.5 registered with ML Engine, and I want to run a batch prediction job using it. My API request body looks like:

{
  "versionName": "XXXXX/v8_0QSZ",
  "dataFormat": "JSON",
  "inputPaths": [
    "XXXXX"
  ],
  "outputPath": "XXXXXX",
  "region": "us-east1",
  "runtimeVersion": "1.12",
  "accelerator": {
    "count": "1",
    "type": "NVIDIA_TESLA_P100"
  }
}
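For reference, the same request can be built and submitted from Python. This is a minimal sketch, assuming the `google-api-python-client` package and application-default credentials are available. The job ID, project ID, and the full `projects/<project>/models/<model>/versions/<version>` form of `versionName` are placeholders (the `XXXXX` values above are elided and stay unknown). In a `projects.jobs.create` call the batch settings shown above sit under `predictionInput`, next to a caller-chosen `jobId`. The helper names are hypothetical.

```python
"""Build (and optionally submit) an ML Engine batch prediction job request."""
from typing import Any, Dict, List


def build_batch_job(job_id: str, version_name: str, input_paths: List[str],
                    output_path: str, region: str = "us-east1",
                    runtime_version: str = "1.12",
                    gpu_type: str = "", gpu_count: int = 0) -> Dict[str, Any]:
    """Return a Job resource body suitable for projects.jobs.create."""
    prediction_input: Dict[str, Any] = {
        "versionName": version_name,
        "dataFormat": "JSON",
        "inputPaths": list(input_paths),
        "outputPath": output_path,
        "region": region,
        "runtimeVersion": runtime_version,
    }
    # Only attach an accelerator block when a GPU is actually requested.
    # int64 fields are written as strings, matching the "count": "1" above.
    if gpu_type and gpu_count > 0:
        prediction_input["accelerator"] = {"count": str(gpu_count), "type": gpu_type}
    return {"jobId": job_id, "predictionInput": prediction_input}


def submit_batch_job(project_id: str, job_body: Dict[str, Any]) -> Dict[str, Any]:
    """Submit the job; requires google-api-python-client and GCP credentials."""
    # Imported lazily so build_batch_job() works without the client library.
    from googleapiclient import discovery

    ml = discovery.build("ml", "v1")
    request = ml.projects().jobs().create(parent=f"projects/{project_id}", body=job_body)
    return request.execute()
```

For example, `build_batch_job("hypothetical_job_001", "projects/<project>/models/<model>/versions/v8_0QSZ", ["gs://<bucket>/input/*.json"], "gs://<bucket>/output/", gpu_type="NVIDIA_TESLA_P100", gpu_count=1)` produces the body from the question wrapped in a Job resource; passing no GPU arguments yields a CPU-only request.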

The batch prediction job then runs and returns "Job completed successfully."; however, it was completely unsuccessful and consistently threw the following error for every input:

Exception during running the graph: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [[node convolution_layer/conv1d/conv1d/Conv2D (defined at /usr/local/lib/python2.7/dist-packages/google/cloud/ml/prediction/frameworks/tf_prediction_lib.py:210) = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](convolution_layer/conv1d/conv1d/Conv2D-0-TransposeNHWCToNCHW-LayoutOptimizer, convolution_layer/conv1d/conv1d/ExpandDims_1)]] [[{{node Cast_6/_495}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_789_Cast_6", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
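The Python 2.7 reference comes from the file path inside the node definition: `tf_prediction_lib.py` was loaded from `/usr/local/lib/python2.7/dist-packages`, which means the prediction service's own library code ran under a Python 2.7 interpreter. A small text-parsing sketch (hypothetical helper, not part of any Google library) that pulls every `pythonX.Y` package directory out of such a log line makes this explicit; it says nothing about how the model itself was exported.

```python
import re
from typing import List

# Matches package directories such as /usr/local/lib/python2.7/dist-packages/
_PY_DIR = re.compile(r"/lib/python(\d+\.\d+)/(?:dist|site)-packages/")


def python_versions_in_log(log_text: str) -> List[str]:
    """Return distinct Python versions found in library paths, in order of first appearance."""
    seen: List[str] = []
    for match in _PY_DIR.finditer(log_text):
        version = match.group(1)
        if version not in seen:
            seen.append(version)
    return seen
```

Run against the exception above, it returns `["2.7"]`.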

My questions are:

Why does the batch job report success when in reality it completely failed?
The exception above mentions Python 2.7, yet the model is registered as Python 3.5, and there is no way to specify the Python version through the API. Why is batch prediction using 2.7?
In general, what can I do to make this work?
Does this have anything to do with my accelerator option?
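On the first question: a job's `state` only records whether the job ran to completion, while per-instance failures are counted separately. A minimal sketch, assuming the Job resource returned by `projects.jobs.get` (for example `ml.projects().jobs().get(name="projects/<project>/jobs/<job>").execute()`) carries `predictionOutput.predictionCount` and `predictionOutput.errorCount` as described in the AI Platform REST reference. Both are int64 fields that may arrive as JSON strings, so they are converted explicitly. The helper name and the sample values are hypothetical.

```python
from typing import Any, Dict, Tuple


def batch_job_outcome(job: Dict[str, Any]) -> Tuple[bool, int, int]:
    """Return (really_succeeded, prediction_count, error_count) for a Job resource dict."""
    output = job.get("predictionOutput", {})
    # int64 fields are serialized as strings in JSON responses, so coerce them.
    predictions = int(output.get("predictionCount", 0))
    errors = int(output.get("errorCount", 0))
    # A SUCCEEDED state alone is not enough: every instance may still have errored.
    state_ok = job.get("state") == "SUCCEEDED"
    return state_ok and errors == 0, predictions, errors
```

A job like the one described would report `state == "SUCCEEDED"` but a non-zero `errorCount`, so the first element comes back `False`.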

Answer 1

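Response from the batch prediction dev team: "We don't officially support Python 3 yet. However, the issue you are running into is a known bug affecting our GPU runtime for TF 1.11 and 1.12."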
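Going by that reply, the failure is tied to the combination of a GPU accelerator with runtime 1.11 or 1.12 rather than to the Python version. A minimal pre-submission check on a request body shaped like the one in the question might look like this; the set of affected versions comes only from the quoted reply, and the helper name is hypothetical. Dropping the `accelerator` block (CPU prediction) or choosing a runtime outside that set avoids the reported combination, but whether a given alternative runtime works for this particular model is not established here.

```python
from typing import Any, Dict

# Runtime versions whose GPU prediction runtime the quoted reply calls buggy.
GPU_BUG_RUNTIMES = {"1.11", "1.12"}


def hits_known_gpu_bug(prediction_input: Dict[str, Any]) -> bool:
    """True if the request asks for a GPU on a runtime named in the reply."""
    accelerator = prediction_input.get("accelerator") or {}
    # Treat either an accelerator type or a positive count as a GPU request.
    uses_gpu = bool(accelerator.get("type")) or int(accelerator.get("count", 0)) > 0
    return uses_gpu and prediction_input.get("runtimeVersion") in GPU_BUG_RUNTIMES
```

The request body from the question (`"runtimeVersion": "1.12"` with one `NVIDIA_TESLA_P100`) is flagged; the same body without the `accelerator` block is not.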

Solution 1 (accepted, 2019-05-28 13:26:58)
