
ML Engine Batch Prediction running on wrong Python version


So I have a TensorFlow model in Python 3.5 registered with ML Engine, and I want to run a batch prediction job with it. My API request body looks like this:

{
  "versionName": "XXXXX/v8_0QSZ",
  "dataFormat": "JSON",
  "inputPaths": [
    "XXXXX"
  ],
  "outputPath": "XXXXXX",
  "region": "us-east1",
  "runtimeVersion": "1.12",
  "accelerator": {
    "count": "1",
    "type": "NVIDIA_TESLA_P100"
  }
}
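The request body above is the `predictionInput` part of a batch prediction job. As a sketch, assuming the Python `google-api-python-client` library and application-default credentials, it can be wrapped in a `Job` resource and submitted through `projects.jobs.create`. The job ID `my_batch_job` and project ID `my-project` are hypothetical placeholders, and the elided `XXXXX` values are left as in the question.

```python
import copy


def build_batch_job(job_id, prediction_input):
    """Wrap a PredictionInput dict in the Job body used by projects.jobs.create."""
    # Deep-copy so the caller's request dict is never mutated.
    return {"jobId": job_id, "predictionInput": copy.deepcopy(prediction_input)}


def submit_batch_job(project_id, job):
    """Submit the job; needs google-api-python-client and GCP credentials."""
    # Imported lazily so build_batch_job() works without the client library installed.
    from googleapiclient import discovery

    ml = discovery.build("ml", "v1")
    request = ml.projects().jobs().create(parent="projects/" + project_id, body=job)
    return request.execute()


# The request body from the question (elided values kept as placeholders).
prediction_input = {
    "versionName": "XXXXX/v8_0QSZ",
    "dataFormat": "JSON",
    "inputPaths": ["XXXXX"],
    "outputPath": "XXXXXX",
    "region": "us-east1",
    "runtimeVersion": "1.12",
    "accelerator": {"count": "1", "type": "NVIDIA_TESLA_P100"},
}

# Hypothetical job ID; job IDs must be unique within the project.
job = build_batch_job("my_batch_job", prediction_input)
# submit_batch_job("my-project", job)  # hypothetical project ID; uncomment to submit
```

Keeping body construction separate from submission lets the request be inspected or tweaked (for example, the `accelerator` block) before anything is sent.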

The batch prediction job then runs and reports "Job completed successfully." In reality it was completely unsuccessful and consistently threw the following error for every input:

Exception during running the graph: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [[node convolution_layer/conv1d/conv1d/Conv2D (defined at /usr/local/lib/python2.7/dist-packages/google/cloud/ml/prediction/frameworks/tf_prediction_lib.py:210) = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](convolution_layer/conv1d/conv1d/Conv2D-0-TransposeNHWCToNCHW-LayoutOptimizer, convolution_layer/conv1d/conv1d/ExpandDims_1)]] [[{{node Cast_6/_495}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_789_Cast_6", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]] 
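Because the job status alone did not reveal these failures, it can help to inspect the output path directly. A minimal sketch, assuming the output directory has been copied locally (for example with `gsutil cp -r`) and assuming predictions land in files whose names start with `prediction.results` while errors land in files starting with `prediction.errors_stats`; adjust both prefixes if your output uses different names. The demo directory and its contents are made up to mimic the situation in the question.

```python
import tempfile
from pathlib import Path


def summarize_batch_output(output_dir,
                           results_prefix="prediction.results",
                           errors_prefix="prediction.errors_stats"):
    """Count non-empty lines in result files and in error files under output_dir."""
    # The two file-name prefixes are assumptions about the batch output layout.
    counts = {"results": 0, "errors": 0}
    for path in sorted(Path(output_dir).iterdir()):
        if not path.is_file():
            continue
        if path.name.startswith(results_prefix):
            key = "results"
        elif path.name.startswith(errors_prefix):
            key = "errors"
        else:
            continue
        with path.open(encoding="utf-8") as f:
            counts[key] += sum(1 for line in f if line.strip())
    return counts


def job_really_failed(counts):
    """Flag 'no predictions but some errors' as failure, whatever the job state says."""
    return counts["results"] == 0 and counts["errors"] > 0


# Demo on a made-up output directory: empty results file, one error line.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    (out / "prediction.results-00000-of-00001").touch()
    (out / "prediction.errors_stats-00000-of-00001").write_text(
        "Failed to get convolution algorithm.\n", encoding="utf-8")
    counts = summarize_batch_output(out)

print(counts, job_really_failed(counts))
```

A check like this can run right after the job finishes, so a "succeeded" job with zero predictions is caught instead of trusted.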

My questions are:

  • Why does the batch job report success when in reality it failed completely?
  • The exception above mentions python 2.7, yet the model is registered as Python 3.5, and there is no way to specify the Python version through the API. Why is batch prediction using 2.7?
  • What, in general, can I do to make this work?
  • Does this have anything to do with my accelerator option?

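Response from the batch prediction developers: "We don't officially support Python 3 yet. However, the issue you are running into is a known bug affecting our GPU runtime for TF 1.11 and 1.12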
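Since the response attributes the error to the GPU runtime in TF 1.11 and 1.12, one way to test that diagnosis is to resubmit the same request without the `accelerator` block, so the job runs on CPU. This is a sketch of a diagnostic step under that assumption, not a fix confirmed by the response; `without_accelerator` is a hypothetical helper, and the elided `XXXXX` values are kept as in the question.

```python
import copy


def without_accelerator(prediction_input):
    """Return a copy of a PredictionInput dict with the GPU accelerator removed."""
    # Deep-copy so the original GPU request stays available for comparison.
    cpu_input = copy.deepcopy(prediction_input)
    cpu_input.pop("accelerator", None)  # no accelerator block: the job runs on CPU
    return cpu_input


gpu_input = {
    "versionName": "XXXXX/v8_0QSZ",
    "dataFormat": "JSON",
    "inputPaths": ["XXXXX"],
    "outputPath": "XXXXXX",
    "region": "us-east1",
    "runtimeVersion": "1.12",
    "accelerator": {"count": "1", "type": "NVIDIA_TESLA_P100"},
}

cpu_input = without_accelerator(gpu_input)
print(sorted(cpu_input))
```

If the CPU run produces predictions while the GPU run does not, that supports the GPU-runtime explanation. Pointing `outputPath` at a fresh location for the CPU run keeps its output separate from the failed GPU run.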
