
Deploy pre-trained TensorFlow model on AWS SageMaker - ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation

This is the first time I am using Amazon Web Services to deploy a pre-trained machine learning model. I want to deploy my pre-trained TensorFlow model to AWS SageMaker. I am able to deploy the endpoint successfully, but whenever I call the predictor.predict(some_data) method to make a prediction by invoking the endpoint, it throws an error.
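Roughly, the deployment and invocation code looked like this (a sketch; the S3 path, instance type, and input shape are placeholder assumptions):

 import numpy as np
 import sagemaker
 from sagemaker.tensorflow.model import TensorFlowModel

 role = sagemaker.get_execution_role()          # IAM role attached to the notebook instance
 model_data = 's3://my-bucket/model.tar.gz'     # placeholder: S3 path to the pre-trained model archive

 # Wrap the pre-trained artifact and create an endpoint
 sagemaker_model = TensorFlowModel(model_data=model_data, role=role,
                                   framework_version='1.12', entry_point='train.py')
 predictor = sagemaker_model.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')

 # Invoking the endpoint; this is the call that raises the ModelError below
 some_data = np.random.rand(1, 224, 224, 3).tolist()
 predictor.predict(some_data)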

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (500) from model with message "". See https://us-west-2.console.aws.amazon.com/cloudwatch/home?region=us-west-2#logEventViewer:group=/aws/sagemaker/Endpoints/sagemaker-tensorflow-2020-04-07-04-25-27-055 in account 453101909370 for more information.

After going through the CloudWatch logs, I found this error.

#011details = "NodeDef mentions attr 'explicit_paddings' not in Op<name=Conv2D; signature=input:T, filter:T -> output:T; attr=T:type,allowed=[DT_HALF, DT_BFLOAT16, DT_FLOAT, DT_DOUBLE]; attr=strides:list(int); attr=use_cudnn_on_gpu:bool,default=true; attr=padding:string,allowed=["SAME", "VALID"]; attr=data_format:string,default="NHWC",allowed=["NHWC", "NCHW"]; attr=dilations:list(int),default=[1, 1, 1, 1]>; NodeDef: {{node conv1_conv/convolution}} = Conv2D[T=DT_FLOAT, _output_shapes=[[?,112,112,64]], data_format="NHWC", dilations=[1, 1, 1, 1], explicit_paddings=[], padding="VALID", strides=[1, 2, 2, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](conv1_pad/Pad, conv1_conv/kernel/read). (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).

I don't know where I am going wrong. I have already spent two days trying to solve this error and couldn't find any information about it. I have shared the detailed logs here.

The TensorFlow version of my notebook instance is 1.15.
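This can be confirmed inside the notebook with a quick check:

 import tensorflow as tf
 print(tf.__version__)   # prints 1.15.x on this notebook instance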

After a lot of searching and trial and error, I was able to solve this problem. In many cases, the problem arises because of the TensorFlow and Python versions.

Cause of the problem: To deploy the endpoint, I was using TensorFlowModel with TF 1.12 and Python 3, which is exactly what caused the problem.

 from sagemaker.tensorflow.model import TensorFlowModel  # legacy class; its serving containers are Python 2 only
 sagemaker_model = TensorFlowModel(model_data=model_data, role=role, framework_version='1.12', entry_point='train.py')

Apparently, TensorFlowModel only allows Python 2 on TF versions 1.11, 1.12, 2.1.0.

How I fixed it: There are two TensorFlow solutions that handle serving in the SageMaker Python SDK. They have different class representations and documentation, as shown below:

  1. TensorFlowModel - https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/tensorflow/model.py#L47
  2. Model - https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/tensorflow/serving.py#L96

Python 3 isn't supported when using the TensorFlowModel object, because the container uses the TensorFlow Serving API library in conjunction with the gRPC client to handle making inferences; however, the TensorFlow Serving API isn't officially supported in Python 3, so there are only Python 2 versions of the containers when using the TensorFlowModel object. If you need Python 3, then you will need to use the Model object defined in #2 above.

Finally, I used the Model class with TensorFlow version 1.15.2.

 from sagemaker.tensorflow.serving import Model  # note: serving.Model, not the TensorFlowModel class above
 sagemaker_model = Model(model_data=model_data, role=role, framework_version='1.15.2', entry_point='train.py')
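For completeness, deploying this model and invoking the endpoint again looks roughly like this (the instance type and input shape are placeholder assumptions):

 import numpy as np

 # Deploy the corrected Model and invoke the endpoint
 predictor = sagemaker_model.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')
 some_data = np.zeros((1, 224, 224, 3)).tolist()   # placeholder payload; use your model's expected input shape
 result = predictor.predict(some_data)             # succeeds instead of raising a ModelError
 print(result)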

Also, here are the successful results. [screenshot of successful results]
