ValueError when deploying a tensorflow model to Amazon SageMaker

Question

I want to deploy my trained tensorflow model to Amazon SageMaker. I am following the official guide here: https://aws.amazon.com/blogs/machine-learning/deploy-trained-keras-or-tensorflow-models-using-amazon-sagemaker/ to deploy my model from a jupyter notebook.

But when I run this code:

predictor = sagemaker_model.deploy(initial_instance_count=1, instance_type='ml.t2.medium')

It gives me the following error message:

ValueError: Error hosting endpoint sagemaker-tensorflow-2019-08-07-22-57-59-547: Failed Reason: The image '520713654638.dkr.ecr.us-west-1.amazonaws.com/sagemaker-tensorflow:1.12-cpu-py3 ' does not exist.

The tutorial does not tell me to create an image, and I do not know what to do.

import boto3, re
from sagemaker import get_execution_role

role = get_execution_role()

# make a tar ball of the model data files
import tarfile
with tarfile.open('model.tar.gz', mode='w:gz') as archive:
    archive.add('export', recursive=True)

# create a new s3 bucket and upload the tarball to it
import sagemaker

sagemaker_session = sagemaker.Session()
inputs = sagemaker_session.upload_data(path='model.tar.gz', key_prefix='model')

from sagemaker.tensorflow.model import TensorFlowModel
sagemaker_model = TensorFlowModel(model_data = 's3://' + sagemaker_session.default_bucket() + '/model/model.tar.gz',
                                  role = role,
                                  framework_version = '1.12',
                                  entry_point = 'train.py',
                                  py_version='py3')

%%time
#here I fail to deploy the model and get the error message
predictor = sagemaker_model.deploy(initial_instance_count=1,
                                   instance_type='ml.m4.xlarge')

Answer 1

https://github.com/aws/sagemaker-python-sdk/issues/912#issuecomment-510226311

As mentioned in the issue:

Python 3 isn't supported using the TensorFlowModel object, as the container uses the TensorFlow serving api library in conjunction with the GRPC client to handle making inferences, however the TensorFlow serving api isn't supported in Python 3 officially, so there are only Python 2 versions of the containers when using the TensorFlowModel object.

If you need Python 3 then you will need to use the Model object defined in #2 above. The inference script format will change if you need to handle pre and post processing. https://github.com/aws/sagemaker-tensorflow-serving-container#prepost-processing
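For illustration, here is a rough sketch of what switching to the serving Model object might look like with the question's setup. It assumes a 1.x release of the SageMaker Python SDK, where that class is importable as sagemaker.tensorflow.serving.Model, and it assumes the tarball was uploaded under the 'model' prefix of the default bucket as in the question. The helper names (model_data_uri, serving_model_kwargs, deploy_with_serving_model) are made up for this example and are not part of the SDK.

```python
def model_data_uri(bucket, key_prefix="model", filename="model.tar.gz"):
    """Build the S3 URI of the uploaded tarball (same layout as in the question)."""
    return "s3://{}/{}/{}".format(bucket, key_prefix, filename)


def serving_model_kwargs(bucket, role, framework_version="1.12"):
    """Collect constructor arguments for the serving Model.

    Note what is left out compared with the question's TensorFlowModel call:
    no py_version and no entry_point. The serving container is not selected
    by Python version, and an inference script is only needed for pre/post
    processing.
    """
    return {
        "model_data": model_data_uri(bucket),
        "role": role,
        "framework_version": framework_version,
    }


def deploy_with_serving_model(bucket, role, instance_type="ml.m4.xlarge"):
    """Create an endpoint backed by the TensorFlow Serving container.

    Assumption: SageMaker Python SDK 1.x. The import is kept inside the
    function so the helpers above can be used without the SDK installed.
    """
    from sagemaker.tensorflow.serving import Model

    model = Model(**serving_model_kwargs(bucket, role))
    return model.deploy(initial_instance_count=1, instance_type=instance_type)
```

In the notebook from the question this would be called as deploy_with_serving_model(sagemaker_session.default_bucket(), role). Whether a serving image for a given framework version is published in your region is something to check against the SDK documentation for the release you have installed.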

Answer 2

I am getting an error too: Failure reason CannotStartContainerError. Please ensure the model container for variant AllTraffic starts correctly when invoked with 'docker run serve'.

Answer 1 posted 2019-08-09 11:18:47
Answer 2 posted 2022-01-18 18:00:41
