
AWS SageMaker PyTorch: no module named 'sagemaker'

I have deployed a PyTorch model on AWS with SageMaker and tried to send a request to test the service. However, I got a very vague error message saying "No module named 'sagemaker'". I searched online but could not find any posts about a similar message.

My client code:

import numpy as np
from sagemaker.pytorch.model import PyTorchPredictor

ENDPOINT = '<endpoint name>'

predictor = PyTorchPredictor(ENDPOINT)
predictor.predict(np.random.random_sample([1, 3, 224, 224]).tobytes())

Detailed error message:

Traceback (most recent call last):
  File "client.py", line 7, in <module>
    predictor.predict(np.random.random_sample([1, 3, 224, 224]).tobytes())
  File "/Users/jiashenc/Env/py3/lib/python3.7/site-packages/sagemaker/predictor.py", line 110, in predict
    response = self.sagemaker_session.sagemaker_runtime_client.invoke_endpoint(**request_args)
  File "/Users/jiashenc/Env/py3/lib/python3.7/site-packages/botocore/client.py", line 276, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/Users/jiashenc/Env/py3/lib/python3.7/site-packages/botocore/client.py", line 586, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (500) from model with message "No module named 'sagemaker'". See https://us-east-2.console.aws.amazon.com/cloudwatch/home?region=us-east-2#logEventViewer:group=/aws/sagemaker/Endpoints/<endpoint name> in account xxxxxxxxxxxxxx for more information.

The bug occurs because I merged the serving script and the deployment script into a single file, shown below:

import os
import torch
import numpy as np
from sagemaker.pytorch.model import PyTorchModel
from torch import cuda
from torchvision.models import resnet50


def model_fn(model_dir):
    device = torch.device('cuda' if cuda.is_available() else 'cpu')
    model = resnet50()
    with open(os.path.join(model_dir, 'model.pth'), 'rb') as f:
        model.load_state_dict(torch.load(f, map_location=device))
    return model.to(device)

def predict_fn(input_data, model):
    device = torch.device('cuda' if cuda.is_available() else 'cpu')
    model.eval()
    with torch.no_grad():
        return model(input_data.to(device))


if __name__ == '__main__':
    pytorch_model = PyTorchModel(model_data='s3://<bucket name>/resnet50/model.tar.gz',
                                    entry_point='serve.py', role='jiashenC-sagemaker',
                                    py_version='py3', framework_version='1.3.1')
    predictor = pytorch_model.deploy(instance_type='ml.t2.medium', initial_instance_count=1)
    print(predictor.predict(np.random.random_sample([1, 3, 224, 224]).astype(np.float32)))

The root cause is the 4th line of my code. It tries to import sagemaker, a library that is not available inside the serving container.
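The failure mode can be reproduced locally without AWS. The sketch below assumes the serving container loads the entry point by importing it as a module (so top-level imports run, while the `if __name__ == '__main__'` block does not). The package name `a_package_that_is_not_installed` and the helper `try_import` are hypothetical stand-ins for illustration.

```python
import importlib.util
import os
import tempfile

# A stand-in for the merged script: a top-level import of a package that is
# missing from the environment, plus a guarded deployment block.
SCRIPT = '''
import a_package_that_is_not_installed  # plays the role of `sagemaker`

def model_fn(model_dir):
    return None

if __name__ == '__main__':
    print('deploying...')  # not executed when the file is imported
'''


def try_import(source):
    """Write source to a temp file, import it, and return the error message (or None)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'serve.py')
        with open(path, 'w') as f:
            f.write(source)
        spec = importlib.util.spec_from_file_location('serve', path)
        module = importlib.util.module_from_spec(spec)
        try:
            spec.loader.exec_module(module)
        except ModuleNotFoundError as exc:
            return str(exc)
    return None


error = try_import(SCRIPT)
print(error)  # No module named 'a_package_that_is_not_installed'
```

The `__main__` guard does not help here: the import on the first line fails before any function in the file can be used.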

(edit 2/9/2020 with extra code snippets)

Your serving code tries to use the sagemaker module internally. The sagemaker module (also called the SageMaker Python SDK, one of the numerous orchestration SDKs for SageMaker) is not designed to be used inside model containers. It is meant to be used outside of them, to orchestrate their activity (training, deployment, Bayesian tuning, etc.). In your specific example, you shouldn't include the deployment and model invocation code in the server code, because those actions are carried out from outside the server to orchestrate its lifecycle and interact with it.

For model deployment with the SageMaker PyTorch container, your entry point script only needs to contain the required model_fn function for model deserialization, and optionally input_fn, predict_fn and output_fn for pre-processing, inference and post-processing respectively (detailed in the documentation). This logic is beautiful :) You don't need anything else to deploy a production-ready deep learning server (MMS in the case of PyTorch and MXNet, Flask+Gunicorn in the case of sklearn).
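As an illustration of the optional hooks, here is a minimal sketch of an input_fn and output_fn pair. It assumes the client sends and expects application/json, which is a choice made for this example and not something the question specifies. It uses only the standard library; a real entry point would typically convert the parsed list to a torch tensor before predict_fn runs.

```python
import json

JSON_TYPE = 'application/json'  # assumed content type for this sketch


def input_fn(request_body, request_content_type):
    """Pre-processing: deserialize the request payload into nested Python lists."""
    if request_content_type != JSON_TYPE:
        raise ValueError(f'Unsupported content type: {request_content_type}')
    if isinstance(request_body, (bytes, bytearray)):
        request_body = request_body.decode('utf-8')
    return json.loads(request_body)


def output_fn(prediction, response_content_type):
    """Post-processing: serialize the prediction into a JSON string."""
    if response_content_type != JSON_TYPE:
        raise ValueError(f'Unsupported accept type: {response_content_type}')
    return json.dumps(prediction)
```

Note that neither function imports sagemaker: everything in the entry point depends only on what is installed in the serving container.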

In summary, this is how your code should be split:

An entry_point script serve.py that contains the model serving code and looks like this:

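import os

import torch
from torch import cuda
from torchvision.models import resnet50


def model_fn(model_dir):
    # Rebuild the network and load the weights stored in model_dir
    device = torch.device('cuda' if cuda.is_available() else 'cpu')
    model = resnet50()
    with open(os.path.join(model_dir, 'model.pth'), 'rb') as f:
        model.load_state_dict(torch.load(f, map_location=device))
    return model.to(device)


def predict_fn(input_data, model):
    # Apply the model to the deserialized input without tracking gradients
    device = torch.device('cuda' if cuda.is_available() else 'cpu')
    model.eval()
    with torch.no_grad():
        return model(input_data.to(device))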

and some orchestration code to instantiate a SageMaker Model object, deploy it to a server and query it. This runs from the orchestration runtime of your choice, which could be a SageMaker Notebook, your laptop, an AWS Lambda function, an Apache Airflow operator, etc., and with the SDK of your choice; you don't need to use Python for this.

import numpy as np
from sagemaker.pytorch.model import PyTorchModel

pytorch_model = PyTorchModel(
    model_data='s3://<bucket name>/resnet50/model.tar.gz',
    entry_point='serve.py',
    role='jiashenC-sagemaker',
    py_version='py3',
    framework_version='1.3.1')

predictor = pytorch_model.deploy(instance_type='ml.t2.medium', initial_instance_count=1)

print(predictor.predict(np.random.random_sample([1, 3, 224, 224]).astype(np.float32)))
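Because the querying side is not tied to the SageMaker Python SDK, the same request could also be sketched with the lower-level boto3 runtime client, whose invoke_endpoint call appears in the traceback above. This is a minimal sketch, assuming the endpoint accepts and returns application/json (an assumption; the default handling in the PyTorch container may differ). The helper name `invoke_endpoint_json` is hypothetical.

```python
import json


def invoke_endpoint_json(runtime_client, endpoint_name, payload):
    """Send a JSON payload to a deployed endpoint and decode the JSON reply.

    runtime_client is expected to behave like boto3.client('sagemaker-runtime').
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType='application/json',  # assumed request format
        Accept='application/json',       # assumed response format
        Body=json.dumps(payload),
    )
    # The response body is a stream; read it fully before decoding.
    return json.loads(response['Body'].read())


# Usage, assuming boto3 is installed and AWS credentials are configured:
#   import boto3
#   client = boto3.client('sagemaker-runtime')
#   result = invoke_endpoint_json(client, '<endpoint name>', [[0.0, 1.0]])
```

Passing the client in as an argument keeps the function easy to exercise with a stub, without touching a live endpoint.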
