简体   繁体   中英

AWS SageMaker PyTorch: no module named 'sagemaker'

I have deployed a PyTorch model on AWS with SageMaker, and I try to send a request to test the service. However, I got a very vague error message saying "no module named 'sagemaker'". I have tried to search online, but cannot find posts about similar message.

My client code:

import numpy as np
from sagemaker.pytorch.model import PyTorchPredictor

ENDPOINT = '<endpoint name>'

predictor = PyTorchPredictor(ENDPOINT)
predictor.predict(np.random.random_sample([1, 3, 224, 224]).tobytes())

Detailed error message:

Traceback (most recent call last):
  File "client.py", line 7, in <module>
    predictor.predict(np.random.random_sample([1, 3, 224, 224]).tobytes())
  File "/Users/jiashenc/Env/py3/lib/python3.7/site-packages/sagemaker/predictor.py", line 110, in predict
    response = self.sagemaker_session.sagemaker_runtime_client.invoke_endpoint(**request_args)
  File "/Users/jiashenc/Env/py3/lib/python3.7/site-packages/botocore/client.py", line 276, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/Users/jiashenc/Env/py3/lib/python3.7/site-packages/botocore/client.py", line 586, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (500) from model with message "No module named 'sagemaker'". See https://us-east-2.console.aws.amazon.com/cloudwatch/home?region=us-east-2#logEventViewer:group=/aws/sagemaker/Endpoints/<endpoint name> in account xxxxxxxxxxxxxx for more information.

This bug is because I merge both the serving script and my deploy script together, see below

import os
import torch
import numpy as np
from sagemaker.pytorch.model import PyTorchModel
from torch import cuda
from torchvision.models import resnet50


def model_fn(model_dir):
    device = torch.device('cuda' if cuda.is_available() else 'cpu')
    model = resnet50()
    with open(os.path.join(model_dir, 'model.pth'), 'rb') as f:
        model.load_state_dict(torch.load(f, map_location=device))
    return model.to(device)

def predict_fn(input_data, model):
    device = torch.device('cuda' if cuda.is_available() else 'cpu')
    model.eval()
    with torch.no_grad():
        return model(input_data.to(device))


if __name__ == '__main__':
    pytorch_model = PyTorchModel(model_data='s3://<bucket name>/resnet50/model.tar.gz',
                                    entry_point='serve.py', role='jiashenC-sagemaker',
                                    py_version='py3', framework_version='1.3.1')
    predictor = pytorch_model.deploy(instance_type='ml.t2.medium', initial_instance_count=1)
    print(predictor.predict(np.random.random_sample([1, 3, 224, 224]).astype(np.float32)))

The root cause is the 4th line in my code. It tries to import sagemaker, which is an unavailable library.

(edit 2/9/2020 with extra code snippets)

Your serving code tries to use the sagemaker module internally. The sagemaker module (also called SageMaker Python SDK , one of the numerous orchestration SDKs for SageMaker) is not designed to be used in model containers, but instead out of models, to orchestrate their activity (train, deploy, bayesian tuning, etc). In your specific example, you shouldn't include the deployment and model call code to server code, as those are actually actions that will be conducted from outside the server to orchestrate its lifecyle and interact with it. For model deployment with the Sagemaker Pytorch container, your entry point script just needs to contain the required model_fn function for model deserialization, and optionally an input_fn , predict_fn and output_fn , respectively for pre-processing, inference and post-processing ( detailed in the documentation here ). This logic is beautiful :) : you don't need anything else to deploy a production-ready deep learning server! (MMS in the case of Pytorch and MXNet, Flask+Gunicorn in the case of sklearn).

In summary, this is how your code should be split:

An entry_point script serve.py that contains model serving code and looks like this:

import os

import numpy as np
import torch
from torch import cuda
from torchvision.models import resnet50

def model_fn(model_dir):
    # TODO instantiate a model from its artifact stored in model_dir
    return model

def predict_fn(input_data, model):
    # TODO apply model to the input_data, return result of interest
    return result

and some orchestration code to instantiate a SageMaker Model object, deploy it to a server and query it. This is run from the orchestration runtime of your choice, which could be a SageMaker Notebook, your laptop, an AWS Lambda function, an Apache Airflow operator, etc - and with the SDK for your choice; don't need to use python for this.

import numpy as np
from sagemaker.pytorch.model import PyTorchModel

pytorch_model = PyTorchModel(
    model_data='s3://<bucket name>/resnet50/model.tar.gz',
    entry_point='serve.py',
    role='jiashenC-sagemaker',
    py_version='py3',
    framework_version='1.3.1')

predictor = pytorch_model.deploy(instance_type='ml.t2.medium', initial_instance_count=1)

print(predictor.predict(np.random.random_sample([1, 3, 224, 224]).astype(np.float32)))

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM