I have trained a BERT model on SageMaker and now I want to get it ready for making predictions, i.e., inference.
I used PyTorch to train the model, and the model is saved to an S3 bucket after training.
Here is the structure inside the model.tar.gz file that is present in the S3 bucket.
Now, I do not understand how I can make predictions with it. I have tried to follow many guides but still could not figure it out.
Here is something I have tried:
inference_image_uri = sagemaker.image_uris.retrieve(
    framework='pytorch',
    version='1.7.1',
    instance_type=inference_instance_type,
    region=aws_region,
    py_version='py3',
    image_scope='inference'
)
sm.create_model(
    ModelName=model_name,
    ExecutionRoleArn=role,
    PrimaryContainer={
        'ModelDataUrl': model_s3_dir,
        'Image': inference_image_uri
    }
)
sm.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            "VariantName": "variant1",                # The name of the production variant.
            "ModelName": model_name,
            "InstanceType": inference_instance_type,  # Specify the compute instance type.
            "InitialInstanceCount": 1                 # Number of instances to launch initially.
        }
    ]
)
sm.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name
)
from sagemaker.predictor import Predictor
from sagemaker.serializers import JSONLinesSerializer
from sagemaker.deserializers import JSONLinesDeserializer
inputs = [
    {"inputs": ["I have a question [EOT] Hey Manish Mittal ! I'm OneAssist bot. I'm here to answer your queries. [SEP] thanks"]},
    # {"features": ["OK, but not great."]},
    # {"features": ["This is not the right product."]},
]
predictor = Predictor(
    endpoint_name=endpoint_name,
    serializer=JSONLinesSerializer(),
    deserializer=JSONLinesDeserializer(),
    sagemaker_session=sess
)

predicted_classes = predictor.predict(inputs)

for predicted_class in predicted_classes:
    print("Predicted class {} with probability {}".format(predicted_class['predicted_label'], predicted_class['probability']))
I can see the endpoint created, but when predicting, it gives me this error:
ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (0) from primary with message "Your invocation timed out while waiting for a response from container primary. Review the latency metrics for each container in Amazon CloudWatch, resolve the issue, and try again."
I do not understand how to make it work. Also, do I need to provide any entry script for inference, and if yes, where?
Here's detailed documentation on deploying PyTorch models: https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/using_pytorch.html#deploy-pytorch-models
If you're using the default model_fn provided by the estimator, you'll need to have the model saved as model.pt.
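If you go the entry-script route instead, the script has to define the model_fn/input_fn/predict_fn/output_fn handlers that the PyTorch serving container looks for. Below is a rough sketch of what that could look like for a BERT classifier; the file name inference.py, the use of Hugging Face's from_pretrained, and the predicted_label/probability fields are assumptions based on the output format in your predict loop, so adapt it to how your model was actually saved.

# inference.py -- minimal sketch of a custom entry script (assumed file name).
# Assumes the tokenizer and model were saved into the model directory with
# save_pretrained(); change model_fn if your model.tar.gz is laid out differently.
import json
import torch
from transformers import BertForSequenceClassification, BertTokenizer

def model_fn(model_dir):
    # Load tokenizer and model from the unpacked model.tar.gz.
    tokenizer = BertTokenizer.from_pretrained(model_dir)
    model = BertForSequenceClassification.from_pretrained(model_dir)
    model.eval()
    return model, tokenizer

def input_fn(request_body, request_content_type):
    # JSONLinesSerializer sends one JSON object per line.
    if request_content_type == 'application/jsonlines':
        return [json.loads(line) for line in request_body.splitlines() if line.strip()]
    raise ValueError('Unsupported content type: {}'.format(request_content_type))

def predict_fn(input_data, model_and_tokenizer):
    # Assumes one text per record, as in the "inputs" payload from the question.
    model, tokenizer = model_and_tokenizer
    results = []
    for record in input_data:
        encoded = tokenizer(record['inputs'], return_tensors='pt', truncation=True, padding=True)
        with torch.no_grad():
            logits = model(**encoded).logits
        probs = torch.softmax(logits, dim=-1)
        prob, label = torch.max(probs, dim=-1)
        results.append({'predicted_label': label.item(), 'probability': prob.item()})
    return results

def output_fn(prediction, response_content_type):
    # Return one JSON object per line so JSONLinesDeserializer can parse the response.
    return '\n'.join(json.dumps(p) for p in prediction)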
To write your own inference script and deploy the model, see the section on Bring your own model. The pytorch_model.deploy function will deploy it to a real-time endpoint, and then you can use the predictor.predict function on the resulting endpoint variable.
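Putting that together with the SageMaker Python SDK (instead of the low-level boto3 create_model/create_endpoint_config/create_endpoint calls), a minimal sketch could look like the following, reusing the variables from your question; the entry_point file name is an assumption matching the script sketched above:

from sagemaker.pytorch import PyTorchModel
from sagemaker.serializers import JSONLinesSerializer
from sagemaker.deserializers import JSONLinesDeserializer

# Wrap the trained model.tar.gz together with the custom inference script.
pytorch_model = PyTorchModel(
    model_data=model_s3_dir,      # s3://.../model.tar.gz produced by training
    role=role,
    framework_version='1.7.1',
    py_version='py3',
    entry_point='inference.py',   # the entry script sketched above (assumed name)
    sagemaker_session=sess
)

# Deploy to a real-time endpoint and get a Predictor back.
predictor = pytorch_model.deploy(
    initial_instance_count=1,
    instance_type=inference_instance_type,
    serializer=JSONLinesSerializer(),
    deserializer=JSONLinesDeserializer()
)

predicted_classes = predictor.predict(inputs)

With this approach the SDK packages the entry script with the model and creates the model, endpoint config, and endpoint for you, so the three sm.create_* calls from the question are not needed.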