I have trained a BERT model on SageMaker and now I want to get it ready for making predictions, i.e., inference.
I used PyTorch to train the model, and the model is saved to an S3 bucket after training.
Here is the structure inside the model.tar.gz file that is present in the S3 bucket.
Now, I do not understand how I can make predictions with it. I have tried to follow many guides but still could not figure it out.
Here is something I have tried:
inference_image_uri = sagemaker.image_uris.retrieve(
    framework='pytorch',
    version='1.7.1',
    instance_type=inference_instance_type,
    region=aws_region,
    py_version='py3',
    image_scope='inference'
)
sm.create_model(
    ModelName=model_name,
    ExecutionRoleArn=role,
    PrimaryContainer={
        'ModelDataUrl': model_s3_dir,
        'Image': inference_image_uri
    }
)
sm.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            "VariantName": "variant1",                # The name of the production variant.
            "ModelName": model_name,
            "InstanceType": inference_instance_type,  # Specify the compute instance type.
            "InitialInstanceCount": 1                 # Number of instances to launch initially.
        }
    ]
)
sm.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name
)
from sagemaker.predictor import Predictor
from sagemaker.serializers import JSONLinesSerializer
from sagemaker.deserializers import JSONLinesDeserializer
inputs = [
    {"inputs": ["I have a question [EOT] Hey Manish Mittal ! I'm OneAssist bot. I'm here to answer your queries. [SEP] thanks"]},
    # {"features": ["OK, but not great."]},
    # {"features": ["This is not the right product."]},
]
predictor = Predictor(
    endpoint_name=endpoint_name,
    serializer=JSONLinesSerializer(),
    deserializer=JSONLinesDeserializer(),
    sagemaker_session=sess
)

predicted_classes = predictor.predict(inputs)

for predicted_class in predicted_classes:
    print("Predicted class {} with probability {}".format(predicted_class['predicted_label'], predicted_class['probability']))
I can see the endpoint created, but when predicting, it gives me this error:
ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (0) from primary with message "Your invocation timed out while waiting for a response from container primary. Review the latency metrics for each container in Amazon CloudWatch, resolve the issue, and try again."
I do not understand how to make it work. Also, do I need to provide any entry script for inference, and if yes, where?
Here's detailed documentation on deploying PyTorch models: https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/using_pytorch.html#deploy-pytorch-models
If you're using the default model_fn provided by the estimator, you'll need to have the model saved as model.pt.
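If you go the entry-script route instead, the script has to define the model_fn/input_fn/predict_fn/output_fn handlers that the PyTorch serving container looks for. Below is a rough sketch of what that could look like for a BERT classifier; the file name inference.py, the use of Hugging Face's from_pretrained, and the predicted_label/probability fields are assumptions based on the output format in your predict loop, so adapt it to how your model was actually saved.

# inference.py -- minimal sketch of a custom entry script (assumed file name).
# Assumes the tokenizer and model were saved into the model directory with
# save_pretrained(); change model_fn if your model.tar.gz is laid out differently.
import json
import torch
from transformers import BertForSequenceClassification, BertTokenizer

def model_fn(model_dir):
    # Load tokenizer and model from the unpacked model.tar.gz.
    tokenizer = BertTokenizer.from_pretrained(model_dir)
    model = BertForSequenceClassification.from_pretrained(model_dir)
    model.eval()
    return model, tokenizer

def input_fn(request_body, request_content_type):
    # JSONLinesSerializer sends one JSON object per line.
    if request_content_type == 'application/jsonlines':
        return [json.loads(line) for line in request_body.splitlines() if line.strip()]
    raise ValueError('Unsupported content type: {}'.format(request_content_type))

def predict_fn(input_data, model_and_tokenizer):
    # Assumes one text per record, as in the "inputs" payload from the question.
    model, tokenizer = model_and_tokenizer
    results = []
    for record in input_data:
        encoded = tokenizer(record['inputs'], return_tensors='pt', truncation=True, padding=True)
        with torch.no_grad():
            logits = model(**encoded).logits
        probs = torch.softmax(logits, dim=-1)
        prob, label = torch.max(probs, dim=-1)
        results.append({'predicted_label': label.item(), 'probability': prob.item()})
    return results

def output_fn(prediction, response_content_type):
    # Return one JSON object per line so JSONLinesDeserializer can parse the response.
    return '\n'.join(json.dumps(p) for p in prediction)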
To write your own inference script and deploy the model, see the section on Bring your own model. The pytorch_model.deploy function will deploy it to a real-time endpoint, and then you can use the predictor.predict function on the resulting endpoint variable.
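Putting that together with the SageMaker Python SDK (instead of the low-level boto3 create_model/create_endpoint_config/create_endpoint calls), a minimal sketch could look like the following, reusing the variables from your question; the entry_point file name is an assumption matching the script sketched above:

from sagemaker.pytorch import PyTorchModel
from sagemaker.serializers import JSONLinesSerializer
from sagemaker.deserializers import JSONLinesDeserializer

# Wrap the trained model.tar.gz together with the custom inference script.
pytorch_model = PyTorchModel(
    model_data=model_s3_dir,      # s3://.../model.tar.gz produced by training
    role=role,
    framework_version='1.7.1',
    py_version='py3',
    entry_point='inference.py',   # the entry script sketched above (assumed name)
    sagemaker_session=sess
)

# Deploy to a real-time endpoint and get a Predictor back.
predictor = pytorch_model.deploy(
    initial_instance_count=1,
    instance_type=inference_instance_type,
    serializer=JSONLinesSerializer(),
    deserializer=JSONLinesDeserializer()
)

predicted_classes = predictor.predict(inputs)

With this approach the SDK packages the entry script with the model and creates the model, endpoint config, and endpoint for you, so the three sm.create_* calls from the question are not needed.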