Sagemaker Pytorch model - An error occurred (InternalFailure) when calling the InvokeEndpoint operation (reached max retries: 4):

Question

I am facing an issue while invoking a PyTorch model endpoint. Please see the error below for details.

Error Message: An error occurred (InternalFailure) when calling the InvokeEndpoint operation (reached max retries: 4): An exception occurred while sending request to model. Please contact customer support regarding request 9d4f143b-497f-47ce-9d45-88c697c4b0c4.

The endpoint restarted automatically after this error, and there is no specific log in CloudWatch. Please help me here...

Thanks...

Answer 1

There may be a few issues here. We can explore the possible causes and ways to resolve them.

Inference Code Error

These errors sometimes occur when the payload you send to your endpoint is not in the format the model expects. When invoking the endpoint, make sure your data is formatted and encoded correctly. You can do this with a serializer that SageMaker provides, which you set when deploying the endpoint. The serializer handles the encoding for you and sends the data in the appropriate format. Look at the following code snippet.

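# SageMaker Python SDK v2 (v1 used: from sagemaker.predictor import csv_serializer)
from sagemaker.serializers import CSVSerializer
# `rf` is an existing estimator/model and `payload` is your input data, both defined elsewhere
rf_pred = rf.deploy(1, "ml.m4.xlarge", serializer=CSVSerializer())
print(rf_pred.predict(payload).decode('utf-8'))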

For more information about the different serializers available for each type of input data, see the following link: https://sagemaker.readthedocs.io/en/stable/api/inference/serializers.html
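If you call the endpoint without the SageMaker SDK serializer, you need to encode the payload yourself. Here is a minimal sketch of that, assuming the model expects CSV input with one record per line. The helper name `rows_to_csv_payload` and the sample values are made up for illustration.

```python
import csv
import io


def rows_to_csv_payload(rows):
    """Encode an iterable of rows as CSV text, one record per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    # Drop the trailing newline so the payload ends with the last record.
    return buffer.getvalue().rstrip("\n")


# Hypothetical feature rows; replace with the features your model expects.
payload = rows_to_csv_payload([[5.1, 3.5, 1.4, 0.2], [6.2, 2.9, 4.3, 1.3]])
print(payload)
```

With boto3, a string like this would typically be passed as `Body` to `invoke_endpoint` on a `sagemaker-runtime` client, together with `ContentType="text/csv"`, so that the inference code knows how to parse it.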

Throttling Limits Reached

Sometimes the payload you are feeding in is too large, or the API request rate for the endpoint has been exceeded. Try a more compute-heavy instance type, or increase the number of retries in your boto3 configuration. The following link explains what retries are and shows an example of configuring them for calls to your endpoint.

https://aws.amazon.com/premiumsupport/knowledge-center/sagemaker-python-throttlingexception/

I work for AWS & my opinions are my own
