
Sagemaker: read-only file system: /opt/ml/models/../config.json when invoking endpoint

Trying to create a Multi-Model Endpoint with SageMaker. Doing the following:

import boto3
import sagemaker
from sagemaker.huggingface import HuggingFaceModel
from sagemaker.multidatamodel import MultiDataModel

boto_session = boto3.session.Session(region_name='us-east-1')
sess = sagemaker.Session(boto_session=boto_session)

iam = boto3.client('iam')
role = iam.get_role(RoleName='sagemaker-role')['Role']['Arn']

huggingface_model = HuggingFaceModel(model_data='s3://bucket/path/model.tar.gz',
                                     transformers_version="4.12.3",
                                     pytorch_version="1.9.1",
                                     py_version='py38',
                                     role=role,
                                     sagemaker_session=sess)
mme = MultiDataModel(name='model-name',
                     model_data_prefix='s3://bucket/path/',
                     model=huggingface_model,
                     sagemaker_session=sess)
predictor = mme.deploy(initial_instance_count=1, instance_type="ml.t2.medium")
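
(For reference, to sanity-check which artifacts sit under model_data_prefix, the SDK's MultiDataModel.list_models() helper can be used; a minimal sketch continuing the code above:)

# Sketch: list every model artifact visible under model_data_prefix
for model_path in mme.list_models():
    print(model_path)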

If I try to predict:

predictor.predict({"inputs": "test"}, target_model="model.tar.gz")

I get the following error:

{ModelError}An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "[Errno 30] Read-only file system: \u0027/opt/ml/models/d8379026esds430426d32321a85878f6b/model/config.json\u0027"
}

If I deploy a single model through the HuggingFaceModel:

huggingface_model = HuggingFaceModel(model_data='s3://bucket/path/model.tar.gz',
                                     transformers_version="4.12.3",
                                     pytorch_version="1.9.1",
                                     py_version='py38',
                                     role=role,
                                     sagemaker_session=sess)
predictor = huggingface_model.deploy(initial_instance_count=1, instance_type="ml.t2.medium")

Then predict works normally with no error.
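
For comparison, the single-model predictor is called without a target_model:

# Single-model endpoint: no target_model argument needed
predictor.predict({"inputs": "test"})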

So I was wondering: what could be the reason that I get the 'read-only' error on the MultiDataModel deploy?

Thanks in advance.

Hey Mpizos, do you have any logs from CloudWatch? Also, one thing I noticed: for the MultiDataModel you are specifying a specific model.tar.gz, as shown in the following code.

huggingface_model = HuggingFaceModel(model_data='s3://bucket/path/model.tar.gz',
                                     transformers_version="4.12.3",
                                     pytorch_version="1.9.1",
                                     py_version='py38',
                                     role=role,
                                     sagemaker_session=sess)

For MME, the model data needs to be a bucket/prefix/ or just a bucket/; it should contain the multiple model.tar.gz files for the different models. Maybe adjust this to have the right path for all the models and let me know if that resolves your issue. Another option is utilizing Boto3 for MME deployment; this is lower level and gives more granularity into any issues. Please see the following example: https://github.com/RamVegiraju/SageMaker-Deployment/tree/master/RealTime/Multi-Model-Endpoint/Pre-Trained-Deployment
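
For illustration, here is a minimal sketch of that lower-level Boto3 route, with placeholder names (image URI, bucket/prefix, model/endpoint names) that you would swap for your own; role is the execution role ARN from your code, and the key pieces are Mode='MultiModel' on the container and TargetModel on each invocation:

import boto3

sm_client = boto3.client('sagemaker', region_name='us-east-1')
runtime = boto3.client('sagemaker-runtime', region_name='us-east-1')

# Image URI and S3 prefix are placeholders: use the HuggingFace inference
# image for your region/framework versions, and the prefix that holds
# all of your model.tar.gz files.
sm_client.create_model(
    ModelName='hf-mme',
    ExecutionRoleArn=role,
    PrimaryContainer={
        'Image': '<huggingface-inference-image-uri>',
        'ModelDataUrl': 's3://bucket/path/',
        'Mode': 'MultiModel',          # this is what makes it an MME
    },
)

sm_client.create_endpoint_config(
    EndpointConfigName='hf-mme-config',
    ProductionVariants=[{
        'VariantName': 'AllTraffic',
        'ModelName': 'hf-mme',
        'InstanceType': 'ml.m5.xlarge',
        'InitialInstanceCount': 1,
    }],
)

sm_client.create_endpoint(
    EndpointName='hf-mme-endpoint',
    EndpointConfigName='hf-mme-config',
)

# Once the endpoint is InService, pick the model per request:
response = runtime.invoke_endpoint(
    EndpointName='hf-mme-endpoint',
    ContentType='application/json',
    TargetModel='model.tar.gz',        # relative to the ModelDataUrl prefix
    Body='{"inputs": "test"}',
)
print(response['Body'].read())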
