SageMaker: Read-only file system: /opt/ml/models/../config.json when invoking the endpoint

Question

I am trying to create a multi-model endpoint with SageMaker, doing the following:

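import boto3
import sagemaker
from sagemaker.huggingface import HuggingFaceModel
from sagemaker.multidatamodel import MultiDataModel

boto_session = boto3.session.Session(region_name='us-east-1')
sess = sagemaker.Session(boto_session=boto_session)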

iam = boto3.client('iam')
role = iam.get_role(RoleName='sagemaker-role')['Role']['Arn']

huggingface_model = HuggingFaceModel(model_data='s3://bucket/path/model.tar.gz',
                                     transformers_version="4.12.3",
                                     pytorch_version="1.9.1",
                                     py_version='py38',
                                     role=role,
                                     sagemaker_session=sess)
mme = MultiDataModel(name='model-name',
                     model_data_prefix='s3://bucket/path/',
                     model=huggingface_model,
                     sagemaker_session=sess)
predictor = mme.deploy(initial_instance_count=1, instance_type="ml.t2.medium")

If I try to predict:

predictor.predict({"inputs": "test"}, target_model="model.tar.gz")

I get the following error:

{ModelError}An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "[Errno 30] Read-only file system: \u0027/opt/ml/models/d8379026esds430426d32321a85878f6b/model/config.json\u0027"
}

If I deploy a single model through the HuggingFaceModel:

huggingface_model = HuggingFaceModel(model_data='s3://bucket/path/model.tar.gz',
                                     transformers_version="4.12.3",
                                     pytorch_version="1.9.1",
                                     py_version='py38',
                                     role=role,
                                     sagemaker_session=sess)
predictor = huggingface_model.deploy(initial_instance_count=1, instance_type="ml.t2.medium")

Then predict works normally with no error.

So I was wondering what could be the reason I get 'read-only' on the MultiDataModel deploy?

Thanks in advance.

Answer 1

Hey Mpizos, do you have any logs from CloudWatch? Also, one thing I noticed: for the MultiDataModel you are specifying a specific model.tar.gz, as shown in the following code.

huggingface_model = HuggingFaceModel(model_data='s3://bucket/path/model.tar.gz',
                                     transformers_version="4.12.3",
                                     pytorch_version="1.9.1",
                                     py_version='py38',
                                     role=role,
                                     sagemaker_session=sess)

For an MME, the model data needs to be a bucket/prefix/ or just a bucket/, and that location should contain the model.tar.gz archives for the different models. Maybe adjust this so the path is right for all the models, and let me know if that resolves your issue. Another option is to use Boto3 for the MME deployment. It is lower level and gives more granularity when troubleshooting. See the following example: https://github.com/RamVegiraju/SageMaker-Deployment/tree/master/RealTime/Multi-Model-Endpoint/Pre-Trained-Deployment
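
To make the prefix-versus-archive distinction concrete, here is a minimal sketch of the request shapes a Boto3-based MME deployment would use. The helper names (mme_container, invoke_args, create_mme_model) are made up for illustration, and the image URI is a placeholder you would replace with the inference image for your region. It only shows how the S3 prefix and the target model relate; it is not claimed to resolve the read-only error by itself.

```python
import json


def mme_container(image_uri: str, s3_prefix: str) -> dict:
    """Container definition for a multi-model endpoint (hypothetical helper).

    ModelDataUrl is an S3 prefix that holds the model.tar.gz archives,
    not the path of a single archive.
    """
    if not s3_prefix.startswith("s3://") or s3_prefix.endswith(".tar.gz"):
        raise ValueError("expected an S3 prefix such as s3://bucket/path/")
    if not s3_prefix.endswith("/"):
        s3_prefix += "/"
    return {"Image": image_uri, "ModelDataUrl": s3_prefix, "Mode": "MultiModel"}


def invoke_args(endpoint_name: str, target_model: str, payload: dict) -> dict:
    """Keyword arguments for the sagemaker-runtime invoke_endpoint call.

    TargetModel is the archive path relative to ModelDataUrl.
    """
    return {
        "EndpointName": endpoint_name,
        "ContentType": "application/json",
        "TargetModel": target_model,
        "Body": json.dumps(payload),
    }


def create_mme_model(model_name: str, role_arn: str, container: dict) -> None:
    """Register the model with SageMaker (requires AWS credentials)."""
    import boto3  # imported here so the helpers above work without AWS access

    sm = boto3.client("sagemaker")
    sm.create_model(
        ModelName=model_name,
        ExecutionRoleArn=role_arn,
        PrimaryContainer=container,
    )
```

With an endpoint already created from such a model, the invocation would look like boto3.client("sagemaker-runtime").invoke_endpoint(**invoke_args("my-endpoint", "model.tar.gz", {"inputs": "test"})), where "my-endpoint" is an assumed endpoint name.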
