Sagemaker: read-only file system: /opt/ml/models/../config.json when invoking endpoint

I am trying to create a multi-model endpoint with SageMaker. I am doing the following:

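import boto3
import sagemaker
from sagemaker.huggingface import HuggingFaceModel
from sagemaker.multidatamodel import MultiDataModel

boto_session = boto3.session.Session(region_name='us-east-1')
sess = sagemaker.Session(boto_session=boto_session)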

iam = boto3.client('iam')
role = iam.get_role(RoleName='sagemaker-role')['Role']['Arn']

huggingface_model = HuggingFaceModel(model_data='s3://bucket/path/model.tar.gz',
                                     transformers_version="4.12.3",
                                     pytorch_version="1.9.1",
                                     py_version='py38',
                                     role=role,
                                     sagemaker_session=sess)
mme = MultiDataModel(name='model-name',
                     model_data_prefix='s3://bucket/path/',
                     model=huggingface_model,
                     sagemaker_session=sess)
predictor = mme.deploy(initial_instance_count=1, instance_type="ml.t2.medium")

If I try to predict:

predictor.predict({"inputs": "test"}, target_model="model.tar.gz")

I get the following error:

{ModelError}An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "[Errno 30] Read-only file system: \u0027/opt/ml/models/d8379026esds430426d32321a85878f6b/model/config.json\u0027"
}

If I deploy a single model through HuggingFaceModel:

huggingface_model = HuggingFaceModel(model_data='s3://bucket/path/model.tar.gz',
                                     transformers_version="4.12.3",
                                     pytorch_version="1.9.1",
                                     py_version='py38',
                                     role=role,
                                     sagemaker_session=sess)
predictor = huggingface_model.deploy(initial_instance_count=1, instance_type="ml.t2.medium")

Then predict works normally with no error.

So I was wondering what could be the reason that I get 'read-only' on the MultiDataModel deploy?

Thanks in advance.

Hey Mpizos, do you have any logs from CloudWatch? Also, one thing I noticed: for the MultiDataModel you are specifying a specific model.tar.gz, as shown in the following code.

huggingface_model = HuggingFaceModel(model_data='s3://bucket/path/model.tar.gz',
                                     transformers_version="4.12.3",
                                     pytorch_version="1.9.1",
                                     py_version='py38',
                                     role=role,
                                     sagemaker_session=sess)

For an MME, the model data needs to be a bucket/prefix/ or just a bucket/, and that location should contain the model.tar.gz archives for the different models. Maybe adjust this so the path covers all the models, and let me know if that resolves your issue. Another option is to use Boto3 for the MME deployment; it is lower level and gives more granularity when debugging issues. See the following example: https://github.com/RamVegiraju/SageMaker-Deployment/tree/master/RealTime/Multi-Model-Endpoint/Pre-Trained-Deployment
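To make the prefix versus archive distinction concrete, here is a minimal sketch of how the Boto3 route might be laid out. It only builds the request payloads and does not call AWS. The helper names `split_model_uri` and `build_mme_requests`, the `AllTraffic` variant name, and the `image_uri` argument are assumptions for illustration; the image URI would have to be a Hugging Face inference container that supports multi-model mode. This is not confirmed to fix the read-only error, it only shows where the prefix and the per-request archive name go.

```python
import json
import posixpath
from urllib.parse import urlparse


def split_model_uri(model_uri):
    """Split 's3://bucket/path/model.tar.gz' into (prefix, archive name)."""
    parsed = urlparse(model_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {model_uri!r}")
    directory, filename = posixpath.split(parsed.path.lstrip("/"))
    if not filename:
        raise ValueError(f"URI has no object name: {model_uri!r}")
    bucket = f"s3://{parsed.netloc}/"
    prefix = f"{bucket}{directory}/" if directory else bucket
    return prefix, filename


def build_mme_requests(name, image_uri, role_arn, model_uri,
                       instance_type="ml.t2.medium"):
    """Build keyword arguments for the Boto3 calls of an MME deployment.

    The container gets the S3 *prefix* and Mode='MultiModel'; the archive
    name is only supplied per request, as TargetModel.
    """
    prefix, target_model = split_model_uri(model_uri)
    return {
        "create_model": {
            "ModelName": name,
            "ExecutionRoleArn": role_arn,
            "PrimaryContainer": {
                "Image": image_uri,       # assumption: caller supplies this
                "ModelDataUrl": prefix,   # prefix, not a single archive
                "Mode": "MultiModel",
            },
        },
        "create_endpoint_config": {
            "EndpointConfigName": name,
            "ProductionVariants": [{
                "VariantName": "AllTraffic",  # hypothetical variant name
                "ModelName": name,
                "InstanceType": instance_type,
                "InitialInstanceCount": 1,
            }],
        },
        "create_endpoint": {"EndpointName": name, "EndpointConfigName": name},
        "invoke_endpoint": {
            "EndpointName": name,
            "ContentType": "application/json",
            "TargetModel": target_model,  # path relative to the prefix
            "Body": json.dumps({"inputs": "test"}),
        },
    }
```

With credentials in place, the first three payloads would be passed to `boto3.client("sagemaker")` as `create_model(**...)`, `create_endpoint_config(**...)` and `create_endpoint(**...)`, and the last one to `boto3.client("sagemaker-runtime").invoke_endpoint(**...)` once the endpoint is in service.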
