
Sagemaker Model Deployment Error, ClientError: An error occurred (ValidationException) when calling the CreateModel operation

I am trying to deploy a model with AWS SageMaker using SKLearn, and I am getting this error:

---------------------------------------------------------------------------
ClientError                               Traceback (most recent call last)
<ipython-input-145-29a1d3175b01> in <module>
----> 1 deployment = model.deploy(initial_instance_count=1, instance_type="ml.m4.xlarge")

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/estimator.py in deploy(self, initial_instance_count, instance_type, serializer, deserializer, accelerator_type, endpoint_name, use_compiled_model, wait, model_name, kms_key, data_capture_config, tags, serverless_inference_config, async_inference_config, **kwargs)
   1254             kms_key=kms_key,
   1255             data_capture_config=data_capture_config,
-> 1256             serverless_inference_config=serverless_inference_config,
   1257             async_inference_config=async_inference_config,
   1258         )

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/model.py in deploy(self, initial_instance_count, instance_type, serializer, deserializer, accelerator_type, endpoint_name, tags, kms_key, wait, data_capture_config, async_inference_config, serverless_inference_config, **kwargs)
   1001                 self._base_name = "-".join((self._base_name, compiled_model_suffix))
   1002 
-> 1003         self._create_sagemaker_model(
   1004             instance_type, accelerator_type, tags, serverless_inference_config
   1005         )

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/model.py in _create_sagemaker_model(self, instance_type, accelerator_type, tags, serverless_inference_config)
    548             container_def,
    549             vpc_config=self.vpc_config,
--> 550             enable_network_isolation=enable_network_isolation,
    551             tags=tags,
    552         )

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/session.py in create_model(self, name, role, container_defs, vpc_config, enable_network_isolation, primary_container, tags)
   2670 
   2671         try:
-> 2672             self.sagemaker_client.create_model(**create_model_request)
   2673         except ClientError as e:
   2674             error_code = e.response["Error"]["Code"]

~/anaconda3/envs/python3/lib/python3.6/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
    413                     "%s() only accepts keyword arguments." % py_operation_name)
    414             # The "self" in this scope is referring to the BaseClient.
--> 415             return self._make_api_call(operation_name, kwargs)
    416 
    417         _api_call.__name__ = str(py_operation_name)

~/anaconda3/envs/python3/lib/python3.6/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
    743             error_code = parsed_response.get("Error", {}).get("Code")
    744             error_class = self.exceptions.from_code(error_code)
--> 745             raise error_class(parsed_response, operation_name)
    746         else:
    747             return parsed_response

ClientError: An error occurred (ValidationException) when calling the CreateModel operation: Could not find model data at s3://sagemaker-us-east-2-978433479050/sagemaker-scikit-learn-2022-04-28-22-33-14-817/output/model.tar.gz.

The code I am running is:

from sagemaker import Session, get_execution_role
from sagemaker.sklearn.estimator import SKLearn

sagemaker_session = Session()
role = get_execution_role()

train_input = sagemaker_session.upload_data("TSLA.csv")

model = SKLearn(entry_point='lr.py',
                      train_instance_type='ml.m4.xlarge',
                      role=role, framework_version='0.231',
                      sagemaker_session=sagemaker_session)

model.fit({'train': train_input})

deployment = model.deploy(initial_instance_count=1, instance_type="ml.m4.xlarge")

And train_input is: s3://sagemaker-us-east-2-978433479050/data/TSLA.csv

The training job completes, but for some reason the model does not deploy.

Please advise, thank you.

The error indicates that your trained model artifact was not captured properly. Please run:

model.model_data  # "model" is the estimator you trained

This prints the S3 location where the model artifact (model.tar.gz) is expected. Check whether an object was actually created at that path.
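One way to do that check is to list what the training job wrote to S3. The following is a minimal sketch, not a definitive diagnostic. It assumes a boto3 S3 client and the default output layout seen in the error message (job name, then output/model.tar.gz). The helper names split_s3_uri and list_job_outputs are hypothetical, introduced only for illustration.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/some/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def list_job_outputs(s3_client, model_uri):
    """Return the keys stored under the training job's prefix.

    Assumes the artifact key looks like '<job-name>/output/model.tar.gz'.
    A single list_objects_v2 call returns at most 1000 keys, which is
    enough for a quick look at one job.
    """
    bucket, key = split_s3_uri(model_uri)
    prefix = key.rsplit("/output/", 1)[0] + "/"
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    return [obj["Key"] for obj in response.get("Contents", [])]


# Usage (requires AWS credentials), with the URI from the error message:
#   import boto3
#   uri = ("s3://sagemaker-us-east-2-978433479050/"
#          "sagemaker-scikit-learn-2022-04-28-22-33-14-817/output/model.tar.gz")
#   print(list_job_outputs(boto3.client("s3"), uri))
```

If the returned list has no key ending in output/model.tar.gz, the training job finished without producing an artifact, which matches the "Could not find model data" message.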

Here is an example of training/deploying a sklearn model: https://github.com/RamVegiraju/SageMaker-Deployment/tree/master/RealTime/Script-Mode/Sklearn/Regression
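A common reason for a missing artifact is an entry-point script that never writes anything into the model directory. SageMaker archives the contents of that directory into model.tar.gz when training ends. The question does not show lr.py, so the following is only a sketch of the two pieces such a script would typically need. The file name model.pkl and the use of pickle are assumptions made for this example (many scikit-learn examples use joblib instead).

```python
import os
import pickle

MODEL_FILE = "model.pkl"  # hypothetical file name, chosen for this sketch


def save_model(model, model_dir=None):
    """Serialize `model` into the directory SageMaker archives after training."""
    # SM_MODEL_DIR is set inside the training container (normally /opt/ml/model).
    model_dir = model_dir or os.environ.get("SM_MODEL_DIR", "/opt/ml/model")
    os.makedirs(model_dir, exist_ok=True)
    path = os.path.join(model_dir, MODEL_FILE)
    with open(path, "wb") as f:
        pickle.dump(model, f)
    return path


def model_fn(model_dir):
    """Load the model at serving time; the scikit-learn container calls this hook."""
    with open(os.path.join(model_dir, MODEL_FILE), "rb") as f:
        return pickle.load(f)
```

In the training script, save_model(fitted_estimator) would be called once after fitting. The serving container then passes the unpacked artifact directory to model_fn.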
