SageMaker model deployment error: ClientError (ValidationException) when calling the CreateModel operation

Question

I am trying to deploy a model with AWS SageMaker using the SKLearn estimator, and I get this error:

---------------------------------------------------------------------------
ClientError                               Traceback (most recent call last)
<ipython-input-145-29a1d3175b01> in <module>
----> 1 deployment = model.deploy(initial_instance_count=1, instance_type="ml.m4.xlarge")

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/estimator.py in deploy(self, initial_instance_count, instance_type, serializer, deserializer, accelerator_type, endpoint_name, use_compiled_model, wait, model_name, kms_key, data_capture_config, tags, serverless_inference_config, async_inference_config, **kwargs)
   1254             kms_key=kms_key,
   1255             data_capture_config=data_capture_config,
-> 1256             serverless_inference_config=serverless_inference_config,
   1257             async_inference_config=async_inference_config,
   1258         )

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/model.py in deploy(self, initial_instance_count, instance_type, serializer, deserializer, accelerator_type, endpoint_name, tags, kms_key, wait, data_capture_config, async_inference_config, serverless_inference_config, **kwargs)
   1001                 self._base_name = "-".join((self._base_name, compiled_model_suffix))
   1002 
-> 1003         self._create_sagemaker_model(
   1004             instance_type, accelerator_type, tags, serverless_inference_config
   1005         )

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/model.py in _create_sagemaker_model(self, instance_type, accelerator_type, tags, serverless_inference_config)
    548             container_def,
    549             vpc_config=self.vpc_config,
--> 550             enable_network_isolation=enable_network_isolation,
    551             tags=tags,
    552         )

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/session.py in create_model(self, name, role, container_defs, vpc_config, enable_network_isolation, primary_container, tags)
   2670 
   2671         try:
-> 2672             self.sagemaker_client.create_model(**create_model_request)
   2673         except ClientError as e:
   2674             error_code = e.response["Error"]["Code"]

~/anaconda3/envs/python3/lib/python3.6/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
    413                     "%s() only accepts keyword arguments." % py_operation_name)
    414             # The "self" in this scope is referring to the BaseClient.
--> 415             return self._make_api_call(operation_name, kwargs)
    416 
    417         _api_call.__name__ = str(py_operation_name)

~/anaconda3/envs/python3/lib/python3.6/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
    743             error_code = parsed_response.get("Error", {}).get("Code")
    744             error_class = self.exceptions.from_code(error_code)
--> 745             raise error_class(parsed_response, operation_name)
    746         else:
    747             return parsed_response

ClientError: An error occurred (ValidationException) when calling the CreateModel operation: Could not find model data at s3://sagemaker-us-east-2-978433479050/sagemaker-scikit-learn-2022-04-28-22-33-14-817/output/model.tar.gz.

The code I am running is:

from sagemaker import Session, get_execution_role
from sagemaker.sklearn.estimator import SKLearn

sagemaker_session = Session()
role = get_execution_role()

train_input = sagemaker_session.upload_data("TSLA.csv")

model = SKLearn(entry_point='lr.py',
                train_instance_type='ml.m4.xlarge',
                role=role, framework_version='0.231',
                sagemaker_session=sagemaker_session)

model.fit({'train': train_input})

deployment = model.deploy(initial_instance_count=1, instance_type="ml.m4.xlarge")

And train_input is: s3://sagemaker-us-east-2-978433479050/data/TSLA.csv

The training job completes, but for some reason the model does not deploy.

Please advise, thank you.

Answer 1

The error message indicates that your trained model artifact was not captured properly. Please run

model.model_data  # "model" is the estimator you trained

This shows the S3 location where your model artifact (model.tar.gz) is expected, so you can check whether it was actually created.
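To confirm whether an object really exists at that location, something like the following could be used. This is a minimal sketch, not part of the SageMaker SDK: the helper names `split_s3_uri` and `artifact_exists` are made up for illustration, and it assumes you pass in a boto3 S3 client whose credentials can read the bucket.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/some/key' into (bucket, key). Hypothetical helper."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def artifact_exists(uri, s3_client):
    """Return True if an object exists at the given S3 URI. Hypothetical helper.

    s3_client is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    bucket, key = split_s3_uri(uri)
    try:
        # head_object fetches only metadata, so nothing is downloaded.
        s3_client.head_object(Bucket=bucket, Key=key)
        return True
    except s3_client.exceptions.ClientError as err:
        # A missing key is reported as a 404; anything else
        # (for example a permissions problem) is re-raised.
        if err.response["Error"]["Code"] in ("404", "NoSuchKey", "NotFound"):
            return False
        raise


# Example usage (needs AWS credentials, so it is left commented out):
# import boto3
# uri = ("s3://sagemaker-us-east-2-978433479050/"
#        "sagemaker-scikit-learn-2022-04-28-22-33-14-817/output/model.tar.gz")
# print(artifact_exists(uri, boto3.client("s3")))
```

If this returns False for the path in the error message, the training job finished without producing an artifact, and the training script is the place to look.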

Here is an example of training and deploying a sklearn model: https://github.com/RamVegiraju/SageMaker-Deployment/tree/master/RealTime/Script-Mode/Sklearn/Regression
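One common reason for a missing model.tar.gz is an entry point that never writes anything to the model directory. SageMaker packages whatever the script leaves in the directory named by the SM_MODEL_DIR environment variable (normally /opt/ml/model). The question does not show lr.py, so whether this applies here is an assumption. Below is a minimal sketch of the save and load half of such a script. The file name model.pkl and the helper `save_model` are hypothetical, and pickle is used only to keep the sketch free of extra dependencies (joblib is the more usual choice for scikit-learn models). `model_fn` is the function name the scikit-learn serving container looks for when loading the model.

```python
import os
import pickle

MODEL_FILE = "model.pkl"  # hypothetical file name, chosen for this sketch


def save_model(model, model_dir):
    """Write the fitted model into model_dir so it ends up in model.tar.gz."""
    os.makedirs(model_dir, exist_ok=True)
    path = os.path.join(model_dir, MODEL_FILE)
    with open(path, "wb") as f:
        pickle.dump(model, f)
    return path


def model_fn(model_dir):
    """Called by the serving container to load the model at endpoint startup."""
    with open(os.path.join(model_dir, MODEL_FILE), "rb") as f:
        return pickle.load(f)


# Inside the training script, after fitting, the call would look roughly like:
# model_dir = os.environ.get("SM_MODEL_DIR", "/opt/ml/model")
# save_model(fitted_model, model_dir)
```

After adding a save step like this, rerun fit() and then deploy(); the artifact should then appear under the job's output/ prefix in S3.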

Answer 1 posted 2022-04-29 00:03:46
