Sagemaker Model 部署錯誤，ClientError：調用 CreateModel 操作時發生錯誤（ValidationException）

Question

我正在嘗試使用 SKlearn 通過 AWS Sagemaker 部署 model，並收到此錯誤：

---------------------------------------------------------------------------
ClientError                               Traceback (most recent call last)
<ipython-input-145-29a1d3175b01> in <module>
----> 1 deployment = model.deploy(initial_instance_count=1, instance_type="ml.m4.xlarge")

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/estimator.py in deploy(self, initial_instance_count, instance_type, serializer, deserializer, accelerator_type, endpoint_name, use_compiled_model, wait, model_name, kms_key, data_capture_config, tags, serverless_inference_config, async_inference_config, **kwargs)
   1254             kms_key=kms_key,
   1255             data_capture_config=data_capture_config,
-> 1256             serverless_inference_config=serverless_inference_config,
   1257             async_inference_config=async_inference_config,
   1258         )

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/model.py in deploy(self, initial_instance_count, instance_type, serializer, deserializer, accelerator_type, endpoint_name, tags, kms_key, wait, data_capture_config, async_inference_config, serverless_inference_config, **kwargs)
   1001                 self._base_name = "-".join((self._base_name, compiled_model_suffix))
   1002 
-> 1003         self._create_sagemaker_model(
   1004             instance_type, accelerator_type, tags, serverless_inference_config
   1005         )

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/model.py in _create_sagemaker_model(self, instance_type, accelerator_type, tags, serverless_inference_config)
    548             container_def,
    549             vpc_config=self.vpc_config,
--> 550             enable_network_isolation=enable_network_isolation,
    551             tags=tags,
    552         )

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/session.py in create_model(self, name, role, container_defs, vpc_config, enable_network_isolation, primary_container, tags)
   2670 
   2671         try:
-> 2672             self.sagemaker_client.create_model(**create_model_request)
   2673         except ClientError as e:
   2674             error_code = e.response["Error"]["Code"]

~/anaconda3/envs/python3/lib/python3.6/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
    413                     "%s() only accepts keyword arguments." % py_operation_name)
    414             # The "self" in this scope is referring to the BaseClient.
--> 415             return self._make_api_call(operation_name, kwargs)
    416 
    417         _api_call.__name__ = str(py_operation_name)

~/anaconda3/envs/python3/lib/python3.6/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
    743             error_code = parsed_response.get("Error", {}).get("Code")
    744             error_class = self.exceptions.from_code(error_code)
--> 745             raise error_class(parsed_response, operation_name)
    746         else:
    747             return parsed_response

ClientError: An error occurred (ValidationException) when calling the CreateModel operation: Could not find model data at s3://sagemaker-us-east-2-978433479050/sagemaker-scikit-learn-2022-04-28-22-33-14-817/output/model.tar.gz.

我正在運行的代碼是：

from sagemaker import Session, get_execution_role
from sagemaker.sklearn.estimator import SKLearn

sagemaker_session = Session()
role = get_execution_role()

train_input = sagemaker_session.upload_data("TSLA.csv")

model = SKLearn(entry_point='lr.py',
                      train_instance_type='ml.m4.xlarge',
                      role=role, framework_version='0.231',
                      sagemaker_session=sagemaker_session)

model.fit({'train': train_input})

deployment = model.deploy(initial_instance_count=1, instance_type="ml.m4.xlarge")

而 train_input 是：s3://sagemaker-us-east-2-978433479050/data/TSLA.csv

培訓工作已完成，但由於某種原因 model 未部署。

請指教，謝謝

Answer 1

日志表明您訓練的 model 工件沒有被正確捕獲。 請跑

model.data #estimator that you are training

這將顯示您的 model 工件/數據是否實際創建 (model.tar.gz)。

這是訓練/部署 sklearn model 的示例： https://github.com/RamVegiraju/SageMaker-Deployment/tree/master/RealTime/Script-Mode/Sklearn/Regression

Sagemaker Model 部署錯誤，ClientError：調用 CreateModel 操作時發生錯誤（ValidationException）

問題描述

1 個解決方案

解決方案1
0 2022-04-29 00:03:46

Sagemaker Model 部署錯誤，ClientError：調用 CreateModel 操作時發生錯誤（ValidationException）

問題描述

1 個解決方案

解決方案1 0 2022-04-29 00:03:46

解決方案1
0 2022-04-29 00:03:46