How do I invoke a SageMaker XGBoost endpoint after the model has been created?

Question

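I have been following this very helpful XGBoost tutorial (the code is at the bottom of the article): https://medium.com/analytics-vidhya/random-forest-and-xgboost-on-amazon-sagemaker-and-aws-lambda-29abd9467795

So far I have been able to get the data properly formatted for ML, create a model from the training data, and then feed test data through the model to get useful results.

However, whenever I leave and come back to work on the model or to feed in new test data, I find that I need to re-run all of the model creation steps before I can make any further predictions. Instead, I would like to simply call the model endpoint I have already created, based on the Image_URI, and feed it new data.

Steps currently performed:

Model training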

xgb = sagemaker.estimator.Estimator(containers[my_region],
                                    role, 
                                    train_instance_count=1, 
                                    train_instance_type='ml.m4.xlarge',
                                    output_path='s3://{}/{}/output'.format(bucket_name, prefix),
                                    sagemaker_session=sess)
xgb.set_hyperparameters(eta=0.06,
                        alpha=0.8,
                        lambda_bias=0.8,
                        gamma=50,
                        min_child_weight=6,
                        subsample=0.5,
                        silent=0,
                        early_stopping_rounds=5,
                        objective='reg:linear',
                        num_round=1000)

xgb.fit({'train': s3_input_train})

xgb_predictor = xgb.deploy(initial_instance_count=1,instance_type='ml.m4.xlarge')

Evaluation

test_data_array = test_data.drop([ 'price','id','sqft_above','date'], axis=1).values #load the data into an array

xgb_predictor.serializer = csv_serializer # set the serializer type

predictions = xgb_predictor.predict(test_data_array).decode('utf-8') # predict!
predictions_array = np.fromstring(predictions[1:], sep=',') # and turn the prediction into an array
print(predictions_array.shape)

from sklearn.metrics import r2_score
print("R2 score : %.2f" % r2_score(test_data['price'],predictions_array))

It seems that this line:

predictions = xgb_predictor.predict(test_data_array).decode('utf-8') # predict!

needs to be rewritten so that it references the model location instead of xgb_predictor.

I have tried the following:

trained_model = sagemaker.model.Model(
    model_data='s3://{}/{}/output/xgboost-2020-11-10-00-00/output/model.tar.gz'.format(bucket_name, prefix),
    image_uri='XXXXXXXXXX.dkr.ecr.us-east-1.amazonaws.com/xgboost:latest',
    role=role)  # your role here; could be different name

trained_model.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')

and then replacing

xgb_predictor.serializer = csv_serializer # set the serializer type
predictions = xgb_predictor.predict(test_data_array).decode('utf-8') # predict!

with

trained_model.serializer = csv_serializer # set the serializer type
predictions = trained_model.predict(test_data_array).decode('utf-8') # predict!

But I get the following error:

AttributeError: 'Model' object has no attribute 'predict'

Answer 1

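That's a good question :) I agree: many of the official tutorials show the full train-to-invoke pipeline and don't emphasize enough that each step can be done separately. In your case, to invoke an already-deployed endpoint, you can either (A) use the invoke API call in one of the many SDKs (for example the CLI or boto3), or (B) instantiate a predictor with the high-level Python SDK, using either the generic sagemaker.predictor.Predictor class or its XGBoost-specific subclass, sagemaker.xgboost.model.XGBoostPredictor, as illustrated below: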

from sagemaker.xgboost.model import XGBoostPredictor
    
predictor = XGBoostPredictor(endpoint_name='your-endpoint')
predictor.predict('<payload>')
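The evaluation code in the question sends CSV rows, so a predictor attached to an existing endpoint would likely need a CSV serializer as well. The following is a minimal sketch, assuming SageMaker Python SDK v2 (where `sagemaker.serializers.CSVSerializer` takes the place of the older `csv_serializer`). The endpoint name is a placeholder, and `attach_csv_predictor` and `flatten_predictions` are hypothetical helper names, not part of the SDK.

```python
def attach_csv_predictor(endpoint_name):
    """Attach to an already-deployed endpoint without retraining or redeploying."""
    # Imported lazily so the pure helper below works without the SDK installed.
    from sagemaker.deserializers import CSVDeserializer
    from sagemaker.serializers import CSVSerializer
    from sagemaker.xgboost.model import XGBoostPredictor

    return XGBoostPredictor(
        endpoint_name=endpoint_name,
        serializer=CSVSerializer(),      # send rows as text/csv
        deserializer=CSVDeserializer(),  # parse the CSV response into lists of strings
    )


def flatten_predictions(rows):
    """Hypothetical helper: flatten parsed CSV rows into a flat list of floats."""
    return [float(value) for row in rows for value in row if value.strip()]


# Usage (requires AWS credentials and a live endpoint; the name is a placeholder):
# predictor = attach_csv_predictor("your-endpoint")
# predictions = flatten_predictions(predictor.predict(test_data_array))
```

With this approach nothing is trained or deployed again; the predictor object is only a client for the endpoint that already exists.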

Similar question: How to use a pretrained model from s3 to predict some data?
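Option (A) from the answer, calling the invoke API directly, can be sketched with boto3 roughly as follows. This assumes the endpoint accepts `text/csv` input and returns predictions as comma- or newline-separated numbers. The function names and the endpoint name are illustrative placeholders rather than anything from the original post.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize an iterable of numeric rows to headerless CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue().rstrip("\n")


def parse_predictions(body_text):
    """Parse comma- or newline-separated numbers into a list of floats."""
    tokens = body_text.replace("\n", ",").split(",")
    return [float(token) for token in tokens if token.strip()]


def invoke_xgboost_endpoint(endpoint_name, rows, region_name=None):
    """Send rows to an existing endpoint through the low-level runtime API."""
    # Imported lazily so the helpers above can be used without boto3 installed.
    import boto3

    runtime = boto3.client("sagemaker-runtime", region_name=region_name)
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=rows_to_csv(rows),
    )
    return parse_predictions(response["Body"].read().decode("utf-8"))


# Usage (requires AWS credentials and a live endpoint; the name is a placeholder):
# predictions = invoke_xgboost_endpoint("your-endpoint", test_data_array.tolist())
```

This route needs only boto3, which can be convenient in environments such as AWS Lambda where the full SageMaker SDK may not be installed.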

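Notes:

If you want the model.deploy() call to return a predictor, you must instantiate your model with a predictor_cls. This is optional; you can also deploy a model first and then invoke it as a separate step using the techniques above.
Endpoints incur charges even when you don't invoke them; they are billed by uptime. So if you don't need an always-on endpoint, don't hesitate to shut it down to minimize costs.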
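Both notes can be sketched together, assuming SageMaker Python SDK v2. `deploy_with_predictor` and `predict_then_delete` are hypothetical helper names, and the arguments stand in for the S3 artifact path, container image, and IAM role from the question.

```python
def deploy_with_predictor(model_data, image_uri, role, instance_type="ml.m4.xlarge"):
    """Deploy an existing model artifact so that deploy() returns a predictor."""
    # Imported lazily so the cleanup helper below works without the SDK installed.
    from sagemaker.model import Model
    from sagemaker.xgboost.model import XGBoostPredictor

    model = Model(
        image_uri=image_uri,
        model_data=model_data,
        role=role,
        predictor_cls=XGBoostPredictor,  # without this, deploy() returns None
    )
    return model.deploy(initial_instance_count=1, instance_type=instance_type)


def predict_then_delete(predictor, payload):
    """Run one prediction, then delete the endpoint so it stops accruing charges."""
    try:
        return predictor.predict(payload)
    finally:
        # Runs even if predict() raises, so the endpoint is not left running.
        predictor.delete_endpoint()
```

Deleting the endpoint after every call only makes sense for occasional batch-style use; for repeated predictions it is usually simpler to keep the endpoint up for the session and delete it at the end.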

1 answer

Answer 1
1 Accepted 2020-11-11 15:26:02
