How do I load an already-trained XGBoost model to run on a new dataset?

Question

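I'm new to XGBoost, so please bear with me. I trained a model on the Boston housing dataset and saved it locally. Now I want to load the model and use it to predict the labels of a new dataset with a similar structure. How would I do this in Python 3.6? Here is what I have so far, starting from the training step: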

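Update: tried pickle as an alternative.

Update 2: added the cause of the error (preprocessing).

Update 3: see the comments below for the answer.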

    print('Splitting the features and label columns...')
    X, y = data.iloc[:,:-1],data.iloc[:,-1]

    print('Converting dataset to Dmatrix structure to use later on...')
    data_dmatrix = xgb.DMatrix(data=X,label=y)
    #....
    # Some more stuff here.
    #....
    print('Now, train the model...')
    grid = xgb.train(params=params, dtrain=data_dmatrix, num_boost_round=10)

    # Now, save the model for later use on unseen data
    import pickle
    # pickle.dump returns None, so there is nothing useful to assign
    with open("pima.pickle.dat", "wb") as f:
        pickle.dump(grid, f)

    #.....after some time has passed

    # Now, load the model for use on a new dataset
    loaded_model = pickle.load(open("pima.pickle.dat", "rb"))
    print(loaded_model.feature_names)

    # Now, load a new dataset to run the model on and make predictions for
    dataset = pd.read_csv('Boston Housing Data.csv', skiprows=1)

    # Split the dataset into features and label
    # X = use all rows, up until the last column, which is the label or predicted column
    # y = use all rows in the last column of the dataframe ('Price')
    print('Splitting the new features and label column up for predictions...')
    X, y = dataset.iloc[:,:-1],dataset.iloc[:,-1]


    # Make predictions on labels of the test set
    preds = loaded_model.predict(X)

Now I get this traceback:

        preds = loaded_model.predict(X)
    AttributeError: 'DataFrame' object has no attribute 'feature_names'

Any ideas? I noticed that when I print loaded_model.feature_names, I get:

    ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT']

...but the actual .csv file has an extra column, "PRICE", which was added before training and used as the label during training. Does that matter?

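I assumed I wouldn't have to go through the whole train/test split again, since I don't want to retrain the model. I only want to use it to make predictions on the new dataset and report the RMSE of those predictions against the actual values in the new dataset. None of the tutorials I've found online cover the steps for applying a model to new data. Thoughts? Thanks!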

Answer 1

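You need to apply the same preprocessing to the test set as you did to the training set in order to make any kind of prediction. Your problem arises because you used the DMatrix structure in training (which is required, by the way):

    print('Converting dataset to Dmatrix structure to use later on...')
    data_dmatrix = xgb.DMatrix(data=X,label=y)

but did not apply that preprocessing to the test set. The loaded model expects a DMatrix, so passing the raw DataFrame X to predict raises the AttributeError above. Use the same preprocessing for the training, validation, and test sets, and your model will be golden.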

Answer 1: 1 accepted, 2019-07-27 07:49:21