Python regression: predicting with a model on new data

Question

I am trying to predict new results using new data, but I am running into the following error:

ValueError: feature_names mismatch: ['time', 'x', 'y'] ['f0', 'f1', 'f2']
expected x, time, y in input data
training data did not have the following fields: f0, f1, f2

I don't understand why, since I have 3 predictors and I pass exactly 3 values in the array.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
import xgboost as xgb
import datetime
import seaborn as sns
from numpy import asarray

data=[[1, 1,2 ,5],
        [2, 5,5,6],
        [3, 4,6,6]
        ,[5, 6,5,6],
        [7,9,9,7],
        [8, 7,9,4]
        ,[9, 2,3,8],
        [2, 5,1,9],
        [2,2,10,9]
        ,[3, 8,2,8],
        [6, 5,4,10],
        [6, 8,5 ,10]]
df = pd.DataFrame(data, columns=['time','x','y','target'])
xgb_reg=xgb.XGBRegressor( n_estimators= 30, max_depth=8, eta= 0.1, colsample_bytree= 0.4, subsample= 0.4) #(n_estimators=250, max_depth=15, eta=0.1, subsample=0.4, colsample_bytree=0.4)
y = (df.target)
X=df.drop(['target'], axis = 1)
print('========1=============')
model=xgb_reg.fit(X,y)
prediction=model.predict(X)
new_data=[[10,10,10]]
new_data_asarray=asarray(new_data)
pred=model.predict(new_data_asarray)
print(pred)

Answer 1

This is because your model was trained on a pandas DataFrame, so it expects a DataFrame with the same column names as input.

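As shown below, one fix is to convert the X DataFrame to a numpy array before training, so that both training and prediction use plain arrays without column names.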

import numpy as np
import pandas as pd
import xgboost as xgb


data = [
    [1, 1, 2, 5],
    [2, 5, 5, 6],
    [3, 4, 6, 6],
    [5, 6, 5, 6],
    [7, 9, 9, 7],
    [8, 7, 9, 4],
    [9, 2, 3, 8],
    [2, 5, 1, 9],
    [2, 2, 10, 9],
    [3, 8, 2, 8],
    [6, 5, 4, 10],
    [6, 8, 5, 10],
]
df = pd.DataFrame(data, columns=["time", "x", "y", "target"])
xgb_reg = xgb.XGBRegressor(
    n_estimators=30, max_depth=8, eta=0.1, colsample_bytree=0.4, subsample=0.4
)  # (n_estimators=250, max_depth=15, eta=0.1, subsample=0.4, colsample_bytree=0.4)
y = df.target
X = df.drop(["target"], axis=1)

X = X.to_numpy()

print("========1=============")
model = xgb_reg.fit(X, y)
prediction = model.predict(X)
new_data = [[10, 10, 10]]
new_data_asarray = np.asarray(new_data)
pred = model.predict(new_data_asarray)
print(pred)

Answer 2

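xgb expects the same type of data for training and prediction. Since you trained with a pandas DataFrame but passed a numpy array to predict, you get the error. (In addition, it tries to build a data frame from that array using the default column names f*, as the error message shows.)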

So the fix is to convert the array used for prediction into a DataFrame whose column names are taken from the training X DataFrame:

new_data = [[10,10,10]]
new_data_as_frame = pd.DataFrame(new_data, columns=X.columns)
pred = model.predict(new_data_as_frame)
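
The same idea can be wrapped in a small helper that also checks the number of values per row before building the frame. This is only a sketch: `as_training_frame` is a hypothetical name, not part of xgboost or pandas, and `feature_columns` stands in for `X.columns` from the training code.

```python
import numpy as np
import pandas as pd


def as_training_frame(rows, feature_columns):
    """Wrap raw rows in a DataFrame that carries the training column names.

    Hypothetical helper for illustration; not an xgboost or pandas API.
    """
    arr = np.asarray(rows)
    if arr.ndim == 1:
        # Treat a flat list as a single sample.
        arr = arr.reshape(1, -1)
    columns = list(feature_columns)
    if arr.shape[1] != len(columns):
        raise ValueError(
            f"expected {len(columns)} values per row, got {arr.shape[1]}"
        )
    return pd.DataFrame(arr, columns=columns)


# Stand-in for X.columns from the training code above (an assumption).
feature_columns = ["time", "x", "y"]
new_frame = as_training_frame([[10, 10, 10]], feature_columns)
print(new_frame)
```

The resulting frame could then be passed to `model.predict(new_frame)` in place of the raw array.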

Answer 3

It works when I pass the input as a DataFrame with the column names specified.

model = xgb_reg.fit(X, y)
prediction = model.predict(X)
new_data = [[10, 10, 10]]
new_data = pd.DataFrame(new_data, columns=['time', 'x', 'y'])
pred = model.predict(new_data)
print(pred)  # [6.3624153]
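
Since the error compares lists of feature names, the order of the columns in the new frame likely matters as well as the names themselves. If new observations arrive as records whose keys are in a different order, selecting the training columns explicitly puts them back in place. A minimal sketch, assuming the training columns are `['time', 'x', 'y']` as in the question; `train_columns` and `record` are illustrative names, not from the original code.

```python
import pandas as pd

# Assumed to match X.columns from the training code.
train_columns = ["time", "x", "y"]

# A new observation whose keys are not in training order.
record = {"y": 3, "time": 1, "x": 2}

# Selecting with a list of labels reorders the columns and raises KeyError
# if any training column is missing from the record.
new_data = pd.DataFrame([record])[train_columns]
print(new_data.columns.tolist())
```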
