scikit-learn: does the order of the inputs to fit and predict matter?

Question

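I've just started using this library and I'm having some trouble with RandomForestClassifier (I have read the docs but couldn't figure this out).

My question is pretty simple. Say I have a training data set: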

A B C

1 2 3

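where A is the dependent variable (y) and B and C are the independent variables (x). Suppose the test set looks the same, but the column order is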

B A C

1 2 3

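When I call forest.fit(train_data[0:,1:],train_data[0:,0]), do I then need to reorder the test set to match this order before running predict? (Ignore the fact that I would need to remove the y column (A), which is what is being predicted; let's just say that B and C are out of order.)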

Answer 1

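Yes, you need to reorder them. Imagine a simpler case, linear regression. The algorithm computes a weight for each feature, so if, for example, feature 1 is unimportant, it will be assigned a weight close to 0.

If the column order is different at prediction time, an important feature will be multiplied by that near-zero weight, and the prediction will be completely off.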
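The linear regression argument above can be checked with a small sketch. The synthetic data, the weight of 5, and all variable names here are illustrative assumptions, not part of the original answer.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for illustration): the target depends only
# on the second column, so the first column should get a weight near 0.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))
y_train = 5.0 * X_train[:, 1]

model = LinearRegression().fit(X_train, y_train)
weights = model.coef_  # roughly [0, 5]

x_new = np.array([[10.0, 1.0]])  # same column order as in training
x_swapped = x_new[:, ::-1]       # same values, columns swapped

pred_correct = model.predict(x_new)[0]      # important feature meets weight 5
pred_swapped = model.predict(x_swapped)[0]  # important feature meets weight ~0
print(weights, pred_correct, pred_swapped)
```

With the columns swapped, the value 10 is multiplied by the weight learned for the other feature, so the two predictions differ by a factor of about ten.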

Answer 2

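elyase is correct. scikit-learn will simply take the data in whatever order you give it. Hence, you have to make sure the columns are in the same order at training time and at prediction time.

Here is a simple illustration:

Training time: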

import pandas as pd
from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier()
x = pd.DataFrame({
    'feature_1': [0, 0, 1, 1],
    'feature_2': [0, 1, 0, 1]
})
y = [0, 0, 1, 1]
model.fit(x, y) 
# we now have a model that 
# (i)  predicts 0 when x = [0, 0] or [0, 1], and 
# (ii) predicts 1 when x = [1, 0] or [1, 1]

Prediction time:

# positive example
http_request_payload = {
    'feature_1': 0,
    'feature_2': 1
}
input_features = pd.DataFrame([http_request_payload])
model.predict(input_features) # this returns 0, as expected


# negative example
http_request_payload = {
    'feature_2': 1,    # notice that the order is jumbled up
    'feature_1': 0
}
input_features = pd.DataFrame([http_request_payload])
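model.predict(input_features) # when this was written, this returned 1 when it should have returned 0.
# scikit-learn did not use the key-value mapping of the features;
# it simply vectorized the dataframe in whatever order it came in.
# newer scikit-learn releases record the dataframe column names during fit
# and may reject this input with an error instead of silently returning 1.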
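The question passes plain NumPy slices to fit, and in that case there are no column names for the library to inspect at all. The following is a minimal sketch of aligning such a test array by hand, assuming you keep the column names in separate lists (`train_columns` and `test_columns` are hypothetical names introduced for this example).

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Training data as a bare array; the column meaning lives only in this list.
train_columns = ["feature_1", "feature_2"]
X_train = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_train = [0, 0, 1, 1]  # the label follows feature_1
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Test data arrives with its columns in a different order.
test_columns = ["feature_2", "feature_1"]
X_test = np.array([[1, 0]])  # feature_2=1, feature_1=0

# Passed through unchanged, the first column is read as feature_1.
pred_unaligned = model.predict(X_test)[0]

# Reorder the test columns to match the training order before predicting.
order = [test_columns.index(name) for name in train_columns]
X_test_aligned = X_test[:, order]
pred_aligned = model.predict(X_test_aligned)[0]
print(pred_unaligned, pred_aligned)
```

The index list `order` maps each training column to its position in the test array, so the same approach works for any number of columns as long as both name lists are known.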

This is how I cache the column order during training so that I can use it at prediction time.

# training
import joblib
import pandas as pd

x = pd.DataFrame([...])
column_order = list(x.columns) # remember the column order used for training
model = SomeModel().fit(x, y) # train model

# save the things that we need at prediction time. you can also use pickle if you don't want to pip install joblib
joblib.dump(model, 'my_model.joblib')
joblib.dump(column_order, 'column_order.joblib')

# prediction: load the artifacts from disk
model = joblib.load('my_model.joblib')
column_order = joblib.load('column_order.joblib')

# imaginary http request payload
request_payload = { 'feature_1': ..., 'feature_2': ... }

# build a one-row dataframe and put its columns in the training order (using column_order);
# reindex also adds any column missing from the payload, filled with NaN
input_features = pd.DataFrame([request_payload]).reindex(columns=column_order)
input_features = input_features.fillna(0) # handle any missing data however you like

model.predict(input_features)


Answer 1: score 4, accepted, 2015-02-02 22:12:08

Answer 2: score 0, 2019-01-31 06:48:09
