How to fill pandas dataframe columns in for loop
I am trying to fill pandas DataFrame columns in a for loop. The column names are parametric and are built from the loop value. Here is my code:
for k in range (-1, -4, -1):
    df_orj = pd.read_csv('something.csv', sep= '\t')
    df_train = df_orj.head(11900)
    df_test = df_orj.tail(720)
    SHIFT = k
    df_train.trend = df_train.trend.shift(SHIFT)
    df_train = df_train.dropna()
    df_test.trend = df_test.trend.shift(SHIFT)
    df_test = df_test.dropna()
    drop_list = some_list
    df_out = df_test[['date','price']]
    df_out.index = np.arange(0, len(df_out)) # start index from 0
    df_out["pred-1"] = np.nan
    df_out["pred-2"] = np.nan
    df_out["pred-3"] = np.nan
    df_train.drop(drop_list, 1, inplace = True )
    df_test.drop(drop_list, 1, inplace = True )
    # some processes here
    rf = RandomForestClassifier(n_estimators = 10)
    rf.fit(X_train,y_train)
    y_pred = rf.predict(X_test)
    print("accuracy score: " , rf.score(X_test, y_test))
    X_test2 = sc.transform(df_test.drop('trend', axis=1))
    y_test2 = df_test['trend'].values
    y_pred2 = rf.predict(X_test2)
    print("accuracy score: ",rf.score(X_test2, y_test2))
    name = "pred{0}".format(k)
    for i in range (0, y_test2.size):
        df_out[name][i] = y_pred2[i]
df_out.head(20)
Here is my output:
time_period_start price_open pred-1 pred-2 pred-3
697 2018-10-02T02:00:00.0000000Z 86.80 NaN NaN 1.0
698 2018-10-02T03:00:00.0000000Z 86.65 NaN NaN 1.0
699 2018-10-02T04:00:00.0000000Z 86.32 NaN NaN 1.0
As you can see, only pred-3 is filled. How can I fill all three predefined columns?
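If I understand correctly, your problem is that only pred-3 gets filled while the other two columns stay NaN. That happens because df_out is created inside the loop, so what you see is only the result of the last iteration. Define df_out outside the loop so that the values written for the other two columns are not lost.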
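The difference can be reproduced with a few rows of made-up data. This is only a sketch: the `base` frame, the two helper functions, and the constant 1.0 standing in for model predictions are all hypothetical and not part of the original code.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the date/price columns of the test set.
base = pd.DataFrame({"date": ["d0", "d1", "d2"],
                     "price": [86.80, 86.65, 86.32]})

def fill_inside_loop():
    # df_out is rebuilt on every pass, so columns filled earlier are discarded.
    for k in range(-1, -4, -1):
        df_out = base.copy()
        df_out["pred-1"] = np.nan
        df_out["pred-2"] = np.nan
        df_out["pred-3"] = np.nan
        df_out["pred{0}".format(k)] = 1.0  # stand-in for model predictions
    return df_out

def fill_outside_loop():
    # df_out is built once, so every iteration adds to the same frame.
    df_out = base.copy()
    for k in range(-1, -4, -1):
        df_out["pred{0}".format(k)] = 1.0
    return df_out

broken = fill_inside_loop()
fixed = fill_outside_loop()
print(broken)
print(fixed)
```

In `broken` only pred-3 holds values, as in the question. In `fixed` all three columns are filled.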
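You reset all three columns to NaN on every iteration, so the values are lost as you iterate. Either move the initialization of those columns before the loop, or change the initialization as follows:
Change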
df_out["pred-1"] = np.nan
df_out["pred-2"] = np.nan
df_out["pred-3"] = np.nan
to initialize only a single column on each iteration:
name = "pred{0}".format(k)
df_out[name] = np.nan
So the complete code becomes:
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Read the data and build df_out once, before the loop, so that columns
# filled in earlier iterations are not thrown away.
df_orj = pd.read_csv('something.csv', sep='\t')
df_out = df_orj.tail(720)[['date', 'price']].copy()

for k in range(-1, -4, -1):
    df_train = df_orj.head(11900).copy()
    df_test = df_orj.tail(720).copy()
    SHIFT = k
    df_train['trend'] = df_train['trend'].shift(SHIFT)
    df_train = df_train.dropna()
    df_test['trend'] = df_test['trend'].shift(SHIFT)
    df_test = df_test.dropna()
    drop_list = some_list
    name = "pred{0}".format(k)
    df_out[name] = np.nan  # initialize only this iteration's column
    df_train = df_train.drop(drop_list, axis=1)
    df_test = df_test.drop(drop_list, axis=1)
    # some processes here
    rf = RandomForestClassifier(n_estimators=10)
    rf.fit(X_train, y_train)
    y_pred = rf.predict(X_test)
    print("accuracy score: ", rf.score(X_test, y_test))
    X_test2 = sc.transform(df_test.drop('trend', axis=1))
    y_test2 = df_test['trend'].values
    y_pred2 = rf.predict(X_test2)
    print("accuracy score: ", rf.score(X_test2, y_test2))
    # Rows removed by dropna() keep NaN; the rest are matched by index label.
    df_out.loc[df_test.index, name] = y_pred2

df_out.index = np.arange(0, len(df_out))  # start index from 0
df_out.head(20)
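Because shift followed by dropna leaves a different number of rows for each k, the predictions have to be matched to the rows of df_out by index label rather than by length. Below is a minimal sketch with made-up data. The five-row frame is hypothetical, and the shifted trend values simply stand in for the output of rf.predict.

```python
import numpy as np
import pandas as pd

# Hypothetical data; "trend" plays the role of the label column.
df = pd.DataFrame({"price": [10.0, 11.0, 12.0, 13.0, 14.0],
                   "trend": [1.0, 0.0, 1.0, 1.0, 0.0]})

# Built once, keeping the original index so rows can be matched by label.
df_out = df[["price"]].copy()

for k in range(-1, -4, -1):
    shifted = df.copy()
    shifted["trend"] = shifted["trend"].shift(k)
    shifted = shifted.dropna()            # drops the last |k| rows
    y_pred = shifted["trend"].to_numpy()  # stand-in for rf.predict(...)
    name = "pred{0}".format(k)
    df_out[name] = np.nan
    # Assign the whole column slice at once; dropped rows stay NaN.
    df_out.loc[shifted.index, name] = y_pred

print(df_out)
```

A single `.loc` assignment also avoids the chained form `df_out[name][i] = ...`, which pandas may apply to a temporary copy instead of the frame itself.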