How do I conditionally select rows in a Pandas data frame by
I have the following Pandas dataframe (first ten rows shown):
index_x time_x total_def_x index_y time_y total_def_y event_time
0 2 2005.25394 15.72761 3 2005.25667 8.66223 2005.254962
1 4 2005.25941 11.31783 5 2005.26215 2.79943 2005.260101
2 11 2005.27858 8.74810 12 2005.28131 8.50871 2005.279085
3 18 2005.29774 6.31637 19 2005.30048 10.0420 2005.297804
4 52 2005.39083 0.18209 53 2005.39357 4.42270 2005.393209
5 65 2005.42642 2.71002 66 2005.42916 2.61663 2005.428290
6 106 2005.53867 -0.86598 107 2005.54141 0.26263 2005.539240
7 173 2005.72211 7.91387 174 2005.72485 -4.00652 2005.724622
8 201 2005.79877 4.09495 202 2005.80151 8.35356 2005.800502
9 217 2005.84257 6.63870 218 2005.84531 -1.81069 2005.843362
...
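What I want to do is select the time (time_x or time_y) that is closest to event_time, together with the corresponding deformation value (total_def_x or total_def_y), and put those values in a data frame. The code I have written so far to do this is as follows: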
nearest_df = pd.DataFrame(columns=["time", "total_def"])
for et in new_df["event_time"]:
    if abs(et - new_df["time_x"].values) < abs(et - new_df["time_y"].values):
        nearest_df.append(new_df["time_x", "total_def_x"])
    else:
        nearest_df.append(new_df["time_y", "total_def_y"])
However, every attempt I make at rewriting it returns this error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
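The error comes from the comparison inside the loop: et is a single number, but new_df["time_x"].values is the whole column, so the subtraction and the < produce one boolean per row rather than a single True/False, and Python's if cannot reduce that to one answer. A minimal sketch reproducing this, assuming a three-row excerpt of the table above and the first event_time value:

```python
import pandas as pd

# Three rows from the question's table (only the two time columns)
new_df = pd.DataFrame({
    "time_x": [2005.25394, 2005.25941, 2005.27858],
    "time_y": [2005.25667, 2005.26215, 2005.28131],
})
et = 2005.254962  # one event_time value, as in the loop

# Scalar minus a whole column gives one value per row,
# so the comparison is a boolean array, not a single bool.
mask = abs(et - new_df["time_x"].values) < abs(et - new_df["time_y"].values)
print(mask)

message = ""
try:
    if mask:  # `if` needs exactly one truth value
        pass
except ValueError as err:
    message = str(err)
    print(message)
```

Adding .all() only silences the error: each event_time is still compared against every row's times, not just the times in its own row.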
When I modify the code like this:
if (abs(et - new_df['time_x'].values) < abs(et - new_df['time_y'].values)).all():
I get this error:
KeyError: ('time_x', 'total_def_x')
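This second error is about the column selection rather than the comparison: new_df["time_x", "total_def_x"] passes one key, the tuple ("time_x", "total_def_x"), and no column carries that tuple as its label. Selecting several columns needs a list of labels inside the brackets. A minimal sketch, assuming a two-row excerpt of the data:

```python
import pandas as pd

new_df = pd.DataFrame({
    "time_x": [2005.25394, 2005.25941],
    "total_def_x": [15.72761, 11.31783],
})

# One tuple key: pandas looks for a column labelled ("time_x", "total_def_x")
raised = False
try:
    new_df["time_x", "total_def_x"]
except KeyError as err:
    raised = True
    print("KeyError:", err)

# A list of labels selects both columns as a new DataFrame
subset = new_df[["time_x", "total_def_x"]]
print(subset)
```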
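An example of the expected output is a data frame (nearest_df) like the one below: for each row, whichever of time_x and time_y has the smaller difference from event_time is selected, along with its deformation (total_def_x or total_def_y):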
time total_def
2005.25667 8.66223
2005.25941 11.31783
2005.27858 8.74810
Any help with this would be greatly appreciated.
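If you want to keep the loop structure from the question, one way to make it work is to iterate over rows, so each event_time is compared only with the times in its own row, and to collect the results in a list before building the frame once. DataFrame.append returned a new frame instead of modifying nearest_df in place, and it was removed in pandas 2.0. A minimal sketch, assuming the first three rows of the table as sample data:

```python
import pandas as pd

# First three rows of the question's data
new_df = pd.DataFrame({
    "time_x": [2005.25394, 2005.25941, 2005.27858],
    "total_def_x": [15.72761, 11.31783, 8.74810],
    "time_y": [2005.25667, 2005.26215, 2005.28131],
    "total_def_y": [8.66223, 2.79943, 8.50871],
    "event_time": [2005.254962, 2005.260101, 2005.279085],
})

rows = []
# itertuples yields one namedtuple per row, so every value here is a scalar
for row in new_df.itertuples(index=False):
    if abs(row.event_time - row.time_x) < abs(row.event_time - row.time_y):
        rows.append((row.time_x, row.total_def_x))
    else:
        rows.append((row.time_y, row.total_def_y))

# Build the result frame once from the collected (time, total_def) pairs
nearest_df = pd.DataFrame(rows, columns=["time", "total_def"])
print(nearest_df)
```

Under this rule, row 0 picks time_x: 2005.25394 is about 0.00102 from event_time, while time_y is about 0.00171 away. This agrees with the answer's output below rather than with the first row of the expected output above.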
You can try this:
import pandas as pd
# df is the data frame from the question (named new_df there)
# Create temporary columns: distance of each time from event_time
df["dist_x"] = (df["event_time"] - df["time_x"]).abs()
df["dist_y"] = (df["event_time"] - df["time_y"]).abs()
# Select proper rows (ties go to time_y)
df_x = df.loc[df["dist_x"] < df["dist_y"], ["time_x", "total_def_x"]]
df_y = df.loc[df["dist_x"] >= df["dist_y"], ["time_y", "total_def_y"]]
# Rename the columns, combine both parts and restore the original row order
df_x.columns = df_y.columns = ["time", "total_def"]
new_df = pd.concat(objs=[df_x, df_y]).sort_index()
print(new_df)
# Outputs
time total_def
0 2005.25394 15.72761
1 2005.25941 11.31783
2 2005.27858 8.74810
3 2005.29774 6.31637
4 2005.39357 4.42270
5 2005.42916 2.61663
6 2005.53867 -0.86598
7 2005.72485 -4.00652
8 2005.80151 8.35356
9 2005.84257 6.63870
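The same selection can also be done without temporary columns or a concat/sort step, by choosing element-wise between the x and y columns with numpy.where. A minimal sketch, assuming rows 3 to 5 of the question's table as sample data. As with the >= condition above, ties go to time_y:

```python
import numpy as np
import pandas as pd

# Rows 3-5 of the question's table, keeping the original index labels
df = pd.DataFrame({
    "time_x": [2005.29774, 2005.39083, 2005.42642],
    "total_def_x": [6.31637, 0.18209, 2.71002],
    "time_y": [2005.30048, 2005.39357, 2005.42916],
    "total_def_y": [10.0420, 4.42270, 2.61663],
    "event_time": [2005.297804, 2005.393209, 2005.428290],
}, index=[3, 4, 5])

# True where time_x is strictly closer to event_time than time_y
x_closer = (df["event_time"] - df["time_x"]).abs() < (df["event_time"] - df["time_y"]).abs()

# np.where picks from the x or y column row by row, so the order is preserved
nearest_df = pd.DataFrame({
    "time": np.where(x_closer, df["time_x"], df["time_y"]),
    "total_def": np.where(x_closer, df["total_def_x"], df["total_def_y"]),
}, index=df.index)
print(nearest_df)
```

This leaves the input frame unchanged, whereas the approach above adds dist_x and dist_y columns to df.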
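Disclaimer: the technical posts on this site follow the CC BY-SA 4.0 license. If you repost this content, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.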