How do I conditionally select rows in a Pandas data frame

I have the following Pandas dataframe (first ten rows shown):

   index_x      time_x  total_def_x  index_y      time_y  total_def_y   event_time
0        2  2005.25394     15.72761        3  2005.25667      8.66223  2005.254962
1        4  2005.25941     11.31783        5  2005.26215      2.79943  2005.260101
2       11  2005.27858      8.74810       12  2005.28131      8.50871  2005.279085
3       18  2005.29774      6.31637       19  2005.30048      10.0420  2005.297804
4       52  2005.39083      0.18209       53  2005.39357      4.42270  2005.393209
5       65  2005.42642      2.71002       66  2005.42916      2.61663  2005.428290
6      106  2005.53867     -0.86598      107  2005.54141      0.26263  2005.539240
7      173  2005.72211      7.91387      174  2005.72485     -4.00652  2005.724622
8      201  2005.79877      4.09495      202  2005.80151      8.35356  2005.800502
9      217  2005.84257      6.63870      218  2005.84531     -1.81069  2005.843362
...

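What I want to do is select the time (time_x or time_y) and the corresponding deformation value (total_def_x or total_def_y) for whichever time is closest to event_time, and put those values in a data frame. The code I have written so far to achieve this is: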

nearest_df = pd.DataFrame(columns=["time", "total_def"])

for et in new_df["event_time"]:

    if abs(et - new_df["time_x"].values) < abs(et - new_df["time_y"].values):

        nearest_df.append(new_df["time_x", "total_def_x"])

    else:
        nearest_df.append(new_df["time_y", "total_def_y"])

However, every attempt I have made to rewrite it returns this error:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

When I modify the condition like this: if (abs(et - new_df['time_x'].values) < abs(et - new_df['time_y'].values)).all(): I get this error instead:

KeyError: ('time_x', 'total_def_x')

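An example of the expected output is a data frame (nearest_df) like the one below: for each row, whichever of time_x and time_y differs least from event_time is selected, together with its respective deformation (total_def_x or total_def_y):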

time        total_def
2005.25667  8.66223
2005.25941  11.31783
2005.27858  8.74810

Any help with this would be greatly appreciated.
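Some context on the two errors first. The comparison in the if statement is done on whole columns, so it produces one boolean per row rather than a single True/False, and Python cannot decide which branch to take. The KeyError comes from new_df["time_x", "total_def_x"]: a comma inside single brackets is treated as one tuple-valued column label, and selecting several columns needs a list (double brackets). Also note that DataFrame.append returned a new frame instead of changing nearest_df in place, and it was removed in pandas 2.0. The snippet below is only a small sketch that reproduces both errors; it assumes two rows copied from the table above, and the names closer_to_x, first_error, second_error and subset are made up for illustration.

```python
import pandas as pd

# Two rows copied from the table in the question (rows 0 and 4).
new_df = pd.DataFrame({
    "time_x": [2005.25394, 2005.39083],
    "total_def_x": [15.72761, 0.18209],
    "time_y": [2005.25667, 2005.39357],
    "total_def_y": [8.66223, 4.42270],
    "event_time": [2005.254962, 2005.393209],
})

et = new_df["event_time"].iloc[0]

# Comparing whole columns gives one boolean per row, not a single True/False.
closer_to_x = abs(et - new_df["time_x"].values) < abs(et - new_df["time_y"].values)

try:
    if closer_to_x:  # ambiguous: which of the booleans should decide?
        pass
except ValueError as err:
    first_error = type(err).__name__

# A comma inside single brackets is looked up as ONE tuple-valued column label.
try:
    new_df["time_x", "total_def_x"]
except KeyError as err:
    second_error = type(err).__name__

# A list of labels (double brackets) selects several columns at once.
subset = new_df[["time_x", "total_def_x"]]

print(first_error, second_error, list(subset.columns))
```

Adding .all() only silences the first error: it collapses the comparison into a single answer for the whole frame, so every row would end up in the same branch, which is not a per-row choice.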

You can try this:

# Create temporary columns
df["dist_x"] = (df["event_time"] - df["time_x"]).abs()
df["dist_y"] = (df["event_time"] - df["time_y"]).abs()

# Select proper rows
df_x = df.loc[df["dist_x"] < df["dist_y"], ["time_x", "total_def_x"]]
df_y = df.loc[df["dist_x"] >= df["dist_y"], ["time_y", "total_def_y"]]

# Rename and append results
df_x.columns = df_y.columns = ["time", "total_def"]
new_df = pd.concat(objs=[df_x, df_y]).sort_index()

print(new_df)
# Outputs
         time  total_def
0  2005.25394   15.72761
1  2005.25941   11.31783
2  2005.27858    8.74810
3  2005.29774    6.31637
4  2005.39357    4.42270
5  2005.42916    2.61663
6  2005.53867   -0.86598
7  2005.72485   -4.00652
8  2005.80151    8.35356
9  2005.84257    6.63870
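If you would rather not add the dist_x and dist_y helper columns to df, the same row-wise choice can probably be written with numpy.where. This is a sketch of an alternative, not a correction of the code above: it assumes three rows copied from the question's table (rows 0, 4 and 7), and use_x is a name I made up for the boolean mask.

```python
import numpy as np
import pandas as pd

# Rows 0, 4 and 7 copied from the table in the question.
df = pd.DataFrame(
    {
        "time_x": [2005.25394, 2005.39083, 2005.72211],
        "total_def_x": [15.72761, 0.18209, 7.91387],
        "time_y": [2005.25667, 2005.39357, 2005.72485],
        "total_def_y": [8.66223, 4.42270, -4.00652],
        "event_time": [2005.254962, 2005.393209, 2005.724622],
    },
    index=[0, 4, 7],
)

# True where time_x is strictly closer to event_time than time_y.
# Ties go to time_y, matching the >= used in the code above.
use_x = (df["event_time"] - df["time_x"]).abs() < (df["event_time"] - df["time_y"]).abs()

# Pick the x or y value row by row; df itself is left unchanged.
nearest_df = pd.DataFrame(
    {
        "time": np.where(use_x, df["time_x"], df["time_y"]),
        "total_def": np.where(use_x, df["total_def_x"], df["total_def_y"]),
    },
    index=df.index,
)

print(nearest_df)
```

Because the result is built in one step with the original index, there is no need to concatenate and re-sort afterwards.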

