How to drop duplicates based on two or more subset criteria in a pandas DataFrame
How to drop duplicates from the following DataFrame based on multiple conditions?
I have a DataFrame as follows:
df
cycle | csec | dist | vel |
---|---|---|---|
1 | -40.1 | 9.87 | 2.7 |
1 | -40.1 | 9.89 | 2.2 |
2 | -39.1 | 14.07 | 2.0 |
2 | -39.1 | 14.09 | 2.8 |
3 | -38.7 | 18.09 | 3.2 |
4 | -36.6 | 15.37 | 0.5 |
4 | -38.01 | 16.23 | 1.8 |
4 | -38.4 | 16.66 | 3.1 |
I have to drop the duplicate rows for each cycle based on these conditions:
- if csec is the same:
  - keep the row with the highest dist
  - if dist is also the same, keep the row with the highest vel
- if csec is different:
  - keep the row with the highest csec
Expected output:
cycle | csec | dist | vel |
---|---|---|---|
1 | -40.1 | 9.89 | 2.2 |
2 | -39.1 | 14.09 | 2.8 |
3 | -38.7 | 18.09 | 3.2 |
4 | -36.6 | 15.37 | 0.5 |
I am able to get the duplicated rows using the following code:
duplicate_cycle = df[df.duplicated('cycle', keep=False)]
I would like to know how to drop rows based on the conditions above.
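Sort the DataFrame by cycle, then by csec, dist and vel in descending order, so that the first row of each cycle is the one to keep. Then drop the duplicates on cycle, for example: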
out = (
    df.sort_values(['cycle', 'csec', 'dist', 'vel'], ascending=[True, False, False, False])
      .drop_duplicates(subset=['cycle'])
)
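For anyone who wants to check this against the sample data, here is a minimal, self-contained sketch. The DataFrame is rebuilt by hand from the table in the question, with the column names `cycle`, `csec`, `dist` and `vel` taken from the code above. The explicit `keep="first"` (which is the default) and the final `reset_index(drop=True)` are my own additions for readability, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "cycle": [1, 1, 2, 2, 3, 4, 4, 4],
    "csec": [-40.1, -40.1, -39.1, -39.1, -38.7, -36.6, -38.01, -38.4],
    "dist": [9.87, 9.89, 14.07, 14.09, 18.09, 15.37, 16.23, 16.66],
    "vel": [2.7, 2.2, 2.0, 2.8, 3.2, 0.5, 1.8, 3.1],
})

out = (
    # Within each cycle, put the highest csec first; ties on csec are
    # broken by the highest dist, and ties on dist by the highest vel.
    df.sort_values(["cycle", "csec", "dist", "vel"],
                   ascending=[True, False, False, False])
      # keep="first" is the default; it retains the top row of each cycle.
      .drop_duplicates(subset=["cycle"], keep="first")
      # Optional: renumber the rows 0..n-1 (an addition, not in the answer).
      .reset_index(drop=True)
)

print(out)
```

The multi-column sort does the work here: because sort keys are compared in order, a single sort applies the tie-breaking rules from the question, and `drop_duplicates` only has to keep the first row per cycle.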