In Pandas, how do I remove all subrows but keep the one with the highest value in a specific column of a MultiIndex DataFrame?
So I have a DataFrame like this:
+---+-----+------------+------------+-------+
| | | something1 | something2 | score |
+---+-----+------------+------------+-------+
| 1 | 112 | 1.00 | 10.0 | 15 |
| | 116 | 0.76 | -2.00 | 14 |
| 8 | 112 | 0.76 | 0.05 | 55 |
| | 116 | 1.00 | 1.02 | 54 |
+---+-----+------------+------------+-------+
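If you want to try the answers below yourself, you can rebuild the input table as a DataFrame with a two-level MultiIndex. This is a minimal sketch: the values are copied from the table, and the index levels are left unnamed, as in the question.

```python
import pandas as pd

# Rebuild the example table: the two unnamed index columns become a MultiIndex.
index = pd.MultiIndex.from_tuples([(1, 112), (1, 116), (8, 112), (8, 116)])
df = pd.DataFrame(
    {
        "something1": [1.00, 0.76, 0.76, 1.00],
        "something2": [10.0, -2.00, 0.05, 1.02],
        "score": [15, 14, 55, 54],
    },
    index=index,
)
print(df)
```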
I want to get this:
+---+-----+------------+------------+-------+
| | | something1 | something2 | score |
+---+-----+------------+------------+-------+
| 1 | 112 | 1.00 | 10.0 | 15 |
| 8 | 112 | 0.76 | 0.05 | 55 |
+---+-----+------------+------------+-------+
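For each value of the first index level, I want to keep only the one row with the highest score.
I tried sorting the df and then taking the first row of each group, but it did not work as expected: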
df = df.sort_values("score", ascending=False).groupby(level=[0, 1]).first()
Thanks!
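Grouping by levels 0 and 1 puts every row in its own group, because each (level 0, level 1) pair is unique, so first() gives back all the rows. You only need to group by level 0: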
df.sort_values("score", ascending=False).groupby(level=0).first()
#      something1  something2  score
# 1.0        1.00       10.00     15
# 8.0        0.76        0.05     55
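The sketch below compares the two groupings on the question's example data. It uses `GroupBy.ngroups` to show that grouping by both levels creates one group per row, while grouping by level 0 creates one group per outer key.

```python
import pandas as pd

# Example data from the question (index levels left unnamed).
index = pd.MultiIndex.from_tuples([(1, 112), (1, 116), (8, 112), (8, 116)])
df = pd.DataFrame(
    {"something1": [1.00, 0.76, 0.76, 1.00],
     "something2": [10.0, -2.00, 0.05, 1.02],
     "score": [15, 14, 55, 54]},
    index=index,
)

ordered = df.sort_values("score", ascending=False)

# Both levels: every (level 0, level 1) pair is unique, so each row is its
# own group and first() cannot drop anything.
both_levels = ordered.groupby(level=[0, 1])
print(both_levels.ngroups)  # 4

# Level 0 only: one group per outer key; because of the sort, the first
# row in each group is the one with the highest score.
best = ordered.groupby(level=0).first()
print(best)
```

Note that `first()` takes the first non-null value of each column separately. If the data contains NaN, the returned "row" can combine values from different original rows.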
To keep the second index level, you can move it into a column with reset_index and set it back as an index afterwards:
(df.sort_values("score", ascending=False)
.reset_index(level=1)
.groupby(level=0).first()
.set_index('level_1', append=True))
#              something1  something2  score
#     level_1
# 1.0 112            1.00       10.00     15
# 8.0 112            0.76        0.05     55
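Another option, which is not part of the original answer, is to replace first() with `GroupBy.head(1)`. head(1) returns whole rows and keeps the original MultiIndex, so the reset_index/set_index round trip is unnecessary. This is a minimal sketch on the question's example data:

```python
import pandas as pd

# Example data from the question (index levels left unnamed).
index = pd.MultiIndex.from_tuples([(1, 112), (1, 116), (8, 112), (8, 116)])
df = pd.DataFrame(
    {"something1": [1.00, 0.76, 0.76, 1.00],
     "something2": [10.0, -2.00, 0.05, 1.02],
     "score": [15, 14, 55, 54]},
    index=index,
)

# Sort so the highest score comes first within each outer key, then keep the
# first complete row per level-0 group. head(1) leaves both index levels intact;
# sort_index() restores the original key order.
best = (df.sort_values("score", ascending=False)
          .groupby(level=0)
          .head(1)
          .sort_index())
print(best)
```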
An alternative approach using nlargest:
df.groupby(level=0, group_keys=False).apply(lambda g: g.nlargest(1, 'score'))
#          something1  something2  score
# 1.0 112        1.00       10.00     15
# 8.0 112        0.76        0.05     55
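A further variant, again not from the original answer, avoids sorting entirely. `idxmax()` on the grouped `score` column returns the full index label (a tuple, for a MultiIndex) of the highest-scoring row in each level-0 group, and `.loc` then selects those rows. On ties, idxmax picks the first occurrence. This is a sketch on the question's example data:

```python
import pandas as pd

# Example data from the question (index levels left unnamed).
index = pd.MultiIndex.from_tuples([(1, 112), (1, 116), (8, 112), (8, 116)])
df = pd.DataFrame(
    {"something1": [1.00, 0.76, 0.76, 1.00],
     "something2": [10.0, -2.00, 0.05, 1.02],
     "score": [15, 14, 55, 54]},
    index=index,
)

# For each outer key, the (level 0, level 1) label of its max-score row.
best_labels = df.groupby(level=0)["score"].idxmax().tolist()

# Select those complete rows; the MultiIndex is preserved.
best = df.loc[best_labels]
print(best)
```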