How would I rank within a groupby object based on a condition in another column in Pandas? Example included
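The dataframe below has 4 columns: runner, race_date, height_in_inches, top_ten_finish.
I want to group by race_date and, if a runner finished in the top ten on that race_date, rank his height_in_inches (tallest first) among the other top-ten finishers on that race_date. How would I do that?
Here is the original dataframe: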
>>> import pandas as pd
>>> d = {"runner":['mike','paul','jim','dave','douglas'],
...      "race_date":['2019-02-02','2019-02-02','2020-02-02','2020-02-01','2020-02-01'],
...      "height_in_inches":[72,68,70,74,73],
...      "top_ten_finish":["yes","yes","no","yes","no"]}
>>> df = pd.DataFrame(d)
>>> df
    runner   race_date  height_in_inches  top_ten_finish
0     mike  2019-02-02                72             yes
1     paul  2019-02-02                68             yes
2      jim  2020-02-02                70              no
3     dave  2020-02-01                74             yes
4  douglas  2020-02-01                73              no
This is the result I want. Note that runners who did not finish in the top 10 of their race get a value of 0 in the new column.
    runner   race_date  height_in_inches  top_ten_finish  if_top_ten_height_rank
0     mike  2019-02-02                72             yes                       1
1     paul  2019-02-02                68             yes                       2
2      jim  2020-02-02                70              no                       0
3     dave  2020-02-01                74             yes                       1
4  douglas  2020-02-01                73              no                       0
Thanks!
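We can filter to the top-ten rows first, then use groupby with rank:
df['rank'] = (df[df.top_ten_finish.eq('yes')]          # keep only top-ten rows
                .groupby('race_date')['height_in_inches']
                .rank(ascending=False))                 # tallest = 1; filtered-out rows become NaN on assignment
df['rank'] = df['rank'].fillna(0)                       # non-top-ten runners get 0 (avoids chained inplace fillna)
df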
Out[87]:
    runner   race_date  height_in_inches  top_ten_finish  rank
0     mike  2019-02-02                72             yes   1.0
1     paul  2019-02-02                68             yes   2.0
2      jim  2020-02-02                70              no   0.0
3     dave  2020-02-01                74             yes   1.0
4  douglas  2020-02-01                73              no   0.0
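A variant sketch (not taken from either answer): instead of filtering rows out, mask the heights of non-top-ten runners with `Series.where`, which turns them into NaN while keeping every row. `groupby(...).rank()` leaves NaN values unranked by default (`na_option='keep'`), so they do not push anyone else's rank down, and `fillna(0)` then gives them the 0 the question asks for. The final `astype(int)` assumes there are no ties, since the default `method='average'` can produce fractional ranks.

```python
import pandas as pd

d = {"runner": ['mike', 'paul', 'jim', 'dave', 'douglas'],
     "race_date": ['2019-02-02', '2019-02-02', '2020-02-02', '2020-02-01', '2020-02-01'],
     "height_in_inches": [72, 68, 70, 74, 73],
     "top_ten_finish": ["yes", "yes", "no", "yes", "no"]}
df = pd.DataFrame(d)

# Heights of runners outside the top ten become NaN; the row count stays the same.
top_ten_heights = df['height_in_inches'].where(df['top_ten_finish'].eq('yes'))

# Rank tallest = 1 within each race_date; NaN heights stay NaN (na_option='keep').
df['if_top_ten_height_rank'] = (
    top_ten_heights.groupby(df['race_date'])
                   .rank(ascending=False)
                   .fillna(0)       # non-top-ten runners get 0
                   .astype(int)     # ranks are whole numbers as long as there are no ties
)
print(df)
```

Because nothing is dropped, no reindexing step is needed before assigning back.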
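You can filter, rank within groupby(), and then assign back:
df['if_top_ten_height_rank'] = (df.loc[df['top_ten_finish']=='yes','height_in_inches']  # top-ten heights only
                                  .groupby(df['race_date']).rank(ascending=False)        # tallest = 1 per race
                                  .reindex(df.index, fill_value=0)                       # restore all rows, others get 0
                                  .astype(int)
                               )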
Output:
runner race_date height_in_inches top_ten_finish if_top_ten_height_rank
-- -------- ----------- ------------------ ---------------- ------------------------
0 mike 2019-02-02 72 yes 1
1 paul 2019-02-02 68 yes 2
2 jim 2020-02-02 70 no 0
3 dave 2020-02-01 74 yes 1
4 douglas 2020-02-01 73 no 0
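One thing neither answer addresses is ties. `rank` defaults to `method='average'`, so two top-ten runners of equal height would both get 1.5, and the second answer's `.astype(int)` would silently truncate that to 1. The sketch below uses hypothetical data with such a tie and compares `method='average'`, `method='min'` (tied runners share the lower rank, leaving a gap after) and `method='dense'` (no gap after a tie); which one is right depends on how you want ties reported.

```python
import pandas as pd

# Hypothetical data: 'amy' and 'bea' tie at 66 inches; 'dan' is tallest but not top ten.
df = pd.DataFrame({
    "runner": ['amy', 'bea', 'cal', 'dan'],
    "race_date": ['2021-03-03'] * 4,
    "height_in_inches": [66, 66, 64, 70],
    "top_ten_finish": ["yes", "yes", "yes", "no"],
})

top = df.loc[df['top_ten_finish'] == 'yes', 'height_in_inches']
by_race = top.groupby(df['race_date'])

def rank_with(method):
    """Rank top-ten heights (tallest = 1) per race_date, filling 0 for everyone else."""
    return (by_race.rank(ascending=False, method=method)
                   .reindex(df.index, fill_value=0))

df['average_rank'] = rank_with('average')          # floats: the tied pair gets 1.5
df['min_rank'] = rank_with('min').astype(int)      # tied pair shares 1, next runner gets 3
df['dense_rank'] = rank_with('dense').astype(int)  # tied pair shares 1, next runner gets 2
print(df)
```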