
How would I rank within a groupby object based on another row condition in Pandas? Example included

The dataframe below has 4 columns: runner, race_date, height_in_inches, top_ten_finish.

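I want to group by race_date and, if a runner finished in the top ten on that race_date, rank his height_in_inches among the other runners who finished in the top ten on that race_date. How would I do this?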

Here is the original dataframe:

>>> import pandas as pd
>>> d = {"runner":['mike','paul','jim','dave','douglas'],
...     "race_date":['2019-02-02','2019-02-02','2020-02-02','2020-02-01','2020-02-01'],
...      "height_in_inches":[72,68,70,74,73],
...     "top_ten_finish":["yes","yes","no","yes","no"]}
>>> df = pd.DataFrame(d)
>>> df
    runner   race_date  height_in_inches top_ten_finish
0     mike  2019-02-02                72            yes
1     paul  2019-02-02                68            yes
2      jim  2020-02-02                70             no
3     dave  2020-02-01                74            yes
4  douglas  2020-02-01                73             no
>>> 

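This is the result I want. Note that runners who did not finish in the top 10 of their race get a value of 0 in the new column (the tallest top-ten runner in each race gets rank 1).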

    runner   race_date  height_in_inches top_ten_finish  if_top_ten_height_rank
0     mike  2019-02-02                72            yes                       1
1     paul  2019-02-02                68            yes                       2
2      jim  2020-02-02                70             no                       0
3     dave  2020-02-01                74            yes                       1
4  douglas  2020-02-01                73             no                       0

Thanks!

We can filter to the top-ten rows first, then use groupby + rank:

df['rank'] = df[df.top_ten_finish.eq('yes')].groupby('race_date')['height_in_inches'].rank(ascending=False)
df['rank'] = df['rank'].fillna(0)  # rows filtered out above come back as NaN; set them to 0
df
Out[87]: 
    runner   race_date  height_in_inches top_ten_finish  rank
0     mike  2019-02-02                72            yes   1.0
1     paul  2019-02-02                68            yes   2.0
2      jim  2020-02-02                70             no   0.0
3     dave  2020-02-01                74            yes   1.0
4  douglas  2020-02-01                73             no   0.0
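A variant of the same idea, sketched under the assumption that you want the integer column name from the question: instead of filtering rows out, blank out the heights of non-top-ten runners with Series.where, so every row keeps its place and no index realignment is needed. This is an alternative formulation, not what either answer posted.

```python
import pandas as pd

df = pd.DataFrame({
    "runner": ["mike", "paul", "jim", "dave", "douglas"],
    "race_date": ["2019-02-02", "2019-02-02", "2020-02-02", "2020-02-01", "2020-02-01"],
    "height_in_inches": [72, 68, 70, 74, 73],
    "top_ten_finish": ["yes", "yes", "no", "yes", "no"],
})

# Replace heights of runners who did not finish in the top ten with NaN,
# so rank() ignores them while every row keeps its original index.
top_ten_heights = df["height_in_inches"].where(df["top_ten_finish"].eq("yes"))

# Rank tallest-first within each race_date; NaN heights stay NaN,
# which fillna(0) then turns into the 0 the question asks for.
df["if_top_ten_height_rank"] = (
    top_ten_heights.groupby(df["race_date"])
    .rank(ascending=False)
    .fillna(0)
    .astype(int)
)
print(df)
```

The printed frame should match the desired output in the question, with an int column instead of the float 'rank' column above.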

You can filter, rank with groupby(), and then assign the result back:

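df['if_top_ten_height_rank'] = (df.loc[df['top_ten_finish']=='yes','height_in_inches']  # heights of top-ten runners only
                                   .groupby(df['race_date']).rank(ascending=False)       # tallest = 1 within each race_date
                                   .reindex(df.index, fill_value=0)                      # restore all rows; non-top-ten get 0
                                   .astype(int)
                                )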

Output:

    runner    race_date      height_in_inches  top_ten_finish      if_top_ten_height_rank
--  --------  -----------  ------------------  ----------------  ------------------------
 0  mike      2019-02-02                   72  yes                                      1
 1  paul      2019-02-02                   68  yes                                      2
 2  jim       2020-02-02                   70  no                                       0
 3  dave      2020-02-01                   74  yes                                      1
 4  douglas   2020-02-01                   73  no                                       0
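One caveat that applies to both answers: rank() defaults to method='average', so two top-ten runners of equal height in the same race share a fractional rank such as 1.5, and a trailing .astype(int) silently truncates that to 1. The sketch below uses hypothetical tie data (not from the question) to show how the method parameter changes the result; pick whichever tie rule fits your use case.

```python
import pandas as pd

# Hypothetical data, not from the question: two top-ten runners tie at 72 inches.
df = pd.DataFrame({
    "race_date": ["2019-02-02"] * 3,
    "height_in_inches": [72, 72, 68],
    "top_ten_finish": ["yes", "yes", "yes"],
})

top_ten = df.loc[df["top_ten_finish"] == "yes", "height_in_inches"]
by_race = top_ten.groupby(df["race_date"])

# Default method='average': the tied pair shares rank 1.5.
print(by_race.rank(ascending=False).tolist())                  # [1.5, 1.5, 3.0]
# method='min': ties share the lowest rank; the next runner skips to 3.
print(by_race.rank(ascending=False, method="min").tolist())    # [1.0, 1.0, 3.0]
# method='dense': ties share a rank and the next runner gets 2.
print(by_race.rank(ascending=False, method="dense").tolist())  # [1.0, 1.0, 2.0]
```

With method='min' or method='dense' the ranks are whole numbers, so the .astype(int) step no longer loses information.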


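Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.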

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM