Group by maximum value and return the corresponding rows in a pandas DataFrame

Question

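My DataFrame contains students, dates, and test scores. I want to find the latest date for each student and return the corresponding row (ultimately, I'm most interested in each student's most recent score). How can I do this in pandas?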

Suppose my DataFrame looks like this (abbreviated version):

Student_id  Date     Score
Tina1       1/17/17   .95
John2       1/18/17   .8
Lia1        12/13/16  .845
John2       1/25/17   .975
Tina1       1/1/17    .78
Lia1        6/12/16   .89

This is what I want:

Student_id  Date     Score
Tina1       1/17/17   .95
Lia1        12/13/16  .845
John2       1/25/17   .975

I found this on SO, but it gives me a "positional indexers are out-of-bounds" error.

df.iloc[df.groupby('student_id').apply(lambda x: x['date'].idxmax())]

What other ways are there to achieve the same goal?

Answer 1

You can sort the DataFrame by date and then use groupby.tail to get the latest record for each student:

df.iloc[pd.to_datetime(df.Date, format='%m/%d/%y').argsort()].groupby('Student_id').tail(1)

#Student_id     Date    Score
#2     Lia1 12/13/16    0.845
#0    Tina1  1/17/17    0.950
#3    John2  1/25/17    0.975
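
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same sort-then-tail idea. The DataFrame is rebuilt from the sample in the question, and the temporary column name `_parsed` is just an illustrative choice, not something from the original post.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Student_id": ["Tina1", "John2", "Lia1", "John2", "Tina1", "Lia1"],
    "Date": ["1/17/17", "1/18/17", "12/13/16", "1/25/17", "1/1/17", "6/12/16"],
    "Score": [0.95, 0.8, 0.845, 0.975, 0.78, 0.89],
})

# Parse the strings so rows are ordered chronologically, not as text.
parsed = pd.to_datetime(df["Date"], format="%m/%d/%y")

# Sort by the parsed dates, then keep the last (latest) row per student.
# "_parsed" is a hypothetical helper column that is dropped again at the end.
latest = (
    df.assign(_parsed=parsed)
      .sort_values("_parsed")
      .groupby("Student_id")
      .tail(1)
      .drop(columns="_parsed")
)
print(latest)
```

Using sort_values on a helper column instead of iloc with argsort is only a stylistic choice; both should select the same rows.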

Or, to avoid sorting, use idxmax (this works as long as the index has no duplicate labels):

df.loc[pd.to_datetime(df.Date, format='%m/%d/%y').groupby(df.Student_id).idxmax()]

# Student_id       Date Score
#3     John2    1/25/17 0.975
#2      Lia1   12/13/16 0.845
#0     Tina1    1/17/17 0.950
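
A short sketch of why `.loc` is used here rather than `.iloc`: idxmax returns index labels, not positions, so passing its result to `.iloc` can fail when the labels fall outside the positional range. This may be one cause of the error in the question, though that is an assumption. The non-default index below is hypothetical and chosen only to make the difference visible.

```python
import pandas as pd

# Sample data from the question, with a hypothetical non-default index.
df = pd.DataFrame(
    {
        "Student_id": ["Tina1", "John2", "Lia1", "John2", "Tina1", "Lia1"],
        "Date": ["1/17/17", "1/18/17", "12/13/16", "1/25/17", "1/1/17", "6/12/16"],
        "Score": [0.95, 0.8, 0.845, 0.975, 0.78, 0.89],
    },
    index=[10, 11, 12, 13, 14, 15],
)

parsed = pd.to_datetime(df["Date"], format="%m/%d/%y")

# idxmax yields the index *label* of the latest date for each student.
labels = parsed.groupby(df["Student_id"]).idxmax()

# Label-based lookup works regardless of what the index looks like.
latest = df.loc[labels]
print(latest)

# Position-based lookup with the same values fails here, because
# positions 10, 12 and 13 do not exist in a six-row frame.
try:
    df.iloc[labels.to_numpy()]
    iloc_failed = False
except IndexError:
    iloc_failed = True
print("iloc failed:", iloc_failed)
```

With the default 0..n-1 index, labels and positions coincide, so the two lookups happen to agree; after filtering or reindexing they generally do not.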

Solution 1
2, accepted, 2017-07-07 17:17:28
