
groupby, count past occurrences of events, and show the most recent event

How can I group by a unique identifier and count the number of past delinquencies ('Bad') and past non-delinquencies ('Good') before the most recent event?

For example, given the following dataframe:

ID    Date         Class    
112   2018-02-12    Good
112   2019-01-20    Bad
113   2018-10-11    Bad
113   2019-01-01    Good
113   2020-02-03    Good
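
For anyone who wants to reproduce the examples, the table above can be built roughly as follows. This is a minimal sketch that assumes Date should be parsed as a datetime column (the question does not state the dtype) and that the frame is called df, as in the snippets in this thread.

```python
import pandas as pd

# Example data from the question; Date is parsed to datetime64 (an assumption).
df = pd.DataFrame(
    {
        "ID": [112, 112, 113, 113, 113],
        "Date": pd.to_datetime(
            ["2018-02-12", "2019-01-20", "2018-10-11", "2019-01-01", "2020-02-03"]
        ),
        "Class": ["Good", "Bad", "Bad", "Good", "Good"],
    }
)

print(df)
```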

This should be the end goal:

ID    Past_deliq  Past_non_deliq  Class   Date
112      0           1             Bad    2019-01-20
113      1           1             Good   2020-02-03

I can get the most recent event with df.loc[df.groupby('ID').Date.idxmax()], but I can't find a way to count past occurrences.

Any help is greatly appreciated.

Just some basic reshaping and crosstab.

The idea is to filter out each ID's most recent row, do a value-count aggregation on the remaining rows, and join the result back to the max dates.

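import pandas as pd

# Most recent date per ID.
max_date = df.groupby("ID")["Date"].max()
# Rows that are not the most recent event of their ID.
s1 = df.loc[~df.index.isin(df.groupby("ID")["Date"].idxmax())]

# Count past classes per ID, then attach the most recent date.
df1 = pd.crosstab(s1.ID, s1.Class).join(max_date).rename(
    columns={"Bad": "Past_deliq", "Good": "Past_non_deliq"}
)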

     Past_deliq  Past_non_deliq       Date
ID                                        
112           0               1 2019-01-20
113           1               1 2020-02-03
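
The output above does not include the Class of the most recent event, which the question's end goal lists. A possible extension of the same crosstab idea is sketched below. The names latest_idx, latest, past, counts and result are made up for illustration, and the reindex step is an addition that keeps IDs with only a single event (they would otherwise drop out of the crosstab) with zero counts.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "ID": [112, 112, 113, 113, 113],
        "Date": pd.to_datetime(
            ["2018-02-12", "2019-01-20", "2018-10-11", "2019-01-01", "2020-02-03"]
        ),
        "Class": ["Good", "Bad", "Bad", "Good", "Good"],
    }
)

# Index labels of the most recent row for each ID.
latest_idx = df.groupby("ID")["Date"].idxmax()

# Most recent event per ID, and all earlier events.
latest = df.loc[latest_idx].set_index("ID")[["Class", "Date"]]
past = df.loc[~df.index.isin(latest_idx)]

# Count past classes per ID; reindex so every ID and both classes appear.
counts = (
    pd.crosstab(past["ID"], past["Class"])
    .reindex(index=latest.index, columns=["Bad", "Good"], fill_value=0)
    .rename(columns={"Bad": "Past_deliq", "Good": "Past_non_deliq"})
)

# Attach the Class and Date of the most recent event.
result = counts.join(latest).reset_index()
print(result)
```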

b = df.groupby(["ID", "Class"])["Class"].count().unstack(fill_value=0)

You group by both ID and Class, which gives the count of each class for each ID. Then you call unstack, which takes the innermost index level (Class here) and pivots its labels into columns. Note that this counts every row, including each ID's most recent event.
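
To make the reshaping step concrete, here is a small sketch on the question's data showing the grouped counts before and after unstack. The variable names counts and wide are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "ID": [112, 112, 113, 113, 113],
        "Date": pd.to_datetime(
            ["2018-02-12", "2019-01-20", "2018-10-11", "2019-01-01", "2020-02-03"]
        ),
        "Class": ["Good", "Bad", "Bad", "Good", "Good"],
    }
)

# Long form: one count per (ID, Class) pair, held in a two-level index.
counts = df.groupby(["ID", "Class"])["Class"].count()
print(counts)

# Wide form: the innermost level (Class) becomes the columns;
# fill_value=0 replaces missing (ID, Class) pairs with 0 instead of NaN.
wide = counts.unstack(fill_value=0)
print(wide)
```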

Next, do another groupby to determine the last occurrence (this assumes your data is sorted by date; if it is not, sort by Date first, since max would pick the alphabetically largest Class rather than the latest one).

c = df.groupby("ID").agg({"Date": "last", "Class": "last"})

Then merge the two dataframes.

b.merge(c, on="ID")

And you get what you requested.
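
To count only the events before the most recent one, as the question asks, the latest row per ID can be dropped before counting. The following is a minimal sketch of that variant, not the answerer's exact code. It sorts explicitly rather than assuming sorted input, and the names past, past_counts, last_event and result are illustrative. Because the merge is an inner join, IDs with only one event would not appear in the result.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "ID": [112, 112, 113, 113, 113],
        "Date": pd.to_datetime(
            ["2018-02-12", "2019-01-20", "2018-10-11", "2019-01-01", "2020-02-03"]
        ),
        "Class": ["Good", "Bad", "Bad", "Good", "Good"],
    }
)

# Sort so that "last" really means "most recent" within each ID.
df = df.sort_values(["ID", "Date"])

# duplicated(keep="last") is True for every row except the last one per ID,
# so this keeps only the past events.
past = df[df.duplicated("ID", keep="last")]

past_counts = (
    past.groupby(["ID", "Class"])["Class"]
    .count()
    .unstack(fill_value=0)
    .rename(columns={"Bad": "Past_deliq", "Good": "Past_non_deliq"})
)

# Most recent event per ID.
last_event = df.groupby("ID").agg({"Date": "last", "Class": "last"})

# Merge on the ID column after moving it out of the index.
result = past_counts.reset_index().merge(last_event.reset_index(), on="ID")
print(result)
```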


 