Get consecutive occurrences of an event by group in pandas

I'm working with a DataFrame that has id, wage and date columns, like this:

id   wage   date
1    100    201212
1    100    201301             
1     0     201302
1     0     201303
1    120    201304
1     0     201305
      .
2     0     201302
2     0     201303

I want to create an n_months_no_income column that counts how many consecutive months a given individual has had wage == 0, like this:

id   wage   date     n_months_no_income
1    100    201212             0
1    100    201301             0
1     0     201302             1
1     0     201303             2
1    120    201304             0
1     0     201305             1
      .                        .
2     0     201302             1
2     0     201303             2

I feel the answer is some mix of groupby('id'), cumcount(), maybe diff() or apply(), and then a fillna(0), but I can't find the right combination.

Do you have any ideas?

Here's an example DataFrame for ease of replication:

import pandas as pd

df = pd.DataFrame({'id':[1,1,1,1,1,1,2,2],'wage':[100,100,0,0,120,0,0,0],
 'date':[201212,201301,201302,201303,201304,201305,201302,201303]})

Edit: Added code for ease of use.

In your case, use two groupby calls with cumcount, and create the additional grouping key with cumsum:

df.groupby('id').wage.apply(lambda x : x.groupby(x.ne(0).cumsum()).cumcount())
Out[333]: 
0    0
1    0
2    1
3    2
4    0
5    1
Name: wage, dtype: int64
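One caveat with the one-liner above: cumcount starts at 0 inside each run, so a group whose first rows already have wage == 0 (such as id 2 in the sample data) would start at 0 rather than 1. The following is a minimal sketch of a variant that sums a zero-wage flag instead, and writes the result into the n_months_no_income column from the question. The names is_zero and run_id are illustrative helpers, not part of the original answer, and the sketch assumes rows should be ordered by date within each id.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 1, 1, 1, 2, 2],
    "wage": [100, 100, 0, 0, 120, 0, 0, 0],
    "date": [201212, 201301, 201302, 201303, 201304, 201305, 201302, 201303],
})

# Assumption: the count should follow date order within each id.
df = df.sort_values(["id", "date"]).reset_index(drop=True)

# 1 where the wage is zero, 0 otherwise (illustrative helper name).
is_zero = df["wage"].eq(0).astype(int)

# Within each id, the key increases at every non-zero wage, so each
# non-zero row and the zero rows that follow it share one key value.
run_id = df["wage"].ne(0).astype(int).groupby(df["id"]).cumsum().rename("run_id")

# Summing the flag within each (id, run) gives 0 on the non-zero row
# and 1, 2, ... on the zero rows after it, including leading zeros.
df["n_months_no_income"] = is_zero.groupby([df["id"], run_id]).cumsum()

print(df)
```

Note that "consecutive" here means consecutive rows after sorting. The sketch does not check whether the date values themselves are adjacent months, so a gap in the data would not reset the count.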
