Get consecutive occurrences of an event by group in pandas
I'm working with a DataFrame that has id, wage and date columns, like this:
id wage date
1 100 201212
1 100 201301
1 0 201302
1 0 201303
1 120 201304
1 0 201305
.
2 0 201302
2 0 201303
I want to create an n_months_no_income column that counts how many consecutive months a given individual has had wage==0, like this:
id wage date n_months_no_income
1 100 201212 0
1 100 201301 0
1 0 201302 1
1 0 201303 2
1 120 201304 0
1 0 201305 1
. .
2 0 201302 1
2 0 201303 2
I feel it's some sort of mix of groupby('id'), cumcount(), maybe diff() or apply(), and then a fillna(0), but I can't find the right combination.
Do you have any ideas?
Here's an example DataFrame for ease of replication:
import pandas as pd

df = pd.DataFrame({'id':[1,1,1,1,1,1,2,2],'wage':[100,100,0,0,120,0,0,0],
'date':[201212,201301,201302,201303,201304,201305,201302,201303]})
Edit: Added code for ease of use.
In your case, use two groupby calls with cumcount, and create the additional grouping key with cumsum:
df.groupby('id').wage.apply(lambda x : x.groupby(x.ne(0).cumsum()).cumcount())
Out[333]:
0 0
1 0
2 1
3 2
4 0
5 1
Name: wage, dtype: int64
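For anyone who wants the result as a column, here is a minimal sketch of a variant of the same idea. It assumes the rows are already sorted by date within each id and that consecutive rows are consecutive months; the name run_id is just an illustrative helper, not something from the question. One thing to watch: cumcount starts at 0 in every run, so an id whose first rows already have wage == 0 (like id 2 in the sample) would get 0, 1 rather than the 1, 2 asked for. Summing a "wage is zero" flag within each run avoids that:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 1, 1, 1, 2, 2],
    'wage': [100, 100, 0, 0, 120, 0, 0, 0],
    'date': [201212, 201301, 201302, 201303, 201304, 201305, 201302, 201303],
})

# Hypothetical helper: a new run starts at every non-zero wage, so the
# running count of non-zero wages within each id labels the runs.
run_id = df['wage'].ne(0).astype(int).groupby(df['id']).cumsum()

# Within each (id, run) pair, count the zero-wage rows seen so far.
# The non-zero row that opens a run contributes 0, each zero row adds 1.
is_zero = df['wage'].eq(0).astype(int)
df['n_months_no_income'] = is_zero.groupby([df['id'], run_id]).cumsum()

print(df)
```

If the data might not be sorted, calling df.sort_values(['id', 'date']) first should be enough; gaps between months are not detected by this sketch.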