I have a question about groupby: within each group, I want to count the number of distinct values of "item" seen over an expanding time period (the first month, the first two months, the first three months, and so on).
For example, the data shown below:
group time item
1 9/30/2014 a
1 10/30/2014 a
1 11/30/2014 b
2 9/30/2014 c
2 10/30/2014 d
2 11/30/2014 d
I would like to group by "group" and, as time advances, count the distinct items seen so far:
group time item want
1 9/30/2014 a 1 (because we only have "a" in 9/30/2014 )
1 10/30/2014 a 1 (because we only have "a" from 9/30/2014 to 10/30/2014)
1 11/30/2014 b 2 (because we have "a" and "b" from 9/30/2014 to 11/30/2014)
2 9/30/2014 c 1
2 10/30/2014 d 2
2 11/30/2014 d 2
I appreciate your help. Thank you very much.
You can perform a groupby + expanding with a nunique count.
You need to cheat a bit, as expanding currently only supports numerical values, so I factorized the data first:
df['want'] = (
    pd.Series(df['item'].factorize()[0], index=df.index)
    .groupby(df['group'])
    .expanding()
    .apply(lambda s: s.nunique())
    .droplevel(0)
    .astype(int)
)
Output:
   group        time item  want
0      1   9/30/2014    a     1
1      1  10/30/2014    a     1
2      1  11/30/2014    b     2
3      2   9/30/2014    c     1
4      2  10/30/2014    d     2
5      2  11/30/2014    d     2
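For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the sample data from the question and sorts by time within each group first, since the expanding count only makes sense on time-ordered rows (the question's data already happens to be sorted). The `want_alt` column is not part of the answer above; it is an alternative I am assuming gives the same result here, by flagging the first occurrence of each (group, item) pair and taking a cumulative sum per group.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "group": [1, 1, 1, 2, 2, 2],
    "time": ["9/30/2014", "10/30/2014", "11/30/2014"] * 2,
    "item": ["a", "a", "b", "c", "d", "d"],
})

# Parse the dates and make sure rows are in time order within each group
df["time"] = pd.to_datetime(df["time"], format="%m/%d/%Y")
df = df.sort_values(["group", "time"])

# Approach from the answer: factorize items to integers, then
# count distinct codes over an expanding window per group
codes = pd.Series(df["item"].factorize()[0], index=df.index)
df["want"] = (
    codes.groupby(df["group"])
    .expanding()
    .apply(lambda s: s.nunique())
    .droplevel(0)   # drop the group level so the index matches df again
    .astype(int)
)

# Alternative (an assumption, not from the answer): mark the first time
# each item appears in its group, then cumulatively sum those marks
first_seen = ~df.duplicated(["group", "item"])
df["want_alt"] = first_seen.astype(int).groupby(df["group"]).cumsum()

print(df)
```

The alternative avoids calling a Python lambda for every window, so it may scale better on large frames, but I have only checked it against this small example.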