Pandas: count cumulative unique string values per group
I have a question regarding groupby: I want to group by expanding periods of time (1 month, 2 months, 3 months) and count the number of distinct "item" values in each period.
For example, with the data below:
group time item
1 9/30/2014 a
1 10/30/2014 a
1 11/30/2014 b
2 9/30/2014 c
2 10/30/2014 d
2 11/30/2014 d
I would like to group by "group" and, as time goes on, count the number of distinct items seen so far:
group time item want
1 9/30/2014 a 1 (because we only have "a" in 9/30/2014 )
1 10/30/2014 a 1 (because we only have "a" from 9/30/2014 to 10/30/2014)
1 11/30/2014 b 2 (because we have "a" and "b" from 9/30/2014 to 11/30/2014)
2 9/30/2014 c 1
2 10/30/2014 d 2
2 11/30/2014 d 2
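To pin down what the "want" column means, here is a brute-force sketch that follows the definition directly: for each row, count the distinct items in the same group whose time is on or before that row's time. The function name cumulative_unique_naive is hypothetical, and parsing "time" with format "%m/%d/%Y" is an assumption about the date format. It is O(n²), so it is only meant as a reference to check faster approaches against.

```python
import pandas as pd

# Sample data from the question; time parsed as datetime so it can be compared.
# Assumption: dates are month/day/year.
df = pd.DataFrame({
    "group": [1, 1, 1, 2, 2, 2],
    "time": pd.to_datetime(
        ["9/30/2014", "10/30/2014", "11/30/2014",
         "9/30/2014", "10/30/2014", "11/30/2014"],
        format="%m/%d/%Y",
    ),
    "item": ["a", "a", "b", "c", "d", "d"],
})


def cumulative_unique_naive(frame: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: for each row, count distinct items in the
    same group whose time is <= that row's time."""
    counts = []
    for _, row in frame.iterrows():
        mask = (frame["group"] == row["group"]) & (frame["time"] <= row["time"])
        counts.append(frame.loc[mask, "item"].nunique())
    return pd.Series(counts, index=frame.index)


df["want"] = cumulative_unique_naive(df)
print(df)
```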
I appreciate your help. Thank you very much.
You can perform a groupby + expanding with a nunique count.
You need a small workaround, because expanding currently only supports numerical values, so I factorized the data first:
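import pandas as pd

df['want'] = (
    # expanding() only accepts numbers, so encode each item as an integer code
    pd.Series(df['item'].factorize()[0], index=df.index)
    .groupby(df['group'])                 # one expanding window per group
    .expanding()                          # window grows row by row (rows assumed in time order)
    .apply(lambda s: s.nunique())         # distinct codes seen so far
    .droplevel(0)                         # drop the group level to realign with df's index
    .astype(int)                          # apply() returns floats
)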
Output:
group time item want
0 1 9/30/2014 a 1
1 1 10/30/2014 a 1
2 1 11/30/2014 b 2
3 2 9/30/2014 c 1
4 2 10/30/2014 d 2
5 2 11/30/2014 d 2
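A common alternative that avoids expanding and factorize altogether (a different technique from the answer's, not what it used) is to mark the first occurrence of each (group, item) pair with duplicated and take a cumulative sum per group. This sketch assumes rows are sorted by time within each group and that no two rows in a group share the same time; ties would be counted in row order rather than together.

```python
import pandas as pd

df = pd.DataFrame({
    "group": [1, 1, 1, 2, 2, 2],
    "time": ["9/30/2014", "10/30/2014", "11/30/2014",
             "9/30/2014", "10/30/2014", "11/30/2014"],
    "item": ["a", "a", "b", "c", "d", "d"],
})

# Both approaches count in row order, so put rows in chronological order
# within each group first (assumption: month/day/year dates).
df["time"] = pd.to_datetime(df["time"], format="%m/%d/%Y")
df = df.sort_values(["group", "time"], kind="stable")

# True the first time a (group, item) pair appears, False for repeats.
first_seen = ~df.duplicated(subset=["group", "item"])

# Running count of first appearances within each group
# equals the cumulative number of distinct items.
df["want"] = first_seen.astype(int).groupby(df["group"]).cumsum()
print(df)
```

This is vectorised, so it scales better than calling a Python lambda on every expanding window.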