Pandas: count cumulative unique string values per group

I have a question regarding groupby: I want to group over growing periods of time (1 month, 2 months, 3 months) and count the distinct values of "item" within each period.

For example, given the data below:

group    time      item
1      9/30/2014      a
1      10/30/2014     a
1      11/30/2014     b
2      9/30/2014      c
2      10/30/2014     d
2      11/30/2014     d
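To reproduce the example, here is a minimal sketch that builds this data as a DataFrame. It assumes the dates are month/day/year strings (hence the `"%m/%d/%Y"` format) and sorts by group and time, because any cumulative count depends on row order.

```python
import pandas as pd

# Sample data from the question; dates are month/day/year strings
df = pd.DataFrame({
    "group": [1, 1, 1, 2, 2, 2],
    "time": ["9/30/2014", "10/30/2014", "11/30/2014",
             "9/30/2014", "10/30/2014", "11/30/2014"],
    "item": ["a", "a", "b", "c", "d", "d"],
})

# Parse the dates so rows can be ordered chronologically
df["time"] = pd.to_datetime(df["time"], format="%m/%d/%Y")

# A cumulative count depends on row order, so sort by group, then time
df = df.sort_values(["group", "time"], ignore_index=True)
print(df)
```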

As time goes on, I would like to use groupby to count the distinct items seen so far in each group:

group    time      item   want
1      9/30/2014      a     1 (because we only have "a" on 9/30/2014)
1      10/30/2014     a     1 (because we only have "a" from 9/30/2014 to 10/30/2014)
1      11/30/2014     b     2 (because we have "a" and "b" from 9/30/2014 to 11/30/2014)
2      9/30/2014      c     1  
2      10/30/2014     d     2
2      11/30/2014     d     2
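To pin down the semantics of the `want` column, a plain-Python reference sketch (no pandas) keeps one set of seen items per group. The function name `cumulative_unique` is hypothetical, and it assumes the rows are already in chronological order within each group.

```python
def cumulative_unique(groups, items):
    """Return, for each row, how many distinct items its group has seen so far.

    Assumes rows are already in chronological order within each group.
    """
    seen = {}    # group -> set of items seen up to the current row
    counts = []
    for g, item in zip(groups, items):
        seen.setdefault(g, set()).add(item)
        counts.append(len(seen[g]))
    return counts


want = cumulative_unique([1, 1, 1, 2, 2, 2], ["a", "a", "b", "c", "d", "d"])
print(want)  # [1, 1, 2, 1, 2, 2]
```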

I appreciate your help. Thank you very much.

You can perform a groupby + expanding with a nunique count.

You need to cheat a bit, as expanding currently only supports numerical values, so I factorized the data first:
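To show what the factorize step produces, a small sketch on the `item` values from the question: each distinct string is mapped to an integer code in order of first appearance, so counting distinct codes is the same as counting distinct strings.

```python
import pandas as pd

items = pd.Series(["a", "a", "b", "c", "d", "d"])

# codes: one integer per row; uniques: the distinct values, in first-seen order
codes, uniques = items.factorize()
print(codes)          # [0 0 1 2 3 3]
print(list(uniques))  # ['a', 'b', 'c', 'd']
```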

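import pandas as pd

# expanding() only accepts numeric data, so encode each item as an integer code
codes = pd.Series(df['item'].factorize()[0], index=df.index)

df['want'] = (
    codes
    .groupby(df['group'])          # a separate expanding window per group
    .expanding()                   # window grows from the group's first row
    .apply(lambda s: s.nunique())  # distinct codes seen so far
    .droplevel(0)                  # drop the group level to realign with df
    .astype(int)
)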
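A variant of the same expanding approach, as a sketch: passing `raw=True` makes pandas hand each window to the function as a NumPy array instead of a Series, so `np.unique` can count the distinct codes without building a Series per window. The sample DataFrame is the one from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "group": [1, 1, 1, 2, 2, 2],
    "time": ["9/30/2014", "10/30/2014", "11/30/2014",
             "9/30/2014", "10/30/2014", "11/30/2014"],
    "item": ["a", "a", "b", "c", "d", "d"],
})

# Integer codes, since expanding() works on numeric data only
codes = pd.Series(df["item"].factorize()[0], index=df.index)

df["want"] = (
    codes.groupby(df["group"])
    .expanding()
    # raw=True passes each window as a NumPy ndarray
    .apply(lambda a: len(np.unique(a)), raw=True)
    .droplevel(0)   # back to the original row index
    .astype(int)
)
print(df)
```

Either way, every window is rescanned from the group's first row, so the cost grows quadratically with group size.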

Output:

  group        time item  want
0     1   9/30/2014    a     1
1     1  10/30/2014    a     1
2     1  11/30/2014    b     2
3     2   9/30/2014    c     1
4     2  10/30/2014    d     2
5     2  11/30/2014    d     2
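A different technique (not the one used above) avoids expanding altogether: flag the first occurrence of each (group, item) pair with `duplicated`, then take a running sum of those flags within each group. This is a sketch that assumes rows are already in chronological order within each group, and it works directly on the strings, so no factorize step is needed.

```python
import pandas as pd

df = pd.DataFrame({
    "group": [1, 1, 1, 2, 2, 2],
    "item": ["a", "a", "b", "c", "d", "d"],
})

# True on the first time each (group, item) pair appears, False on repeats
first_seen = ~df.duplicated(subset=["group", "item"])

# Running count of first appearances per group = distinct items seen so far
df["want"] = first_seen.astype(int).groupby(df["group"]).cumsum()
print(df)
```

Each row is visited once, so this scales linearly instead of rescanning every expanding window.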

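Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.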

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM