Pandas: count cumulated unique string values per group

Question

I have a question about groupby: within each group, I want to count the distinct values of "item" cumulatively over time (over the first month, the first two months, the first three months, and so on).

For example, the data shown below:

group    time      item
1      9/30/2014      a
1      10/30/2014     a
1      11/30/2014     b
2      9/30/2014      c
2      10/30/2014     d
2      11/30/2014     d

I would like to group by "group" and, as time goes on, count the distinct items seen so far:

group    time      item   want
1      9/30/2014      a     1 (because we only have "a" in 9/30/2014 )
1      10/30/2014     a     1 (because we only have "a" from 9/30/2014 to 10/30/2014)
1      11/30/2014     b     2 (because we have "a" and "b" from 9/30/2014 to 11/30/2014)
2      9/30/2014      c     1  
2      10/30/2014     d     2
2      11/30/2014     d     2

I appreciate your help. Thank you very much.

Answer 1

You can perform a groupby + expanding with a nunique count.

You need to cheat a bit, as expanding currently only supports numerical values, so I first converted the items to integer codes with factorize:

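import pandas as pd

# Encode each item as an integer code, then count the distinct codes
# in an expanding window within each group.
df['want'] = (
    pd.Series(df['item'].factorize()[0], index=df.index)
      .groupby(df['group'])
      .expanding()
      .apply(lambda s: s.nunique())
      .droplevel(0)   # drop the group level so the index aligns with df
      .astype(int)
)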

Output:

   group        time item  want
0      1   9/30/2014    a     1
1      1  10/30/2014    a     1
2      1  11/30/2014    b     2
3      2   9/30/2014    c     1
4      2  10/30/2014    d     2
5      2  11/30/2014    d     2
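
For comparison, here is a minimal sketch of a different technique that is not part of the answer above: flag the first occurrence of each (group, item) pair with duplicated, then take a running sum of those flags within each group. It assumes the rows are already sorted by time within each group. The sample frame is rebuilt from the question's data, and the column name want_alt is made up for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "group": [1, 1, 1, 2, 2, 2],
    "time": ["9/30/2014", "10/30/2014", "11/30/2014",
             "9/30/2014", "10/30/2014", "11/30/2014"],
    "item": ["a", "a", "b", "c", "d", "d"],
})

# Assumption: rows are already in chronological order within each group.
# 1 on the first occurrence of each (group, item) pair, 0 on repeats.
first_seen = (~df.duplicated(subset=["group", "item"])).astype(int)

# Running total of first occurrences per group = cumulative unique count.
# "want_alt" is a hypothetical column name used only in this sketch.
df["want_alt"] = first_seen.groupby(df["group"]).cumsum()

print(df)
```

This variant works directly on the string column, so no factorize step is needed. If the data is not yet ordered by time, it would have to be sorted first (for example after parsing "time" with pd.to_datetime).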
