Count of incremental duplicates over dates in Pandas

Question

I have a dataframe in the format below:

 day         value  
 1/1/15      aa
 2/1/15      bb
 3/1/15      bb
 3/1/15      cc
 4/1/15      ee
 4/1/15      ff
 4/1/15      aa

I would like to first group by 'day' and then count the unique values in 'value', adding up the count incrementally for each subsequent day.

The result would look like:

 day         value  
 1/1/15      1
 2/1/15      2
 3/1/15      3
 4/1/15      5

The solution would ideally be in pandas. I don't know where to start; the only idea I have is to count per group and then use a defaultdict to sum up, but how can I do it incrementally, following the order of the dates?

Thanks! Vincenzo
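The defaultdict idea above can be made incremental by collecting each day's values into a set, then walking the days in date order while keeping a running set of every value seen so far. This is a minimal plain-Python sketch, not taken from the answer below. It assumes the day strings are day/month/year; for these four dates the order is the same under either reading.

```python
from collections import defaultdict
from datetime import datetime

# Rows from the question; the day format is assumed to be day/month/year
rows = [
    ("1/1/15", "aa"), ("2/1/15", "bb"), ("3/1/15", "bb"), ("3/1/15", "cc"),
    ("4/1/15", "ee"), ("4/1/15", "ff"), ("4/1/15", "aa"),
]

# Group the values of each day into a set
values_by_day = defaultdict(set)
for day, value in rows:
    values_by_day[day].add(value)

# Walk the days chronologically, accumulating every value seen so far
seen = set()
running_counts = {}
for day in sorted(values_by_day, key=lambda d: datetime.strptime(d, "%d/%m/%y")):
    seen |= values_by_day[day]
    running_counts[day] = len(seen)

print(running_counts)
# {'1/1/15': 1, '2/1/15': 2, '3/1/15': 3, '4/1/15': 5}
```

Because `seen` is a set, a value that reappears on a later day (bb on 3/1/15, aa on 4/1/15) does not increase the count, which reproduces the expected result in the question.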

Answer 1

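The following counts the distinct values on each date and accumulates those counts with a cumulative sum (shown here on randomly generated sample data):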

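import numpy as np
import pandas as pd
from datetime import date
from string import ascii_lowercase
values = [l + l for l in ascii_lowercase[:8]]  # 'aa', 'bb', ..., 'hh'
dates = pd.date_range(date(2016, 1, 1), date(2016, 3, 30))
df = pd.DataFrame(data=np.random.choice(values, 500), index=np.random.choice(dates, 500), columns=['value'])  # 500 random values on random dates
df.sort_index().head(25)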

           value
2016-01-01    bb
2016-01-01    dd
2016-01-01    ff
2016-01-02    hh
2016-01-02    aa
2016-01-02    ee
2016-01-02    aa
2016-01-02    gg
2016-01-02    hh
2016-01-02    aa
2016-01-03    cc
2016-01-03    ee

print(df.groupby(level=0)['value'].nunique().cumsum())

2016-01-01      3
2016-01-02      7
2016-01-03      9
2016-01-04     13
2016-01-05     18
2016-01-06     20
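On the question's own data, summing each day's distinct count counts a value again whenever it reappears on a later day (bb on 3/1/15, aa on 4/1/15). That gives 1, 2, 4, 7 rather than the 1, 2, 3, 5 shown in the question. The minimal pandas sketch below compares the two. The second variant is a swapped-in technique: it counts only the first appearance of each value. Parsing the days as day/month/year is an assumption; the order is the same either way here.

```python
import pandas as pd

# The question's data; day strings are parsed as day/month/year (an assumption)
df = pd.DataFrame({
    "day": ["1/1/15", "2/1/15", "3/1/15", "3/1/15", "4/1/15", "4/1/15", "4/1/15"],
    "value": ["aa", "bb", "bb", "cc", "ee", "ff", "aa"],
})
df["day"] = pd.to_datetime(df["day"], format="%d/%m/%y")
df = df.sort_values("day", kind="stable")

# Per-day distinct counts summed over days: repeats on later days count again
per_day_cumsum = df.groupby("day")["value"].nunique().cumsum()

# Running count of distinct values seen so far: flag each value's first row,
# count the flags per day, then accumulate
first_seen = (~df["value"].duplicated()).astype(int)
running_distinct = first_seen.groupby(df["day"]).sum().cumsum()

print(per_day_cumsum.tolist())    # [1, 2, 4, 7]
print(running_distinct.tolist())  # [1, 2, 3, 5]
```

Since the rows are sorted by day before calling `duplicated()`, each value is flagged only on the earliest day it occurs, so the cumulative sum is the number of distinct values seen up to and including each day.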

solution1
0 2016-01-13 15:28:32