I have a dataframe in the format below:
day value
1/1/15 aa
2/1/15 bb
3/1/15 bb
3/1/15 cc
4/1/15 ee
4/1/15 ff
4/1/15 aa
I would like to group by 'day' and then count the unique values in 'value', adding up the count incrementally for each subsequent day.
The result would look like:
day value
1/1/15 1
2/1/15 2
3/1/15 3
4/1/15 5
Ideally the solution would be in pandas. I don't know where to start; my only idea is to count per group and then use a defaultdict to sum up, but how do I do it incrementally, following the order of the dates?
Thanks! Vincenzo
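The following works, with one caveat: it takes the cumulative sum of each day's unique count, so a value that recurs on a later day is counted again: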
from datetime import date
from string import ascii_lowercase
import numpy as np
import pandas as pd
values = [l+l for l in ascii_lowercase[:8]]
dates = pd.date_range(date(2016, 1, 1), date(2016, 3, 30))
df = pd.DataFrame(data=np.random.choice(values, 500), index=np.random.choice(dates, 500), columns=['value'])
df.sort_index().head(25)
value
2016-01-01 bb
2016-01-01 dd
2016-01-01 ff
2016-01-02 hh
2016-01-02 aa
2016-01-02 ee
2016-01-02 aa
2016-01-02 gg
2016-01-02 hh
2016-01-02 aa
2016-01-03 cc
2016-01-03 ee
print(df.groupby(level=0)['value'].nunique().cumsum())
2016-01-01 3
2016-01-02 7
2016-01-03 9
2016-01-04 13
2016-01-05 18
2016-01-06 20
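Note that the running total above passes 8 even though only 8 distinct values exist, because each day's unique count is added regardless of earlier days. The expected output in the question (1, 2, 3, 5) looks like a count of distinct values seen so far instead. A minimal sketch of that variant, using the question's sample data and assuming the dates are day/month/year and that the `day` column is named as in the question; the names `first_seen`, `all_days` and `running_unique` are only for illustration:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "day": ["1/1/15", "2/1/15", "3/1/15", "3/1/15", "4/1/15", "4/1/15", "4/1/15"],
    "value": ["aa", "bb", "bb", "cc", "ee", "ff", "aa"],
})

# Assumption: the dates are day/month/year, so these are four consecutive days.
df["day"] = pd.to_datetime(df["day"], format="%d/%m/%y")
df = df.sort_values("day")

# Keep only the first appearance of each value, so it is counted once overall.
first_seen = df.drop_duplicates("value")

# Count the new values per day, put 0 for days that introduce nothing new,
# then accumulate into a running total of distinct values.
all_days = pd.Index(df["day"].unique(), name="day")
running_unique = (
    first_seen.groupby("day").size()
    .reindex(all_days, fill_value=0)
    .cumsum()
)
print(running_unique)  # values: 1, 2, 3, 5
```

The `reindex` step only matters when some day contains no new value; without it that day would be missing from the result rather than repeating the previous total.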