I have a pandas data frame:
date path size
0 2019-05-10 /bar/A 3
1 2019-05-10 /bar/B 7
2 2019-05-10 /bar/C 2
3 2019-05-14 /bar/A 4
4 2019-05-14 /bar/B 8
5 2019-05-14 /bar/C 23
6 2019-05-18 /bar/A 11
7 2019-05-18 /bar/B 75
8 2019-05-18 /bar/C 32
I would like to group by "path" and return the cumulative sum of the column "size" for each "date".
Looking at this answer, "Pandas groupby cumulative sum", a simple df.groupby(["path"])["size"].cumsum() or df.groupby(["path","date"])["size"].cumsum() will not work.
In the end the cumulative sum should be plotted by date and colored by group to indicate the accumulated growth of "size" over time.
/bar/A /bar/B /bar/C
2019-05-10 3 7 2
2019-05-14 7 15 25
2019-05-18 18 90 57
Is there any pandas-based solution without seaborn or other tools?
I think you can achieve that by pivoting the table and then applying the cumulative sum.
import pandas as pd
pivot = pd.pivot_table(df, values="size", index=["date"], columns=["path"], aggfunc="sum")
# Accumulate down each path column, in date order.
pivot = pivot.cumsum()
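For anyone who wants to run this end to end, here is a minimal self-contained sketch. The frame is rebuilt by hand from the sample in the question, and the name `cumulative` is just my choice for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date": ["2019-05-10"] * 3 + ["2019-05-14"] * 3 + ["2019-05-18"] * 3,
    "path": ["/bar/A", "/bar/B", "/bar/C"] * 3,
    "size": [3, 7, 2, 4, 8, 23, 11, 75, 32],
})
df["date"] = pd.to_datetime(df["date"])

# One row per date, one column per path, then a running total down each column.
cumulative = df.pivot_table(
    values="size", index="date", columns="path", aggfunc="sum"
).cumsum()
print(cumulative)
```

If some path has no row for a given date, the pivot holds NaN in that cell. Passing fill_value=0 to pivot_table is one way to keep the running total visible on those dates.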
See the results, based on the example in your question:
df
Out[14]:
date path size
0 2019-05-10 /bar/A 3
1 2019-05-10 /bar/B 7
2 2019-05-10 /bar/C 2
3 2019-05-14 /bar/A 4
4 2019-05-14 /bar/B 8
5 2019-05-14 /bar/C 23
6 2019-05-18 /bar/A 11
7 2019-05-18 /bar/B 75
8 2019-05-18 /bar/C 32
pivot = pd.pivot_table(df, values="size", index=["date"], columns=["path"], aggfunc="sum")
pivot.cumsum()
Out[16]:
path /bar/A /bar/B /bar/C
date
2019-05-10 3 7 2
2019-05-14 7 15 25
2019-05-18 18 90 57
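As a side note, the groupby route from the question can also be made to produce this table, and the wide result can be plotted with pandas alone. The following is only a sketch under a few assumptions: each (date, path) pair occurs once, matplotlib is installed as the pandas plotting backend, and `cumulative_size`, `wide` and `plot_growth` are names I made up for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date": ["2019-05-10"] * 3 + ["2019-05-14"] * 3 + ["2019-05-18"] * 3,
    "path": ["/bar/A", "/bar/B", "/bar/C"] * 3,
    "size": [3, 7, 2, 4, 8, 23, 11, 75, 32],
})
df["date"] = pd.to_datetime(df["date"])

# Running total per path, computed in date order. The result keeps the
# original row index, so it aligns with df when assigned as a new column.
df["cumulative_size"] = (
    df.sort_values("date", kind="stable").groupby("path")["size"].cumsum()
)

# Reshape to one column per path. pivot (unlike pivot_table) raises an
# error if a (date, path) pair appears more than once.
wide = df.pivot(index="date", columns="path", values="cumulative_size")
print(wide)


def plot_growth(table):
    """Draw one line per path; pandas assigns each column its own color."""
    import matplotlib.pyplot as plt  # assumption: matplotlib is available

    ax = table.plot(marker="o")
    ax.set_xlabel("date")
    ax.set_ylabel("cumulative size")
    plt.show()
```

Calling plot_growth(wide) should then give one colored line per path over time, with no seaborn involved.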