简体   繁体   中英

Pandas groupby, cumulative sum and plot by category

Having a pandas data frame:

    date        path    size
0   2019-05-10  /bar/A  3
1   2019-05-10  /bar/B  7
2   2019-05-10  /bar/C  2
3   2019-05-14  /bar/A  4
4   2019-05-14  /bar/B  8
5   2019-05-14  /bar/C  23
6   2019-05-18  /bar/A  11
7   2019-05-18  /bar/B  75
8   2019-05-18  /bar/C  32

I would like to groupby "path" and return the cumulative sum of the column "size" for each "date"

Looking at this answer: Pandas groupby cumulative sum

a simple df.groupby(["path"])["size"].cumsum() or df.groupby(["path","date"])["size"].cumsum() will not work.

In the end the cumulative sum should be plotted by date and colored by group to indicate the accumulated growth of "size" over time.

            /bar/A /bar/B /bar/C
2019-05-10  3      7      2
2019-05-14  7      15     26
2019-05-18  18     90     58

Is there any pandas -based solution without seaborn or other tools?

I think you can achieve that pivoting the table and then applying the cumulative sum.

pivot = pd.pivot_table(df, values="size", index=["date"], columns=["path"], aggfunc=np.sum)
pivot = pivot.cumsum()

See the results, based on the example of your question:

df
Out[14]: 
         date    path  size
0  2019-05-10  /bar/A     3
1  2019-05-10  /bar/B     7
2  2019-05-10  /bar/C     2
3  2019-05-14  /bar/A     4
4  2019-05-14  /bar/B     8
5  2019-05-14  /bar/C    23
6  2019-05-18  /bar/A    11
7  2019-05-18  /bar/B    75
8  2019-05-18  /bar/C    32
pivot = pd.pivot_table(df, values="size", index=["date"], columns=["path"], aggfunc=np.sum)
pivot.cumsum()
Out[16]: 
path        /bar/A  /bar/B  /bar/C
date                              
2019-05-10       3       7       2
2019-05-14       7      15      25
2019-05-18      18      90      57

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM