How to aggregate data on the Y axis and plot a line graph in Python

I have the data below. I want to plot a simple line graph in Python with the item on the X axis and total sales on the Y axis. Total sales is the sum of sales for each item across all dates.

Can anyone help with this?

Item    Date    Sales
Item1   4/25/2018   55
Item2   4/25/2018   21
Item3   4/25/2018   50
Item4   4/25/2018   58
Item5   4/25/2018   81
Item6   4/25/2018   79
Item7   4/25/2018   61
Item8   4/25/2018   37
Item9   4/25/2018   51
Item10  4/25/2018   53
Item1   4/26/2018   27
Item2   4/26/2018   28
Item3   4/26/2018   26
Item4   4/26/2018   95
Item5   4/26/2018   15
Item6   4/26/2018   89
Item7   4/26/2018   42
Item8   4/26/2018   21
Item9   4/26/2018   39
Item10  4/26/2018   67
Item1   4/27/2018   14
Item2   4/27/2018   45
Item3   4/27/2018   35
Item4   4/27/2018   68
Item5   4/27/2018   76
Item6   4/27/2018   63
Item7   4/27/2018   73
Item8   4/27/2018   61
Item9   4/27/2018   59
Item10  4/27/2018   93
Item1   4/28/2018   27
Item2   4/28/2018   63
Item3   4/28/2018   55
Item4   4/28/2018   73
Item5   4/28/2018   58
Item6   4/28/2018   90
Item7   4/28/2018   67
Item8   4/28/2018   72
Item9   4/28/2018   64
Item10  4/28/2018   98

Regards, philip

Using pandas, this can be done by loading the data into a dataframe, grouping by item, and summing the sales in each group. Conveniently, pandas wraps the most common matplotlib plots, so they can be called directly on the result.

# df['Date'] = pd.to_datetime(df['Date'])  # Not needed for this plot, but a good idea:
#                                          # it also allows plotting by date.
# Select 'Sales' before summing so the 'Date' column is not aggregated.
df.groupby(by='Item')['Sales'].sum().plot.bar(color='g')

This generates the following plot:

[plot image]
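The aggregation step can be checked on its own before plotting. The sketch below is only an illustration: it copies the rows for Item1, Item2 and Item10 from the question into a string (the `raw` variable and the use of `io.StringIO` are assumptions made for the example, not part of the original answer) and sums the sales per item.

```python
import io
import pandas as pd

# Rows copied from the question's table (Item1, Item2 and Item10 only).
raw = """Item Date Sales
Item1 4/25/2018 55
Item2 4/25/2018 21
Item10 4/25/2018 53
Item1 4/26/2018 27
Item2 4/26/2018 28
Item10 4/26/2018 67
Item1 4/27/2018 14
Item2 4/27/2018 45
Item10 4/27/2018 93
Item1 4/28/2018 27
Item2 4/28/2018 63
Item10 4/28/2018 98
"""

# The table is whitespace-separated, so split on runs of whitespace.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Select the Sales column before summing so the Date strings are left out.
totals = df.groupby("Item")["Sales"].sum()
print(totals)
```

Calling `totals.plot.bar(color='g')` on this result should draw the same kind of chart. Note that groupby sorts the labels as text, so Item10 lands between Item1 and Item2.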

To sort the items numerically from 1 to 10, rather than as text (where Item10 comes before Item2), this answer can be used before plotting.
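One possible way to do that sorting is sketched here. It is not necessarily what the linked answer does: the helper `item_number` is a hypothetical name, and the totals are typed in by hand for three of the items from the question.

```python
import pandas as pd

# Totals for three items, in the text order that groupby would produce.
totals = pd.Series({"Item1": 123, "Item10": 311, "Item2": 157}, name="Sales")

def item_number(label):
    """Return the integer that follows the 'Item' prefix (hypothetical helper)."""
    return int(label[len("Item"):])

# Reorder the index by item number instead of by text.
ordered = totals.loc[sorted(totals.index, key=item_number)]
print(list(ordered.index))
```

Plotting `ordered` instead of `totals` should then put Item10 last on the X axis.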

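Use pandas:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('yourfile.csv')

# Total sales per item
df_grp = df.groupby('Item')['Sales'].sum()
# Reorder by the number after 'Item' so that Item10 comes last, not right after Item1
df_grp = df_grp.iloc[df_grp.index.str.split('Item').str[1].astype(int).argsort()]
df_grp.plot()
plt.xticks(np.arange(df_grp.shape[0]), df_grp.index, rotation=90)
plt.show()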

Output:

[plot image]

There is also the option of doing it without pandas, using only numpy. Starting from an array items holding the Item column and an array sales holding the Sales column, the code would look like:

import numpy as np
import matplotlib.pyplot as plt

# np.unique returns the sorted unique values and, with return_inverse=True,
# the indices that rebuild the original array from them.
# e.g. u, f = np.unique([1,2,3,1,1,2,1], return_inverse=True) returns
# u: [1 2 3]
# f: [0 1 2 0 0 1 0] such that u[f] == [1,2,3,1,1,2,1]
u, f = np.unique(items, return_inverse=True)
totals = np.bincount(f, weights=sales)  # sum of sales for each unique item
inds = np.argsort(np.char.lstrip(u, 'Item').astype(int))  # numeric item order
plt.plot(np.arange(len(u)), totals[inds])
plt.xticks(np.arange(len(u)), u[inds])
plt.show()

Note: the input arrays are assumed to have the proper dtype. If they do not, cast them to the correct dtype with .astype().
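As a rough illustration of that note, the sketch below assumes both columns arrive as strings (for instance after reading a text file) and casts the sales to float before aggregating. The sample values are the first two dates for Item1, Item2 and Item10 from the question; the variable names are chosen for the example only.

```python
import numpy as np

# Assumed input: both columns were read as strings.
items = np.array(["Item1", "Item2", "Item10", "Item1", "Item2", "Item10"])
sales = np.array(["55", "21", "53", "27", "28", "67"])

# bincount needs numeric weights, so cast the string sales first.
sales = sales.astype(float)

u, f = np.unique(items, return_inverse=True)
totals = np.bincount(f, weights=sales)

# Order the unique labels by the number that follows 'Item'.
order = np.argsort(np.char.lstrip(u, "Item").astype(int))
print(u[order], totals[order])
```

Without the cast, `np.bincount` would be handed string weights and would be expected to fail, which is why the dtype matters here.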
