I have a dataframe that looks like the following:
month source_id revenue
April PA0057 16001.0
PA0202 54063.0
PA0678 24219.0
PA0873 41827.0
August PA0057 40673.0
PA0202 75281.0
PA0678 60318.0
PA0873 55243.0
December PA0057 49781.0
PA0202 71797.0
PA0678 24975.0
PA0873 57630.0
February PA0057 13193.0
PA0202 44211.0
PA0678 29862.0
PA0873 36436.0
January PA0057 65707.0
PA0202 67384.0
PA0678 29392.0
PA0873 46854.0
July PA0057 31533.0
PA0202 49663.0
PA0678 10520.0
PA0873 53634.0
June PA0057 97229.0
PA0202 56115.0
PA0678 72770.0
PA0873 51260.0
March PA0057 44622.0
PA0202 54079.0
PA0678 36776.0
PA0873 42873.0
May PA0057 38077.0
PA0202 68103.0
PA0678 78012.0
PA0873 83464.0
November PA0057 26599.0
PA0202 53050.0
PA0678 87853.0
PA0873 65499.0
October PA0057 47638.0
PA0202 44445.0
PA0678 49983.0
PA0873 57926.0
September PA0057 46171.0
PA0202 49202.0
PA0678 42598.0
PA0873 65660.0
I want to draw a line plot where the x-axis is the month and the y-axis is revenue. There are 4 source_ids (PA0057, PA0202, PA0678, PA0873), and I want one line for each of them.
How do I show this as 4 lines on a line graph?
I have used the code below:
import matplotlib.pyplot as plt
my_df.plot(x='month', y='revenue', kind='line')
plt.show()
but it doesn't give me the expected result, because I am not passing in the source ids.
Use pandas.DataFrame.groupby to aggregate revenue by month and source_id: df.groupby(['month', 'source_id']).agg({'revenue': 'sum'})
Use pandas.Categorical to set the 'month' column as ordered, so the months plot in calendar order rather than alphabetically. The calendar module is part of the Standard Library; it is used here only for an ordered list of month names (calendar.month_name).
Use seaborn.lineplot with the hue parameter to plot the dataframe. seaborn is a high-level API for matplotlib and makes many plots easier.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import calendar
# given some dataframe, perform groupby and reset the index
dfg = df.groupby(['month', 'source_id']).agg({'revenue': 'sum'}).reset_index()
# display(dfg) - your dataframe should be in the following form
month source_id revenue
0 April PA0057 16001
1 April PA0202 54063
2 April PA0678 24219
3 April PA0873 41827
4 August PA0057 40673
# set the month column as categorical and set the order for calendar months
dfg['month'] = pd.Categorical(dfg['month'], categories=list(calendar.month_name)[1:], ordered=True)
# plot with seaborn and use the hue parameter
plt.figure(figsize=(10, 6))
sns.lineplot(x='month', y='revenue', data=dfg, hue='source_id')
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.xticks(rotation=90)
plt.show()
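To see what the ordered categorical in the code above changes, here is a minimal sketch using a few rows copied from the question's data. The variable names `alphabetical` and `chronological` are only for illustration, and it assumes the default English locale, since `calendar.month_name` is locale-dependent.

```python
import calendar
import pandas as pd

# A small sample from the question's data (two sources, three months).
dfg = pd.DataFrame({
    "month": ["April", "April", "August", "August", "January", "January"],
    "source_id": ["PA0057", "PA0202"] * 3,
    "revenue": [16001.0, 54063.0, 40673.0, 75281.0, 65707.0, 67384.0],
})

# As plain strings, the months sort alphabetically.
alphabetical = dfg["month"].sort_values().drop_duplicates().tolist()

# calendar.month_name[0] is an empty string, so skip it.
month_order = list(calendar.month_name)[1:]
dfg["month"] = pd.Categorical(dfg["month"], categories=month_order, ordered=True)

# As an ordered categorical, the months sort in calendar order.
chronological = (
    dfg.sort_values("month")["month"].astype(str).drop_duplicates().tolist()
)

print(alphabetical)
print(chronological)
```

With the categorical in place, January comes before April and August, which is the order the x-axis should follow.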
If none of the columns in your example are the index, you can reshape your df with
df = df.set_index(['month', 'source_id']).unstack()
which will give you a new dataframe with month as the index and source_id as the columns. Then you can call plot with
df.plot()
The result will have as many lines as there are source_ids in the data.
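A minimal sketch of this reshape, with two adjustments that are not part of the answer above: the revenue column is selected before unstacking so the resulting columns are plain source ids, and the rows are reindexed into calendar order (unstack leaves them sorted alphabetically). The helper `plot_revenue` is a hypothetical name used only for illustration. Note that unstack needs each (month, source_id) pair to be unique; if it is not, aggregate with groupby first.

```python
import calendar
import pandas as pd

# Sample rows from the question's data; the full data covers all 12 months.
df = pd.DataFrame({
    "month": ["April", "April", "January", "January", "March", "March"],
    "source_id": ["PA0057", "PA0202"] * 3,
    "revenue": [16001.0, 54063.0, 65707.0, 67384.0, 44622.0, 54079.0],
})

# Select the revenue Series first so the columns are just the source ids.
wide = df.set_index(["month", "source_id"])["revenue"].unstack()

# Put the rows in calendar order, keeping only the months that are present.
month_order = [m for m in list(calendar.month_name)[1:] if m in wide.index]
wide = wide.reindex(month_order)

print(wide)


def plot_revenue(frame):
    """Draw one line per column (hypothetical helper for illustration)."""
    import matplotlib.pyplot as plt

    ax = frame.plot(marker="o")
    ax.set_ylabel("revenue")
    plt.show()
    return ax
```

Calling `plot_revenue(wide)` should then draw one line per source_id, with the months in calendar order on the x-axis.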