I have a very simple pandas DataFrame with the following format:
date        P1    P2     day
2015-01-01  190   1132   Thursday
2015-01-01  225   1765   Thursday
2015-01-01  3427  29421  Thursday
2015-01-01  945   7679   Thursday
2015-01-01  1228  9537   Thursday
2015-01-01  870   6903   Thursday
2015-01-02  785   4768   Friday
2015-01-02  1137  7065   Friday
2015-01-02  175   875    Friday
where P1 and P2 are different parameters of interest. I would like to create a bar graph that looks like this for each P1 and P2. As shown in the data, I have several values for each day. I would like to average the given values for a given day, then plot against day of week (so that the averaged value for Monday week 1 gets added to Monday week 2, etc.).
I'm new to python and my current method is quite nasty, involving several loops. I currently have two dedicated portions of code - one to do the averages and another to go through each day of the week one at a time and tally results for plotting. Is there a cleaner way to do this?
Seems like you're looking for:
df[['day', 'P1']].groupby('day').mean().plot(kind='bar', legend=None)
and
df[['day', 'P2']].groupby('day').mean().plot(kind='bar', legend=None)
Full example:
import numpy as np
import pandas as pd
days = ['Mon', 'Tue', 'Wed', 'Thur', 'Fri', 'Sat', 'Sun']
day = np.random.choice(days, size=1000)
p1, p2 = np.random.randint(low=0, high=2500, size=(2, 1000))
df = pd.DataFrame({'P1': p1, 'P2': p2, 'day': day})
# Helps for ordering of day-of-week in plot
df['day'] = pd.Categorical(df.day, categories=days)
# %matplotlib inline
df[['day', 'P1']].groupby('day').mean().plot(kind='bar', legend=None)
df[['day', 'P2']].groupby('day').mean().plot(kind='bar', legend=None)
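Note that on your existing DataFrame, the call to pd.Categorical gives you a custom sort order for the day column, so the groups (and therefore the bars) come out in weekday order rather than alphabetically.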
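As a rough sketch of that note applied to data shaped like yours: the four rows below are copied from the question, while the `weekdays` list and the `ordered=True` / `observed=True` arguments are my own choices and not something your data requires.

```python
import pandas as pd

# A few rows in the question's format (values copied from the question).
df = pd.DataFrame({
    'date': ['2015-01-01', '2015-01-01', '2015-01-02', '2015-01-02'],
    'P1': [190, 225, 785, 1137],
    'P2': [1132, 1765, 4768, 7065],
    'day': ['Thursday', 'Thursday', 'Friday', 'Friday'],
})

# Full day names, in the order the bars should appear.
weekdays = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
            'Friday', 'Saturday', 'Sunday']
df['day'] = pd.Categorical(df['day'], categories=weekdays, ordered=True)

# observed=True keeps only the days that actually occur in the data.
means = df.groupby('day', observed=True)['P1'].mean()
print(means)
```

Without the categorical, the groups would be sorted alphabetically (Friday before Thursday); with it, they follow the order given in `weekdays`.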
Result (for P1): [bar chart of mean P1 for each day of the week]
In your comment you asked,
Does groupby find the average of a given parameter (say P1), over all instances of the group? For instance, if I have 8 Mondays, is the resulting value the average of all datapoints that occurred on Monday? An added hurdle here is that I have unreliable sampling for the data. If I had a Monday with 10 samples and a Monday with 1, simply averaging all 11 values would drown out the Monday with a small sample size. Thus, I would like to average all values for a given date before considering the day of week.
Yes, the groupby above averages over all rows for each day of the week, regardless of which date they came from. Here's how you could get to that "double" averaging:
# For P1: average within each date first, then average those daily means per day of week.
# Use 'P2' in place of 'P1' to get the P2 averages.
df.groupby(['date', 'day'])['P1'].mean()\
.reset_index().groupby('day')['P1'].mean().plot(kind='bar', legend=None)
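To see how much the two approaches can differ, here is a small made-up example (the dates and values are invented for illustration, not taken from your data): one Monday with 10 samples and another Monday with a single sample.

```python
import pandas as pd

# Hypothetical data: 10 samples on one Monday, 1 sample on the next.
df = pd.DataFrame({
    'date': ['2015-01-05'] * 10 + ['2015-01-12'],
    'day': ['Monday'] * 11,
    'P1': [10] * 10 + [120],
})

# Single groupby: every row counts equally, so the busy Monday dominates.
pooled = df.groupby('day')['P1'].mean()

# Two-stage: one mean per date, then the mean of those per day of week.
per_date = df.groupby(['date', 'day'])['P1'].mean()
two_stage = per_date.groupby(level='day').mean()

print(pooled['Monday'])     # (10 * 10 + 120) / 11
print(two_stage['Monday'])  # (10 + 120) / 2
```

The pooled mean is pulled toward the heavily sampled Monday, while the two-stage mean weights each date equally, which is what you described wanting.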