Time series data: bin data to each day, then plot by day of week

Question

I have a very simple pandas DataFrame with the following format:

date        P1      P2      day
2015-01-01  190     1132    Thursday
2015-01-01  225     1765    Thursday
2015-01-01  3427    29421   Thursday
2015-01-01  945     7679    Thursday
2015-01-01  1228    9537    Thursday
2015-01-01  870     6903    Thursday
2015-01-02  785     4768    Friday
2015-01-02  1137    7065    Friday
2015-01-02  175     875     Friday

where P1 and P2 are different parameters of interest. I would like to create a bar graph for each of P1 and P2. As shown in the data, I have several values for each day. I would like to average the given values for a given day, then plot against day of week (so that the averaged value for Monday week 1 gets combined with Monday week 2, etc.).

I'm new to Python and my current method is quite nasty, involving several loops. I currently have two dedicated portions of code: one to do the averages and another to go through each day of the week one at a time and tally results for plotting. Is there a cleaner way to do this?

Answer 1

Seems like you're looking for:

df[['day', 'P1']].groupby('day').mean().plot(kind='bar', legend=None)

and

df[['day', 'P2']].groupby('day').mean().plot(kind='bar', legend=None)

Full example:

import numpy as np
import pandas as pd

days = ['Mon', 'Tue', 'Wed', 'Thur', 'Fri', 'Sat', 'Sun']
day = np.random.choice(days, size=1000)
p1, p2 = np.random.randint(low=0, high=2500, size=(2, 1000))
df = pd.DataFrame({'P1': p1, 'P2': p2, 'day': day})

# Helps for ordering of day-of-week in plot
df['day'] = pd.Categorical(df.day, categories=days)

# %matplotlib inline

df[['day', 'P1']].groupby('day').mean().plot(kind='bar', legend=None)
df[['day', 'P2']].groupby('day').mean().plot(kind='bar', legend=None)

Note that on your existing DataFrame, the call to pd.Categorical gives you a custom sort order for the day column, so the bars appear in weekday order rather than alphabetically.
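As a rough sketch of how that might look on data shaped like yours, where day holds full names such as Thursday: the few rows below are copied from the question, while the weekdays list and the means variable are names I made up for illustration.

```python
import pandas as pd

# A few rows copied from the question.
df = pd.DataFrame({
    'date': ['2015-01-01', '2015-01-01', '2015-01-02', '2015-01-02'],
    'P1': [190, 225, 785, 1137],
    'P2': [1132, 1765, 4768, 7065],
    'day': ['Thursday', 'Thursday', 'Friday', 'Friday'],
})

# Assumed ordering: full weekday names, Monday first.
weekdays = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
            'Friday', 'Saturday', 'Sunday']
df['day'] = pd.Categorical(df['day'], categories=weekdays, ordered=True)

# observed=False keeps weekdays that have no rows (their mean is NaN),
# so the result always follows the Monday-to-Sunday order.
means = df.groupby('day', observed=False)['P1'].mean()
print(means)
```

Without the categorical, Friday would sort ahead of Thursday because plain strings are ordered alphabetically.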

Result (for P1): [bar chart of mean P1 by day of week]

Update

In your comment you asked,

Does groupby find the average of a given parameter (say P1), over all instances of the group? For instance, if I have 8 Mondays, is the resulting value the average of all datapoints that occurred on Monday? An added hurdle here is that I have unreliable sampling for the data. If I had a Monday with 10 samples and a Monday with 1, simply averaging all 11 values would drown out the Monday with a small sample size. Thus, I would like to average all values for a given date before considering the day of week.

Yes, the groupby above would find the average for all instances. Here's how you could get to that "double" averaging:

# For P1; use 'P2' in place of 'P1' to get the P2 averages.
# First average within each date, then average those daily means per weekday.
df.groupby(['date', 'day'])['P1'].mean()\
    .groupby(level='day').mean().plot(kind='bar')
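To see why the two approaches differ, here is a small sketch with made-up numbers (one Monday with three samples, another with a single sample). The data and the variable names pooled, daily and by_weekday are hypothetical and only there to illustrate the point.

```python
import pandas as pd

# Hypothetical data: two Mondays with uneven sample counts.
df = pd.DataFrame({
    'date': ['2015-01-05'] * 3 + ['2015-01-12'],
    'day': ['Monday'] * 4,
    'P1': [10, 20, 30, 100],
})

# Pooled mean: every sample counts equally, so the busy Monday dominates.
pooled = df.groupby('day')['P1'].mean()

# Mean of daily means: each date counts once, whatever its sample size.
daily = df.groupby(['date', 'day'])['P1'].mean()
by_weekday = daily.groupby(level='day').mean()

print(pooled['Monday'])      # (10 + 20 + 30 + 100) / 4 = 40.0
print(by_weekday['Monday'])  # (20 + 100) / 2 = 60.0
```

The second number weights each date equally, which is what you described wanting for unevenly sampled days.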

solution1
4 ACCEPTED 2017-12-11 16:42:45
