Pandas - How to group-by and plot for each hour of each day of week
I need help figuring out how to plot subplots for easy comparison from the dataframe shown here:
Date A B C
2017-03-22 15:00:00 obj1 value_a other_1
2017-03-22 14:00:00 obj2 value_ns other_5
2017-03-21 15:00:00 obj3 value_kdsa other_23
2014-05-08 17:00:00 obj2 value_as other_4
2010-07-01 20:00:00 obj1 value_as other_0
I am trying to graph the occurrences of each hour for each respective day of the week. In other words, count the number of occurrences for each combination of day of the week and hour, and plot them on subplots like the ones shown below.
If this question is confusing, please let me know. Thanks.
You can accomplish this with multiple groupby calls. Since we know there are 7 days in a week, we can specify that number of panels. If you groupby(df.Date.dt.dayofweek), you can use the group index as the index for your subplot axes:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

n = 10000
np.random.seed(123)
df = pd.DataFrame({'Date': pd.date_range('2010-01-01', freq='1.09min', periods=n),
                   'A': np.random.randint(1,10,n),
                   'B': np.random.normal(0,1,n)})

fig, ax = plt.subplots(ncols=7, figsize=(30,5))
plt.subplots_adjust(wspace=0.05)  #Remove some whitespace between subplots

for idx, gp in df.groupby(df.Date.dt.dayofweek):
    ax[idx].set_title(gp.Date.dt.day_name().iloc[0])  #Set title to the weekday

    (gp.groupby(gp.Date.dt.hour).size().rename_axis('Tweet Hour').to_frame('')
        .reindex(np.arange(0,24,1)).fillna(0)
        .plot(kind='bar', ax=ax[idx], rot=0, ec='k', legend=False))

    # Ticks and labels on leftmost only
    if idx == 0:
        _ = ax[idx].set_ylabel('Counts', fontsize=11)

    _ = ax[idx].tick_params(axis='both', which='major', labelsize=7,
                            labelleft=(idx == 0), left=(idx == 0))

# Consistent bounds between subplots.
lb, ub = list(zip(*[axis.get_ylim() for axis in ax]))
for axis in ax:
    axis.set_ylim(min(lb), max(ub))
plt.show()
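The counting step can also be checked on its own, without any plotting. The following is a minimal sketch that uses the five timestamps from the question; the names dow, hour and counts are illustrative and not part of the original answer.

```python
import pandas as pd

# The five timestamps from the question's sample data
df = pd.DataFrame({'Date': pd.to_datetime([
    '2017-03-22 15:00:00', '2017-03-22 14:00:00', '2017-03-21 15:00:00',
    '2014-05-08 17:00:00', '2010-07-01 20:00:00'])})

dow = df.Date.dt.dayofweek.rename('dow')   # Monday=0 ... Sunday=6
hour = df.Date.dt.hour.rename('hour')      # 0 ... 23

# Count rows per (day of week, hour), then pivot the hours into columns.
# Reindexing guarantees all 7 days and 24 hours appear, with zeros where
# nothing was observed.
counts = (df.groupby([dow, hour]).size()
            .unstack(fill_value=0)
            .reindex(index=range(7), columns=range(24), fill_value=0))

print(counts.shape)  # (7, 24)
print(counts.loc[2, [14, 15]])  # Wednesday counts at 14:00 and 15:00
```

Each row of counts then corresponds to one subplot, so counts.loc[idx] could be passed to a bar plot on ax[idx] as an alternative to the nested groupby.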
If you'd like to make the aspect ratio less extreme, then consider plotting a grid of 2 rows by 4 columns. It's a very similar plot to the one above, once we flatten the axis array. There's some integer and remainder division to figure out which axes need the labels.
fig, ax = plt.subplots(nrows=2, ncols=4, figsize=(20,10))
fig.delaxes(ax[1,3])  #7 days in a week, remove 8th panel
ax = ax.flatten()  #Far easier to work with a flattened array
lsize=8
plt.subplots_adjust(wspace=0.05, hspace=0.15)  #Remove some whitespace between subplots

for idx, gp in df.groupby(df.Date.dt.dayofweek):
    ax[idx].set_title(gp.Date.dt.day_name().iloc[0])  #Set title to the weekday

    (gp.groupby(gp.Date.dt.hour).size().rename_axis([None]).to_frame()
        .reindex(np.arange(0,24,1)).fillna(0)
        .plot(kind='bar', ax=ax[idx], rot=0, ec='k', legend=False))

    # Titles on correct panels
    if idx%4 == 0:
        _ = ax[idx].set_ylabel('Counts', fontsize=11)
    if (idx//4 == 1) | (idx%4 == 3):
        _ = ax[idx].set_xlabel('Tweet Hour', fontsize=11)

    # Ticks on correct panels
    _ = ax[idx].tick_params(axis='both', which='major', labelsize=lsize,
                            labelbottom=(idx//4 == 1) | (idx%4 == 3),
                            bottom=(idx//4 == 1) | (idx%4 == 3),
                            labelleft=(idx%4 == 0),
                            left=(idx%4 == 0))

# Consistent bounds between subplots.
lb, ub = list(zip(*[axis.get_ylim() for axis in ax]))
for axis in ax:
    axis.set_ylim(min(lb), max(ub))
plt.show()
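The integer and remainder division used for the labels can be isolated into a small helper, which may make the logic easier to follow. This is only a sketch: panel_flags and its parameters are hypothetical names, not part of matplotlib or the original answer. Instead of hard-coding the panel above the deleted one, it asks whether a used panel sits directly below the current one.

```python
def panel_flags(idx, ncols=4, n_panels=7):
    """Return (is_left, is_bottom) for panel idx in a row-major grid.

    Hypothetical helper: a panel is on the left edge when it starts a row,
    and counts as a bottom panel when no used panel sits directly below it.
    """
    is_left = idx % ncols == 0
    is_bottom = idx + ncols >= n_panels
    return is_left, is_bottom


# Show which of the 7 weekday panels would carry y and x labels
for idx in range(7):
    is_left, is_bottom = panel_flags(idx)
    print(idx, is_left, is_bottom)
```

For the 7 panels used here, is_left matches idx%4 == 0 and is_bottom matches (idx//4 == 1) | (idx%4 == 3), so the flags could be passed straight to tick_params as labelleft and labelbottom.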
What about using seaborn? sns.FacetGrid was made for this:
import pandas as pd
import seaborn as sns

# make some data (one timestamp every 2.5 hours; '150min' avoids the
# uppercase 'H' alias, which newer pandas versions deprecate)
date = pd.date_range('today', periods=100, freq='150min')

# put in dataframe
df = pd.DataFrame({
    'date' : date
})

# create day_of_week and hour columns
df['dow'] = df.date.dt.day_name()
df['hour'] = df.date.dt.hour

# create facet grid
g = sns.FacetGrid(data=df.groupby([
    'dow',
    'hour'
]).hour.count().to_frame(name='day_hour_count').reset_index(), col='dow', col_order=[
    'Sunday',
    'Monday',
    'Tuesday',
    'Wednesday',
    'Thursday',
    'Friday',
    'Saturday'
], col_wrap=3)

# map barplot to each subplot; a fixed order keeps the hour axis
# aligned across facets even when some hours are missing for a day
g.map(sns.barplot, 'hour', 'day_hour_count', order=list(range(24)));
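With sparse data like this, many day and hour combinations never occur, so they are absent from the grouped counts rather than recorded as zero. One way to fill them in before plotting is sketched below, using pandas only. The column names dow, hour and day_hour_count follow the example above; full_index, days and the fixed start date are assumptions made so the illustration is reproducible.

```python
import pandas as pd

# Fixed start date (an assumption) so the example is reproducible
date = pd.date_range('2021-01-03', periods=100, freq='150min')  # every 2.5 hours
df = pd.DataFrame({'date': date})
df['dow'] = df.date.dt.day_name()
df['hour'] = df.date.dt.hour

days = ['Sunday', 'Monday', 'Tuesday', 'Wednesday',
        'Thursday', 'Friday', 'Saturday']

# Every (day, hour) pair, including those with no observations
full_index = pd.MultiIndex.from_product([days, range(24)],
                                        names=['dow', 'hour'])

# Count per (day, hour), then add the missing pairs with a count of zero
counts = (df.groupby(['dow', 'hour']).size()
            .reindex(full_index, fill_value=0)
            .reset_index(name='day_hour_count'))

print(len(counts))                  # 168 rows: 7 days x 24 hours
print(counts.day_hour_count.sum())  # 100, one per original timestamp
```

A frame built this way could be passed as data to sns.FacetGrid in place of the inline groupby, so that every facet has a value for each of the 24 hours.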