python pandas sum by hour of day

I'm working with the following dataset of hourly counts (df). The DataFrame has 8784 rows (hourly, for the year 2016).

DataFrame (df)

I'd like to see if there are daily trends (e.g. whether there is an increase in the morning hours). For this I'd like to create a plot that has the hour of the day (from 0 to 24) on the x-axis and the number of cyclists on the y-axis (something like the picture from http://ofdataandscience.blogspot.co.uk/2013/03/capital-bikeshare-time-series-clustering.html ).


I experimented with different ways of using pivot, resample and set_index and plotting the result with matplotlib, without success. In other words, I couldn't find a way to sum up every observation at a certain hour and then plot those sums for each weekday.

Any ideas how to do this? Thanks in advance!

I think you can use groupby by hour and weekday and aggregate with sum (or maybe mean), then reshape with unstack and plot with DataFrame.plot:

# plot() returns a matplotlib Axes, so do not assign the result back to df
ax = df.groupby([df['Date'].dt.hour, 'weekday'])['Cyclists'].sum().unstack().plot()

Solution with pivot_table:

ax = df.pivot_table(index=df['Date'].dt.hour,
                    columns='weekday',
                    values='Cyclists',
                    aggfunc='sum').plot()
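The groupby/unstack and pivot_table variants above should build the same hour-by-weekday table before plotting. A minimal sketch to check this, assuming the column names Date, Cyclists and weekday from the question and using made-up random counts:

```python
import numpy as np
import pandas as pd

# Synthetic hourly data; the counts are invented for illustration only.
rng = pd.date_range('2016-01-01', periods=200, freq='h')
counts = np.random.default_rng(0).integers(0, 100, size=len(rng))
df = pd.DataFrame({'Date': rng, 'Cyclists': counts})
df['weekday'] = df['Date'].dt.day_name()

hour = df['Date'].dt.hour

# Variant 1: group by (hour, weekday), sum, then move weekday into the columns.
by_groupby = df.groupby([hour, 'weekday'])['Cyclists'].sum().unstack()

# Variant 2: let pivot_table do the grouping and reshaping in one call.
by_pivot = df.pivot_table(index=hour, columns='weekday',
                          values='Cyclists', aggfunc='sum')

# Compare the row labels, column labels and cell values of both tables.
same_labels = (list(by_groupby.index) == list(by_pivot.index)
               and list(by_groupby.columns) == list(by_pivot.columns))
same_values = np.array_equal(by_groupby.to_numpy(), by_pivot.to_numpy())
print(same_labels, same_values)
```

Either table can then be passed to .plot() as in the answer.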

Sample:

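import numpy as np
import pandas as pd

N = 200
np.random.seed(100)
rng = pd.date_range('2016-01-01', periods=N, freq='h')
# Cyclists is listed first so the column order matches the output below
df = pd.DataFrame({'Cyclists': np.random.randint(100, size=N), 'Date': rng})
# dt.weekday_name was removed in pandas 1.0; dt.day_name() replaces it
df['weekday'] = df['Date'].dt.day_name()
print (df.head())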
   Cyclists                Date weekday
0         8 2016-01-01 00:00:00  Friday
1        24 2016-01-01 01:00:00  Friday
2        67 2016-01-01 02:00:00  Friday
3        87 2016-01-01 03:00:00  Friday
4        79 2016-01-01 04:00:00  Friday

print (df.groupby([df['Date'].dt.hour, 'weekday'])['Cyclists'].sum().unstack())
weekday  Friday  Monday  Saturday  Sunday  Thursday  Tuesday  Wednesday
Date                                                                   
0           102      91       120      53        95       86         21
1           102      83       100      27        20       94         25
2           121      53       105      56        10       98         54
3           164      78        54      30         8       42          6
4           163       0        43      48        89       84         37
5            49      13       150      47        72       95         58
6            24      57        32      39        30       76         39
7           127      76       128      38        12       33         94
8            72       3        59      44        18       58         51
9           138      70        67      18        93       42         30
10           77       3         7      64        92       22         66
11          159      84        49      56        44        0         24
12          156      79        47      34        57       55         55
13           42      10        65      53         0       98         17
14          116      87        61      74        73       19         45
15          106      60        14      17        54       53         89
16           22       3        55      72        92       68         45
17          154      48        71      13        66       62         35
18           60      52        80      30        16       50         16
19           79      43         2      17         5       68         12
20           11      36        94      53        51       35         86
21          180       5        19      68        90       23         82
22          103      71        98      50        34        9         67
23           92      38        63      91        67       48         92

df.groupby([df['Date'].dt.hour, 'weekday'])['Cyclists'].sum().unstack().plot()

graph
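The answer mentions mean as an alternative to sum. One reason it may be preferable: in the 200-hour sample, Friday occurs twice and most other weekdays only once, so the Friday sums are inflated. A minimal sketch with a constant, made-up count of 10 cyclists per hour makes the difference visible:

```python
import pandas as pd

# 200 hours starting on Friday 2016-01-01; every hour gets the same invented count.
rng = pd.date_range('2016-01-01', periods=200, freq='h')
df = pd.DataFrame({'Date': rng, 'Cyclists': 10})
df['weekday'] = df['Date'].dt.day_name()

grouped = df.groupby([df['Date'].dt.hour, 'weekday'])['Cyclists']

total = grouped.sum().unstack()     # grows with the number of days per weekday
average = grouped.mean().unstack()  # independent of how often a weekday occurs

print(total.loc[0, ['Friday', 'Monday']])
print(average.loc[0, ['Friday', 'Monday']])
```

With sum, the Friday column is twice the Monday column even though the underlying hourly count is identical; with mean, both are the same. For a full year of data the effect is smaller but still present, since not every weekday occurs equally often.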

EDIT:

You can also convert weekday to categorical so that the columns are sorted in weekday order rather than alphabetically:

names = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
# astype('category', categories=...) no longer works in current pandas; pass a CategoricalDtype
df['weekday'] = df['weekday'].astype(pd.CategoricalDtype(categories=names, ordered=True))
df.groupby([df['Date'].dt.hour, 'weekday'])['Cyclists'].sum().unstack().plot()

graph1
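As an alternative to the categorical conversion (not part of the original answer), the columns of the unstacked table can be reordered directly with reindex. A minimal sketch, again assuming the question's column names and made-up counts:

```python
import numpy as np
import pandas as pd

# Synthetic hourly data; the counts are invented for illustration only.
rng = pd.date_range('2016-01-01', periods=200, freq='h')
counts = np.random.default_rng(1).integers(0, 100, size=len(rng))
df = pd.DataFrame({'Date': rng, 'Cyclists': counts})
df['weekday'] = df['Date'].dt.day_name()

names = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
         'Friday', 'Saturday', 'Sunday']

# Columns come out in alphabetical order after unstack ...
table = df.groupby([df['Date'].dt.hour, 'weekday'])['Cyclists'].sum().unstack()

# ... and reindex puts them into calendar order without touching df itself.
ordered = table.reindex(columns=names)
print(list(ordered.columns))
```

Calling ordered.plot() then draws the lines and the legend in Monday-to-Sunday order. A weekday missing from the data would show up as an all-NaN column, which may or may not be what you want.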
