[英]Group Pandas data by hour of the day
我使用以下代码生成随机日期和值:
import pandas as pd
import numpy as np
time = pd.date_range('1/1/2000', periods=2000, freq='5min')
series = pd.Series(np.random.randint(100, size=2000), index=time)
输出看起来像这样:
2000-01-01 00:00:00 40
2000-01-01 00:05:00 13
2000-01-01 00:10:00 99
2000-01-01 00:15:00 72
2000-01-01 00:20:00 4
2000-01-01 00:25:00 36
2000-01-01 00:30:00 24
2000-01-01 00:35:00 20
2000-01-01 00:40:00 83
2000-01-01 00:45:00 44
然后按索引小时值对这些数据进行排序,然后按如下所示的平均值对其进行汇总:
0 50.380952
1 49.380952
2 49.904762
3 53.273810
4 47.178571
5 46.095238
6 49.047619
7 44.297619
8 53.119048
9 48.261905
10 45.166667
11 54.214286
12 50.714286
13 56.130952
14 50.916667
15 42.428571
16 46.880952
17 56.892857
18 54.071429
19 47.607143
20 50.940476
21 50.511905
22 44.550000
23 50.250000
但是,如果我想按索引小时值而不是平均值将所有数据分组,那么我应该怎么做呢?
提前致谢。
问候,
如果要以hour
单位聚合:
np.random.seed(456)
time = pd.date_range('1/1/2000', periods=2000, freq='5min')
series = pd.Series(np.random.randint(100, size=2000), index=time)
s = series.groupby(series.index.hour).mean()
print (s)
0 49.392857
1 52.523810
2 53.047619
3 49.083333
4 49.785714
5 49.071429
6 52.476190
7 47.821429
8 52.190476
9 50.000000
10 49.035714
11 52.988095
12 52.785714
13 52.023810
14 46.964286
15 52.095238
16 51.047619
17 52.166667
18 48.357143
19 51.416667
20 45.214286
21 46.130952
22 49.750000
23 48.527778
dtype: float64
但是,如果需要按小时显示MultiIndex
:
series.index = [series.index.hour, series.index]
print (series)
0 2000-01-01 00:00:00 27
2000-01-01 00:05:00 43
2000-01-01 00:10:00 89
2000-01-01 00:15:00 42
2000-01-01 00:20:00 28
2000-01-01 00:25:00 79
2000-01-01 00:30:00 60
2000-01-01 00:35:00 45
2000-01-01 00:40:00 37
2000-01-01 00:45:00 92
2000-01-01 00:50:00 39
2000-01-01 00:55:00 81
1 2000-01-01 01:00:00 11
2000-01-01 01:05:00 77
2000-01-01 01:10:00 69
2000-01-01 01:15:00 98
...
然后可以按小时选择:
print (series.loc[0])
2000-01-01 00:00:00 27
2000-01-01 00:05:00 43
2000-01-01 00:10:00 89
2000-01-01 00:15:00 42
2000-01-01 00:20:00 28
2000-01-01 00:25:00 79
2000-01-01 00:30:00 60
2000-01-01 00:35:00 45
2000-01-01 00:40:00 37
2000-01-01 00:45:00 92
2000-01-01 00:50:00 39
2000-01-01 00:55:00 81
2000-01-02 00:00:00 82
2000-01-02 00:05:00 69
2000-01-02 00:10:00 99
2000-01-02 00:15:00 17
2000-01-02 00:20:00 59
...
另外,如果需要mean
s,则不更改DatetimeIndex
:
s1 = series.groupby(series.index.hour).transform('mean')
print (s1)
2000-01-01 00:00:00 49.392857
2000-01-01 00:05:00 49.392857
2000-01-01 00:10:00 49.392857
2000-01-01 00:15:00 49.392857
2000-01-01 00:20:00 49.392857
2000-01-01 00:25:00 49.392857
2000-01-01 00:30:00 49.392857
2000-01-01 00:35:00 49.392857
2000-01-01 00:40:00 49.392857
2000-01-01 00:45:00 49.392857
2000-01-01 00:50:00 49.392857
2000-01-01 00:55:00 49.392857
2000-01-01 01:00:00 52.523810
2000-01-01 01:05:00 52.523810
2000-01-01 01:10:00 52.523810
2000-01-01 01:15:00 52.523810
2000-01-01 01:20:00 52.523810
2000-01-01 01:25:00 52.523810
2000-01-01 01:30:00 52.523810
...
编辑:
对于每小时使用的列表:
s = series.groupby(series.index.hour).apply(list)
print (s)
0 [27, 43, 89, 42, 28, 79, 60, 45, 37, 92, 39, 8...
1 [11, 77, 69, 98, 78, 84, 34, 66, 4, 8, 85, 62,...
2 [16, 41, 10, 72, 44, 35, 48, 51, 99, 53, 22, 3...
3 [56, 22, 74, 85, 81, 6, 44, 44, 49, 43, 95, 11...
4 [21, 90, 89, 76, 62, 20, 66, 50, 68, 79, 69, 4...
5 [51, 85, 31, 58, 97, 10, 91, 25, 4, 11, 94, 28...
6 [5, 71, 62, 57, 62, 87, 12, 41, 43, 47, 25, 15...
7 [84, 17, 26, 32, 14, 76, 72, 35, 8, 60, 79, 27...
8 [15, 30, 80, 53, 10, 97, 71, 83, 37, 44, 89, 1...
9 [58, 20, 98, 77, 75, 26, 63, 26, 24, 62, 93, 6...
10 [39, 61, 92, 43, 61, 73, 86, 64, 26, 0, 75, 11...
11 [24, 13, 13, 54, 50, 38, 22, 46, 67, 15, 29, 4...
12 [21, 56, 16, 63, 46, 79, 11, 85, 87, 18, 66, 9...
13 [10, 89, 66, 80, 60, 2, 6, 19, 77, 81, 38, 48,...
14 [17, 64, 90, 91, 71, 32, 77, 9, 76, 14, 9, 79,...
15 [95, 75, 49, 34, 5, 31, 43, 68, 84, 48, 25, 69...
16 [13, 68, 87, 96, 6, 83, 9, 5, 29, 93, 57, 92, ...
17 [77, 6, 73, 41, 76, 93, 11, 50, 72, 84, 82, 53...
18 [95, 11, 61, 56, 30, 24, 24, 9, 0, 65, 96, 82,...
19 [31, 14, 98, 67, 7, 54, 29, 60, 77, 83, 45, 70...
20 [4, 15, 37, 78, 79, 59, 63, 97, 14, 74, 33, 2,...
21 [88, 69, 31, 20, 41, 10, 41, 6, 36, 27, 63, 49...
22 [4, 90, 70, 66, 92, 46, 54, 47, 6, 54, 62, 80,...
23 [27, 23, 21, 18, 29, 39, 77, 88, 21, 86, 7, 45...
dtype: object
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.