
Group Pandas data by hour of the day

I generate random dates and values with the following code:

import pandas as pd
import numpy as np

time = pd.date_range('1/1/2000', periods=2000, freq='5min')

series = pd.Series(np.random.randint(100, size=2000), index=time)

The output looks like this:

2000-01-01 00:00:00    40
2000-01-01 00:05:00    13
2000-01-01 00:10:00    99
2000-01-01 00:15:00    72
2000-01-01 00:20:00     4
2000-01-01 00:25:00    36
2000-01-01 00:30:00    24
2000-01-01 00:35:00    20
2000-01-01 00:40:00    83
2000-01-01 00:45:00    44

I then group this data by the hour of the index and aggregate it by mean, like this:

0     50.380952
1     49.380952
2     49.904762
3     53.273810
4     47.178571
5     46.095238
6     49.047619
7     44.297619
8     53.119048
9     48.261905
10    45.166667
11    54.214286
12    50.714286
13    56.130952
14    50.916667
15    42.428571
16    46.880952
17    56.892857
18    54.071429
19    47.607143
20    50.940476
21    50.511905
22    44.550000
23    50.250000

But what should I do if I want to group all the data by the index hour value instead of reducing each hour to its mean?

Thanks in advance.

Regards,

If you want to aggregate by hour:

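# Fix the seed so the random values are reproducible
np.random.seed(456)
time = pd.date_range('1/1/2000', periods=2000, freq='5min')
series = pd.Series(np.random.randint(100, size=2000), index=time)

# Group by the hour of each timestamp (0-23) and take the mean per group
s = series.groupby(series.index.hour).mean()
print(s)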
0     49.392857
1     52.523810
2     53.047619
3     49.083333
4     49.785714
5     49.071429
6     52.476190
7     47.821429
8     52.190476
9     50.000000
10    49.035714
11    52.988095
12    52.785714
13    52.023810
14    46.964286
15    52.095238
16    51.047619
17    52.166667
18    48.357143
19    51.416667
20    45.214286
21    46.130952
22    49.750000
23    48.527778
dtype: float64
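
As a small extension of the aggregation above, several statistics can be computed per hour in one call. This is only a sketch: the name `stats` and the choice of statistics are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

np.random.seed(456)
time = pd.date_range('1/1/2000', periods=2000, freq='5min')
series = pd.Series(np.random.randint(100, size=2000), index=time)

# Group by hour of day and compute several statistics at once.
# `stats` is a hypothetical name; the result is a DataFrame with
# one row per hour and one column per statistic.
stats = series.groupby(series.index.hour).agg(['mean', 'count', 'min', 'max'])
print(stats.head())
```

The `count` column is a quick way to see that the hours do not all hold the same number of observations, because the 2000 samples do not end on a day boundary.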

But if you need a MultiIndex by hour:

# Replace the index with two levels: hour of day, then the full timestamp
series.index = [series.index.hour, series.index]
print(series)
0   2000-01-01 00:00:00    27
    2000-01-01 00:05:00    43
    2000-01-01 00:10:00    89
    2000-01-01 00:15:00    42
    2000-01-01 00:20:00    28
    2000-01-01 00:25:00    79
    2000-01-01 00:30:00    60
    2000-01-01 00:35:00    45
    2000-01-01 00:40:00    37
    2000-01-01 00:45:00    92
    2000-01-01 00:50:00    39
    2000-01-01 00:55:00    81
1   2000-01-01 01:00:00    11
    2000-01-01 01:05:00    77
    2000-01-01 01:10:00    69
    2000-01-01 01:15:00    98

...

Then you can select by hour:

print(series.loc[0])
2000-01-01 00:00:00    27
2000-01-01 00:05:00    43
2000-01-01 00:10:00    89
2000-01-01 00:15:00    42
2000-01-01 00:20:00    28
2000-01-01 00:25:00    79
2000-01-01 00:30:00    60
2000-01-01 00:35:00    45
2000-01-01 00:40:00    37
2000-01-01 00:45:00    92
2000-01-01 00:50:00    39
2000-01-01 00:55:00    81
2000-01-02 00:00:00    82
2000-01-02 00:05:00    69
2000-01-02 00:10:00    99
2000-01-02 00:15:00    17
2000-01-02 00:20:00    59
...
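
If overwriting the index of `series` is not wanted, the same selection can probably be done in other ways. The sketch below shows two options under that assumption: a boolean mask on the hour, and a named MultiIndex built on a copy. The names `hour0`, `by_hour`, `hour` and `timestamp` are hypothetical and do not appear in the original answer.

```python
import numpy as np
import pandas as pd

np.random.seed(456)
time = pd.date_range('1/1/2000', periods=2000, freq='5min')
series = pd.Series(np.random.randint(100, size=2000), index=time)

# Option 1: boolean mask on the hour; the DatetimeIndex stays untouched.
hour0 = series[series.index.hour == 0]

# Option 2: a named MultiIndex on a copy, so `series` is not modified.
by_hour = series.copy()
by_hour.index = pd.MultiIndex.from_arrays(
    [series.index.hour, series.index], names=['hour', 'timestamp'])

print(hour0.head())
print(by_hour.loc[0].head())
```

Keeping the original `series` intact means later calls that rely on `series.index.hour` continue to work.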

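Alternatively, if you need the means without changing the DatetimeIndex (apply this to the original series, since a MultiIndex has no hour attribute):

# transform returns one value per original row instead of one per group
s1 = series.groupby(series.index.hour).transform('mean')
print(s1)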
2000-01-01 00:00:00    49.392857
2000-01-01 00:05:00    49.392857
2000-01-01 00:10:00    49.392857
2000-01-01 00:15:00    49.392857
2000-01-01 00:20:00    49.392857
2000-01-01 00:25:00    49.392857
2000-01-01 00:30:00    49.392857
2000-01-01 00:35:00    49.392857
2000-01-01 00:40:00    49.392857
2000-01-01 00:45:00    49.392857
2000-01-01 00:50:00    49.392857
2000-01-01 00:55:00    49.392857
2000-01-01 01:00:00    52.523810
2000-01-01 01:05:00    52.523810
2000-01-01 01:10:00    52.523810
2000-01-01 01:15:00    52.523810
2000-01-01 01:20:00    52.523810
2000-01-01 01:25:00    52.523810
2000-01-01 01:30:00    52.523810
...
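
One possible use of the transformed series is to compare each observation with the mean of its hour, since both share the same index. This is a minimal sketch; `hourly_mean` and `deviation` are illustrative names, and it assumes `series` still has its original DatetimeIndex.

```python
import numpy as np
import pandas as pd

np.random.seed(456)
time = pd.date_range('1/1/2000', periods=2000, freq='5min')
series = pd.Series(np.random.randint(100, size=2000), index=time)

# Per-row mean of the hour each timestamp falls in (same index as `series`).
hourly_mean = series.groupby(series.index.hour).transform('mean')

# Because the indexes match, the subtraction aligns row by row.
deviation = series - hourly_mean
print(deviation.head())
```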

EDIT:

For a list of values per hour, use:

# Collect all values that fall in each hour into a Python list
s = series.groupby(series.index.hour).apply(list)
print(s)
0     [27, 43, 89, 42, 28, 79, 60, 45, 37, 92, 39, 8...
1     [11, 77, 69, 98, 78, 84, 34, 66, 4, 8, 85, 62,...
2     [16, 41, 10, 72, 44, 35, 48, 51, 99, 53, 22, 3...
3     [56, 22, 74, 85, 81, 6, 44, 44, 49, 43, 95, 11...
4     [21, 90, 89, 76, 62, 20, 66, 50, 68, 79, 69, 4...
5     [51, 85, 31, 58, 97, 10, 91, 25, 4, 11, 94, 28...
6     [5, 71, 62, 57, 62, 87, 12, 41, 43, 47, 25, 15...
7     [84, 17, 26, 32, 14, 76, 72, 35, 8, 60, 79, 27...
8     [15, 30, 80, 53, 10, 97, 71, 83, 37, 44, 89, 1...
9     [58, 20, 98, 77, 75, 26, 63, 26, 24, 62, 93, 6...
10    [39, 61, 92, 43, 61, 73, 86, 64, 26, 0, 75, 11...
11    [24, 13, 13, 54, 50, 38, 22, 46, 67, 15, 29, 4...
12    [21, 56, 16, 63, 46, 79, 11, 85, 87, 18, 66, 9...
13    [10, 89, 66, 80, 60, 2, 6, 19, 77, 81, 38, 48,...
14    [17, 64, 90, 91, 71, 32, 77, 9, 76, 14, 9, 79,...
15    [95, 75, 49, 34, 5, 31, 43, 68, 84, 48, 25, 69...
16    [13, 68, 87, 96, 6, 83, 9, 5, 29, 93, 57, 92, ...
17    [77, 6, 73, 41, 76, 93, 11, 50, 72, 84, 82, 53...
18    [95, 11, 61, 56, 30, 24, 24, 9, 0, 65, 96, 82,...
19    [31, 14, 98, 67, 7, 54, 29, 60, 77, 83, 45, 70...
20    [4, 15, 37, 78, 79, 59, 63, 97, 14, 74, 33, 2,...
21    [88, 69, 31, 20, 41, 10, 41, 6, 36, 27, 63, 49...
22    [4, 90, 70, 66, 92, 46, 54, 47, 6, 54, 62, 80,...
23    [27, 23, 21, 18, 29, 39, 77, 88, 21, 86, 7, 45...
dtype: object
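
If the timestamps should be kept alongside the values, one alternative to plain lists might be to iterate over the groupby object and store each hour's sub-Series. The sketch below assumes `series` has its original DatetimeIndex; the dictionary name `groups` is hypothetical.

```python
import numpy as np
import pandas as pd

np.random.seed(456)
time = pd.date_range('1/1/2000', periods=2000, freq='5min')
series = pd.Series(np.random.randint(100, size=2000), index=time)

# Iterating a groupby yields (key, sub-Series) pairs; here the key is the hour.
groups = {hour: grp for hour, grp in series.groupby(series.index.hour)}

# Each entry keeps its timestamps, unlike the list-based result.
print(groups[0].head())
```

A single hour can also be fetched directly with `series.groupby(series.index.hour).get_group(0)`.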
