I would like to aggregate some data by hour using pandas and display the date instead of an index.
The code I have right now is the following:
import pandas as pd
import numpy as np
dates = pd.date_range('1/1/2011', periods=20, freq='25min')
data = pd.Series(np.random.randint(100, size=20), index=dates)
result = data.groupby(data.index.hour).sum().reset_index(name='Sum')
print(result)
This displays something along the lines of:
   index  Sum
0      0  131
1      1  116
2      2  180
3      3   62
4      4   95
5      5  107
6      6   89
7      7  169
The problem is that, instead of the hour number in the index column, I want to display the date and time associated with that hour.
The result I'm trying to achieve is the following:
                index  Sum
0 2011-01-01 01:00:00  131
1 2011-01-01 02:00:00  116
2 2011-01-01 03:00:00  180
3 2011-01-01 04:00:00   62
4 2011-01-01 05:00:00   95
5 2011-01-01 06:00:00  107
6 2011-01-01 07:00:00   89
7 2011-01-01 08:00:00  169
Is there any way I can do that easily using pandas?
data.groupby(data.index.strftime('%Y-%m-%d %H:00:00')).sum().reset_index(name='Sum')
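For reference, here is a minimal runnable sketch of that one-liner, assuming the same `data` Series as in the question. The fixed seed and the `hour_labels` name are additions for illustration only. Note that the group keys produced this way are strings rather than timestamps.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumption: fixed seed for reproducibility, not in the original
dates = pd.date_range('1/1/2011', periods=20, freq='25min')
data = pd.Series(np.random.randint(100, size=20), index=dates)

# Format each timestamp as a string truncated to the hour (hypothetical helper name)
hour_labels = data.index.strftime('%Y-%m-%d %H:00:00')

# Group by those string labels and sum; reset_index turns the labels into a column
result = data.groupby(hour_labels).sum().reset_index(name='Sum')
print(result)
```

Because the labels are plain strings, they sort and display correctly here, but date arithmetic on the `index` column would first need `pd.to_datetime`.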
You could use resample:
data.resample('H').sum()
Output:
2011-01-01 00:00:00 84
2011-01-01 01:00:00 121
2011-01-01 02:00:00 160
2011-01-01 03:00:00 70
2011-01-01 04:00:00 88
2011-01-01 05:00:00 131
2011-01-01 06:00:00 56
2011-01-01 07:00:00 109
Freq: H, dtype: int32
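To get the two-column layout asked for in the question, the resampled Series can be passed through `reset_index`. This is a sketch under the same assumptions as the question's setup, with a fixed seed added for reproducibility. It uses the lowercase `'h'` alias, since recent pandas releases deprecate the uppercase `'H'` spelling.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumption: fixed seed, not in the original
dates = pd.date_range('1/1/2011', periods=20, freq='25min')
data = pd.Series(np.random.randint(100, size=20), index=dates)

# Sum the values falling into each hourly bin; the index stays a DatetimeIndex
hourly = data.resample('h').sum()

# Move the timestamps into a regular column named 'index' next to 'Sum'
result = hourly.reset_index(name='Sum')
print(result)
```

Here the `index` column holds real timestamps, so it can be filtered or formatted later without re-parsing.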
Option #2
data.groupby(data.index.floor('H')).sum()
Output:
2011-01-01 00:00:00 84
2011-01-01 01:00:00 121
2011-01-01 02:00:00 160
2011-01-01 03:00:00 70
2011-01-01 04:00:00 88
2011-01-01 05:00:00 131
2011-01-01 06:00:00 56
2011-01-01 07:00:00 109
dtype: int32
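The two options give the same result on the question's data, but they can differ when some hours have no observations. The sketch below uses a small made-up Series (`gappy`, hypothetical) to illustrate this: grouping by the floored index only returns hours that actually occur, while `resample` emits every hourly bin in the range and sums empty ones to 0.

```python
import pandas as pd

# Hypothetical data: nothing recorded between 01:00 and 03:00
idx = pd.to_datetime(['2011-01-01 00:10', '2011-01-01 00:40', '2011-01-01 03:05'])
gappy = pd.Series([1, 2, 3], index=idx)

# Only hours present in the data appear as groups
by_floor = gappy.groupby(gappy.index.floor('h')).sum()

# Every hour from the first to the last bin appears, empty bins included
by_resample = gappy.resample('h').sum()

print(by_floor)
print(by_resample)
```

Which behaviour is preferable depends on whether missing hours should show up as explicit zero rows in the result.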