I am using pandas to group data by time of day (hour) and then average across all days to get a diurnal cycle; in other words, I want a multi-day mean for each hour. In addition, I want to average the data across different sources, e.g. stations or countries.
Specifically, I have a DataFrame df with a pandas time index, as below:
A B C
2010-01-02-07:00 10 22 30
2010-01-02-08:00 12 20 NaN
2010-01-03-07:00 11 8 15
2010-01-03-08:00 10 10 9
2010-01-03-09:00 11 13 18
2010-01-05-07:00 NaN 10 16
2010-01-05-09:00 14 0 7
Following the post "Can pandas groupby aggregate into a list, rather than sum, mean, etc?", I can achieve my goal by extracting all the data for the same hour and concatenating it into one list. Is there a more straightforward or nicer way to do this?
My code is shown below:
import numpy as np
df['hour'] = df.index.hour  # tag each timestamp with its hour of day
grp = df.groupby('hour').agg(lambda x: tuple(x))  # one tuple of values per hour and column
result = grp[grp.columns[0]]  # start from the first column's tuples
for col in grp.columns[1:]:  # concatenate the remaining columns (skip the first so it is not counted twice)
    result = result + grp[col]
diurnal = [np.nanmean(np.array(result[hour])) for hour in grp.index]  # NaN-aware mean of each tuple
And here is the output:
Out:
[15.25, 12.2, 10.5]
Many thanks!
I tried @Nickil's method:
import datetime
import numpy as np
import pandas as pd
data = {'A': [10, 12, 11, 10, 11, np.nan, 14], 'B': [22, 20, 8, 10, 13, 10, 0], 'C': [30, np.nan, 15, 9, 18, 16, 7]}
df = pd.DataFrame(data, index=[datetime.datetime(2010,1,2,7,0), datetime.datetime(2010,1,2,8,0), datetime.datetime(2010,1,3,7,0), datetime.datetime(2010,1,3,8,0), datetime.datetime(2010,1,3,9,0), datetime.datetime(2010,1,5,7,0), datetime.datetime(2010,1,5,9,0)])
df.index = df.index.hour
diurnal = df.stack().mean(level=0).tolist()
This is what I get:
Out:
[20.666666666666668, 16.0, 11.333333333333334, 9.6666666666666661, 14.0, 13.0, 7.0]
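For what it is worth, the seven values above coincide with the plain per-row means of the three columns. That would be consistent with the level-0 labels still being unique per row when the mean was taken, although this is only an interpretation and is not confirmed anywhere in the post. A minimal sketch that reproduces both sets of numbers from the example data (the names per_row and per_hour are just illustrative):

```python
import numpy as np
import pandas as pd

index = pd.to_datetime([
    "2010-01-02 07:00", "2010-01-02 08:00", "2010-01-03 07:00",
    "2010-01-03 08:00", "2010-01-03 09:00", "2010-01-05 07:00",
    "2010-01-05 09:00",
])
df = pd.DataFrame(
    {"A": [10, 12, 11, 10, 11, np.nan, 14],
     "B": [22, 20, 8, 10, 13, 10, 0],
     "C": [30, np.nan, 15, 9, 18, 16, 7]},
    index=index,
)

# Mean across the columns of each row (NaN is skipped by default):
# one value per timestamp, i.e. seven values.
per_row = df.mean(axis=1).tolist()

# Pooled mean of all columns for each hour of day: one value per hour.
per_hour = [np.nanmean(g.to_numpy()) for _, g in df.groupby(df.index.hour)]
```

Here per_row has one entry per timestamp, while per_hour has one entry per distinct hour (7, 8, 9).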
This should be a simpler approach:
1) Access the hour using the .hour attribute and assign it as the new index axis.
2) Stack the DF so that all columns fall under a single column. Perform a groupby with respect to the hour labels (level=0 of the multi-index) and compute the mean.
df.index = df.index.hour
df.stack().mean(level=0).tolist()
Out[20]:
[15.25, 12.2, 10.5]
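Note that the level argument of mean() was deprecated in pandas 1.3 and removed in pandas 2.0, so on recent versions the same idea probably has to be written with an explicit groupby. A minimal sketch, assuming the example data from the question; the variable names stacked and hours are illustrative, and the timestamps are kept as the index here rather than being overwritten with the hour:

```python
import numpy as np
import pandas as pd

index = pd.to_datetime([
    "2010-01-02 07:00", "2010-01-02 08:00", "2010-01-03 07:00",
    "2010-01-03 08:00", "2010-01-03 09:00", "2010-01-05 07:00",
    "2010-01-05 09:00",
])
df = pd.DataFrame(
    {"A": [10, 12, 11, 10, 11, np.nan, 14],
     "B": [22, 20, 8, 10, 13, 10, 0],
     "C": [30, np.nan, 15, 9, 18, 16, 7]},
    index=index,
)

# Stack all columns into one Series indexed by (timestamp, column name).
stacked = df.stack()

# Hour of day for every stacked value, taken from the timestamp level.
hours = stacked.index.get_level_values(0).hour

# Group by hour and average; mean() skips any NaN that is still present.
diurnal = stacked.groupby(hours).mean().tolist()
```

Grouping on the hour extracted from the index level avoids mutating df.index, so the original timestamps remain available afterwards.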
Another possibility:
diurnal = [np.nanmean(g.to_numpy()) for _, g in df.groupby(df.index.hour)]
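If the hour labels should be kept alongside the means, the same loop can be collected into a Series instead of a list. This is only a sketch on the example data from the question, and it assumes df holds nothing but the numeric source columns (no extra 'hour' column):

```python
import numpy as np
import pandas as pd

index = pd.to_datetime([
    "2010-01-02 07:00", "2010-01-02 08:00", "2010-01-03 07:00",
    "2010-01-03 08:00", "2010-01-03 09:00", "2010-01-05 07:00",
    "2010-01-05 09:00",
])
df = pd.DataFrame(
    {"A": [10, 12, 11, 10, 11, np.nan, 14],
     "B": [22, 20, 8, 10, 13, 10, 0],
     "C": [30, np.nan, 15, 9, 18, 16, 7]},
    index=index,
)

# For each hour, pool every value from every column and take the NaN-aware mean.
diurnal = pd.Series(
    {hour: np.nanmean(g.to_numpy()) for hour, g in df.groupby(df.index.hour)}
)
```

The resulting Series is indexed by hour of day, so a single hour can be looked up with diurnal.loc[7].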