
pandas groupby aggregate data across columns

I am using pandas to group observations by hour of day and then average across all days to obtain a diurnal cycle; in other words, to compute a multi-day mean for each hour. Furthermore, I want to average the data across different sources (the columns), e.g. stations or countries.

Specifically, I have a DataFrame df with a pandas datetime index, as below:

                     A    B    C 
2010-01-02-07:00    10   22   30
2010-01-02-08:00    12   20   NaN
2010-01-03-07:00    11   8    15
2010-01-03-08:00    10   10   9
2010-01-03-09:00    11   13   18
2010-01-05-07:00    NaN  10   16
2010-01-05-09:00    14   0    7

Following the post "Can pandas groupby aggregate into a list, rather than sum, mean, etc?", I can achieve my goal by extracting all the data for the same hour and concatenating them into one list. But I am still wondering whether there is a more straightforward or nicer way to do this.

Here I show my code as below:

import numpy as np

df['hour'] = df.index.hour                          # hour of day for each timestamp
grp = df.groupby('hour').agg(lambda x: tuple(x))    # one tuple of values per hour and column

result = grp[grp.columns[0]]                        # start from the first column
for col in grp.columns[1:]:                         # append the remaining columns (skip the first so it is not counted twice)
    result = result + grp[col]

diurnal = [np.nanmean(np.array(result[hour])) for hour in grp.index]   # NaN-aware mean of each pooled tuple

And here is the output:

Out:
 [15.25, 12.2, 10.5]

Many thanks!

==========

I tried @Nickil's method:

import datetime
import numpy as np
import pandas as pd

data = {'A': [10, 12, 11, 10, 11, np.nan, 14], 'B': [22, 20, 8, 10, 13, 10, 0], 'C': [30, np.nan, 15, 9, 18, 16, 7]}
df = pd.DataFrame(data, index=[datetime.datetime(2010,1,2,7,0), datetime.datetime(2010,1,2,8,0), datetime.datetime(2010,1,3,7,0), datetime.datetime(2010,1,3,8,0), datetime.datetime(2010,1,3,9,0), datetime.datetime(2010,1,5,7,0), datetime.datetime(2010,1,5,9,0)])
df.index = df.index.hour
diurnal = df.stack().mean(level=0).tolist()   # the level argument requires pandas < 2.0

This is what I get:

Out:
 [20.666666666666668, 16.0, 11.333333333333334, 9.6666666666666661, 14.0, 13.0, 7.0]

This should be a simpler approach:

1) Access the hour through the .hour attribute of the index and assign it as the new index.

2) Stack the DataFrame so that all columns are collapsed into a single Series. Then group by the hour labels (level 0 of the resulting MultiIndex) and compute the mean.


df.index = df.index.hour                 
df.stack().mean(level=0).tolist()  
Out[20]:
[15.25, 12.2, 10.5]
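
On pandas 2.0 and later the `level` argument of `mean` is no longer available, so the one-liner above fails there. Below is a minimal sketch of an equivalent that uses `groupby` instead, assuming the sample data from the question. It stacks first and derives the hour from the stacked index, so `df` itself is left unchanged; the names `stacked` and `hours` are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question (NaN marks a missing observation).
index = pd.to_datetime([
    "2010-01-02 07:00", "2010-01-02 08:00", "2010-01-03 07:00",
    "2010-01-03 08:00", "2010-01-03 09:00", "2010-01-05 07:00",
    "2010-01-05 09:00",
])
df = pd.DataFrame(
    {"A": [10, 12, 11, 10, 11, np.nan, 14],
     "B": [22, 20, 8, 10, 13, 10, 0],
     "C": [30, np.nan, 15, 9, 18, 16, 7]},
    index=index,
)

# Collapse all columns into one Series indexed by (timestamp, column).
stacked = df.stack()

# Hour of day for every stacked value, taken from level 0 of the index.
hours = stacked.index.get_level_values(0).hour

# Group by hour and average; groupby's mean skips NaN values.
diurnal = stacked.groupby(hours).mean()
print(diurnal.tolist())
```

The result is a Series indexed by hour, which keeps the hour labels attached to the means; call `.tolist()` only if a plain list is needed.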

Another possibility:

diurnal = [np.nanmean(g) for _, g in df.groupby(df.index.hour)]
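
One thing worth checking when averaging across columns with missing values: pooling all values of an hour, as the approaches here do, is not the same as averaging the per-column hourly means, because each column may contribute a different number of valid observations. A small sketch on the question's sample data, with `pooled` and `mean_of_means` as illustrative names:

```python
import numpy as np
import pandas as pd

# Sample data from the question (NaN marks a missing observation).
index = pd.to_datetime([
    "2010-01-02 07:00", "2010-01-02 08:00", "2010-01-03 07:00",
    "2010-01-03 08:00", "2010-01-03 09:00", "2010-01-05 07:00",
    "2010-01-05 09:00",
])
df = pd.DataFrame(
    {"A": [10, 12, 11, 10, 11, np.nan, 14],
     "B": [22, 20, 8, 10, 13, 10, 0],
     "C": [30, np.nan, 15, 9, 18, 16, 7]},
    index=index,
)

by_hour = df.groupby(df.index.hour)

# Pooled mean: every valid value in the hour counts once.
pooled = [float(np.nanmean(g.to_numpy())) for _, g in by_hour]

# Mean of per-column means: every column counts once, regardless of
# how many valid values it has in that hour.
mean_of_means = by_hour.mean().mean(axis=1).tolist()

print(pooled)
print(mean_of_means)
```

The two agree only for hours in which every column has the same number of valid values (hour 9 in this data), so which one to use depends on whether sources or individual observations should be weighted equally.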
