How do I run multiple functions on my aggregated pandas DataFrame?

Question

I have the following wind speed and wind direction data, taken over the course of a month in Salt Lake City. I want to group the data by the hour in which they were taken. For the measurements within each hour, I want to do two things: (1) calculate the mean wind speed, and (2) apply a function I have defined ("yamatrino") to all the wind_direction measurements taken within that hour.

        time                     Station_ID  wind_speed  wind_direction
    0   2019-08-01 00:00:00 UTC  WBB         3.48        96.1
    1   2019-08-01 00:00:00 UTC  UT215       6.54        141.4
    2   2019-08-01 00:00:00 UTC  MTMET       3.39        67.75
    3   2019-08-01 00:00:00 UTC  NAA         5.99        154.9
    4   2019-08-01 00:00:00 UTC  QHW         1.52        107

Below is the code I have written to (1) convert the time data into a datetime format and (2) create two columns with the mean wind speed and the yamatrino value for each hour of data.

df['time'] = pd.to_datetime(df['time'], format ='%Y-%m-%d %H:%M:%S UTC')

df.groupby(df['time'].dt.hour)['wind_direction', 'wind_speed'].agg([('yamatrino_value', lambda wind_direction: yamatrino(wind_direction)), ('hourly_velocity_mean', np.mean('wind_speed'))])

The error reads "TypeError: cannot perform reduce with flexible type". I am confused about how to aggregate more than one column of data.

Answer 1

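The error comes from np.mean('wind_speed'), which is evaluated immediately on the literal string 'wind_speed' rather than on the column's values. Consider passing a dictionary to the DataFrame.groupby(...).agg call to run separate aggregate functions on separate columns. And if your function expects a single parameter, the lambda is not needed; pass the function itself.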

df.groupby(df['time'].dt.hour).agg({'wind_direction': yamatrino, 
                                    'wind_speed': np.mean})
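
For a version that runs end to end, here is a minimal sketch on a small made-up frame. The real yamatrino is not shown in the question, so the function below is a hypothetical stand-in (a circular mean of directions in degrees), and the sample rows are invented for illustration. The string 'mean' is used in place of np.mean because recent pandas versions may emit a FutureWarning when a NumPy function is passed to agg.

```python
import numpy as np
import pandas as pd


def yamatrino(directions):
    """Hypothetical stand-in for the asker's function: circular mean in degrees."""
    radians = np.deg2rad(directions)
    mean_angle = np.arctan2(np.sin(radians).mean(), np.cos(radians).mean())
    return float(np.rad2deg(mean_angle) % 360)


# Invented sample rows in the same layout as the question's data.
df = pd.DataFrame({
    "time": ["2019-08-01 00:00:00 UTC", "2019-08-01 00:00:00 UTC",
             "2019-08-01 01:00:00 UTC", "2019-08-01 01:00:00 UTC"],
    "Station_ID": ["WBB", "UT215", "WBB", "UT215"],
    "wind_speed": [3.0, 5.0, 2.0, 4.0],
    "wind_direction": [90.0, 90.0, 180.0, 180.0],
})
df["time"] = pd.to_datetime(df["time"], format="%Y-%m-%d %H:%M:%S UTC")

# One aggregate function per column: the dictionary keys are column names,
# the values are the functions (or function names) to apply to each group.
hourly = df.groupby(df["time"].dt.hour).agg(
    {"wind_direction": yamatrino, "wind_speed": "mean"}
)
print(hourly)
```

Note that grouping on df['time'].dt.hour pools the same hour of day across every date in the data, which matches the grouping used in the question.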

As of pandas 0.25.0, you can also name the aggregated columns, which may be what you intended with yamatrino_value and hourly_velocity_mean. Pass each output name as a keyword argument whose value is a (column, aggfunc) tuple; pd.NamedAgg is a named tuple with those same two fields.

df.groupby(df['time'].dt.hour).agg(yamatrino_value = ('wind_direction', yamatrino), 
                                   hourly_velocity_mean = ('wind_speed', np.mean))
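
The named form can be sketched the same way. Again, yamatrino here is a hypothetical stand-in (a circular mean in degrees) and the rows are made up. One output column uses a plain tuple and the other uses pd.NamedAgg, to show that the two spellings are interchangeable.

```python
import numpy as np
import pandas as pd


def yamatrino(directions):
    """Hypothetical stand-in for the asker's function: circular mean in degrees."""
    radians = np.deg2rad(directions)
    mean_angle = np.arctan2(np.sin(radians).mean(), np.cos(radians).mean())
    return float(np.rad2deg(mean_angle) % 360)


# Invented sample rows; the time column is already parsed to datetimes.
df = pd.DataFrame({
    "time": pd.to_datetime(["2019-08-01 00:00:00", "2019-08-01 00:00:00",
                            "2019-08-01 01:00:00", "2019-08-01 01:00:00"]),
    "wind_speed": [3.0, 5.0, 2.0, 4.0],
    "wind_direction": [90.0, 90.0, 180.0, 180.0],
})

# Each keyword becomes an output column name; each value says which input
# column to read and which function to apply to it.
hourly = df.groupby(df["time"].dt.hour).agg(
    yamatrino_value=("wind_direction", yamatrino),
    hourly_velocity_mean=pd.NamedAgg(column="wind_speed", aggfunc="mean"),
)
print(hourly)
```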


Answer 1: accepted, 2020-08-19 15:57:06