I have the following wind speed and wind direction data, taken over the course of a month in Salt Lake City. I want to group the data by the hour in which they were taken. For the measurements within each hour, I want to do two things: (1) calculate the mean wind speed and (2) apply a function I have defined ("yamatrino") to all the wind_direction measurements taken within that hour.
time Station_ID wind_speed wind_direction
0 2019-08-01 00:00:00 UTC WBB 3.48 96.1
1 2019-08-01 00:00:00 UTC UT215 6.54 141.4
2 2019-08-01 00:00:00 UTC MTMET 3.39 67.75
3 2019-08-01 00:00:00 UTC NAA 5.99 154.9
4 2019-08-01 00:00:00 UTC QHW 1.52 107
Below is the code I have written to (1) convert the time data to a datetime format and (2) create two columns holding the mean wind speed and the yamatrino value for each hour of data.
df['time'] = pd.to_datetime(df['time'], format ='%Y-%m-%d %H:%M:%S UTC')
df.groupby(df['time'].dt.hour)['wind_direction', 'wind_speed'].agg([('yamatrino_value', lambda wind_direction: yamatrino(wind_direction)), ('hourly_velocity_mean', np.mean('wind_speed'))])
The error reads "TypeError: cannot perform reduce with flexible type". I am confused about how to aggregate over more than one column of data.
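The error comes from np.mean('wind_speed'), which is evaluated immediately on the literal string 'wind_speed' rather than on the column. Consider passing a dictionary to the DataFrame.groupby.agg call to run separate aggregate functions on separate columns. And if your function expects a single parameter, the lambda is not needed.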
df.groupby(df['time'].dt.hour).agg({'wind_direction': yamatrino,
'wind_speed': np.mean})
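For anyone who wants to try the dictionary form end to end, here is a minimal sketch. The yamatrino below is only a hypothetical stand-in (the real function is not shown in the question), and the two 01:00 rows are made up so that there is more than one group. The string "mean" is used in place of np.mean; it selects the same built-in mean aggregation.

```python
import pandas as pd


def yamatrino(wind_direction):
    # Hypothetical stand-in for the asker's function: it receives the
    # wind_direction Series of one group and must return a single value.
    return wind_direction.max() - wind_direction.min()


df = pd.DataFrame({
    "time": [
        "2019-08-01 00:00:00 UTC",
        "2019-08-01 00:00:00 UTC",
        "2019-08-01 01:00:00 UTC",  # made-up row
        "2019-08-01 01:00:00 UTC",  # made-up row
    ],
    "Station_ID": ["WBB", "UT215", "WBB", "UT215"],
    "wind_speed": [3.48, 6.54, 2.0, 4.0],
    "wind_direction": [96.1, 141.4, 90.0, 120.0],
})
df["time"] = pd.to_datetime(df["time"], format="%Y-%m-%d %H:%M:%S UTC")

# One aggregation function per column, keyed by column name.
hourly = df.groupby(df["time"].dt.hour).agg(
    {"wind_direction": yamatrino, "wind_speed": "mean"}
)
print(hourly)
```

The result keeps the original column names (wind_direction, wind_speed) and is indexed by hour of day.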
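As of pandas v0.25.0, you can also name the aggregated columns, which may be what you intended with yamatrino_value and hourly_velocity_mean. To do so, pass each output name as a keyword argument whose value is a (column, aggfunc) tuple, or equivalently a pd.NamedAgg namedtuple with the fields ['column', 'aggfunc'].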
df.groupby(df['time'].dt.hour).agg(yamatrino_value = ('wind_direction', yamatrino),
hourly_velocity_mean = ('wind_speed', np.mean))
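A small runnable sketch of named aggregation follows, showing a plain tuple and pd.NamedAgg side by side. As before, yamatrino is a hypothetical placeholder for the asker's function and the 01:00 rows are invented for illustration.

```python
import pandas as pd


def yamatrino(wind_direction):
    # Hypothetical placeholder: any function that reduces a Series
    # of wind directions to one value will work here.
    return wind_direction.median()


df = pd.DataFrame({
    "time": [
        "2019-08-01 00:00:00 UTC",
        "2019-08-01 00:00:00 UTC",
        "2019-08-01 00:00:00 UTC",
        "2019-08-01 01:00:00 UTC",  # made-up row
        "2019-08-01 01:00:00 UTC",  # made-up row
    ],
    "wind_speed": [3.48, 6.54, 3.39, 2.0, 4.0],
    "wind_direction": [96.1, 141.4, 67.75, 100.0, 110.0],
})
df["time"] = pd.to_datetime(df["time"], format="%Y-%m-%d %H:%M:%S UTC")

result = df.groupby(df["time"].dt.hour).agg(
    # Plain (column, aggfunc) tuple.
    yamatrino_value=("wind_direction", yamatrino),
    # Equivalent namedtuple form.
    hourly_velocity_mean=pd.NamedAgg(column="wind_speed", aggfunc="mean"),
)
print(result)
```

The keyword names become the output column names, so no renaming step is needed afterwards.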