How to resample a column based on the maximum value of another column?
Say I have a surface wind dataset with an irregular time step, like the one below. The actual dataset has many other columns and thousands of rows.
Time, Speed, Direction
2023-1-1 01:00:00, 6, 90
2023-1-1 02:00:00, 6, 70
2023-1-1 03:00:00, 9, 70
2023-1-1 04:00:00, 6, 230
2023-1-1 06:00:00, 2, 320
2023-1-1 08:00:00, 2, 100
2023-1-1 11:00:00, 3, 140
2023-1-1 15:00:00, 12, 10
2023-1-1 16:00:00, 13, 20
2023-1-1 17:00:00, 15, 60
2023-1-1 18:00:00, 10, 80
I'd like to resample the data to a regular 3-hour time step (00, 03, 06, 09, 12, 15, 18, 21) by taking the maximum Speed in each bin, together with the Direction corresponding to that maximum Speed. How can this be done? I tried something like this, but it does not work.
df3h = df.resample('3H').agg({  # '3H' does not work if the time series does not start at 00:00
    'Speed': 'max',
    'Direction': lambda x: x.loc[x.Speed.idxmax(), 'Direction']  # this won't work!
})
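A note on why the attempt above cannot work as written: when agg is given a dict, each function is applied to one column at a time, so the function for Direction only ever receives the Direction values and never sees Speed. The sketch below illustrates this with groupby on 3h bins rather than resample; the probe function and the tiny three-row frame are made up for illustration.

```python
import pandas as pd

# Tiny illustrative frame (a subset of the sample data above)
df = pd.DataFrame({
    "Time": pd.to_datetime(["2023-01-01 01:00", "2023-01-01 02:00", "2023-01-01 03:00"]),
    "Speed": [6, 6, 9],
    "Direction": [90, 70, 70],
})
bins = df["Time"].dt.floor("3h")

received = []

def probe(col):
    # Record what agg actually hands to a per-column function
    received.append((type(col).__name__, hasattr(col, "Speed")))
    return col.iloc[0]

df.groupby(bins).agg({"Direction": probe})
print(received)
```

Each call receives a plain Series without any Speed attribute, which is why the row label of the maximum Speed has to be computed separately and then used to select the matching rows.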
You can do it using groupby and idxmax before creating a new dataframe. First use floor on the Time column to get the 3h groups.
import pandas as pd

# create 3h bins (assumes df['Time'] is already a datetime column)
_3h = df['Time'].dt.floor('3h')
print(_3h)  # if Time is the index, use df.index.floor('3h') instead
# 0 2023-01-01 00:00:00
# 1 2023-01-01 00:00:00
# 2 2023-01-01 03:00:00
# 3 2023-01-01 03:00:00
# 4 2023-01-01 06:00:00
# ...
# use it to group by and get the index label of the max Speed in each bin with idxmax
_idxmax = df.groupby(_3h)['Speed'].idxmax()
print(_idxmax)
# Time
# 2023-01-01 00:00:00 0
# 2023-01-01 03:00:00 2
# 2023-01-01 06:00:00 4
# ...
# create the result dataframe
new_df = (
    df.loc[_idxmax, ['Speed', 'Direction']]
    .set_index(_idxmax.index)
    # in case a 3h bin is missing, like 12:00 in your example
    .reindex(pd.Index(
        pd.date_range(_3h.min(), _3h.max(), freq='3h'),
        name='Time'))
    # if you want Time as a column
    .reset_index()
)
print(new_df)
# Time Speed Direction
# 0 2023-01-01 00:00:00 6.0 90.0
# 1 2023-01-01 03:00:00 9.0 70.0
# 2 2023-01-01 06:00:00 2.0 320.0
# 3 2023-01-01 09:00:00 3.0 140.0
# 4 2023-01-01 12:00:00 NaN NaN
# 5 2023-01-01 15:00:00 15.0 60.0
# 6 2023-01-01 18:00:00 10.0 80.0
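For convenience, the steps above can be bundled into one self-contained sketch. The function name resample_by_max and its parameters are hypothetical, chosen only for this illustration, and the frame is rebuilt from the sample data in the question. It assumes the time column is already datetime and that the key column has at least one non-missing value in every bin that occurs.

```python
import pandas as pd

def resample_by_max(df, freq="3h", time_col="Time", key_col="Speed"):
    """For each time bin, keep the row holding the largest key_col value."""
    bins = df[time_col].dt.floor(freq)            # bin label for every row
    idx = df.groupby(bins)[key_col].idxmax()      # row label of the max per bin
    out = df.loc[idx].drop(columns=time_col).set_index(idx.index)
    # Regular grid between the first and last bin, so empty bins show up as NaN
    full = pd.date_range(bins.min(), bins.max(), freq=freq, name=time_col)
    return out.reindex(full).reset_index()

df = pd.DataFrame({
    "Time": pd.to_datetime([
        "2023-01-01 01:00", "2023-01-01 02:00", "2023-01-01 03:00",
        "2023-01-01 04:00", "2023-01-01 06:00", "2023-01-01 08:00",
        "2023-01-01 11:00", "2023-01-01 15:00", "2023-01-01 16:00",
        "2023-01-01 17:00", "2023-01-01 18:00",
    ]),
    "Speed": [6, 6, 9, 6, 2, 2, 3, 12, 13, 15, 10],
    "Direction": [90, 70, 70, 230, 320, 100, 140, 10, 20, 60, 80],
})

result = resample_by_max(df)
print(result)
```

Because whole rows are selected, any other columns in the real dataset are carried along with the row of maximum Speed. When several rows in a bin tie for the maximum, idxmax returns the first of them.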