How to resample one column based on the maximum value of another column?

Question

Say I have a surface wind dataset with an irregular time step, like the one below. The actual dataset has many other columns and thousands of rows.

Time, Speed, Direction
2023-1-1 01:00:00, 6, 90
2023-1-1 02:00:00, 6, 70
2023-1-1 03:00:00, 9, 70
2023-1-1 04:00:00, 6, 230
2023-1-1 06:00:00, 2, 320
2023-1-1 08:00:00, 2, 100
2023-1-1 11:00:00, 3, 140
2023-1-1 15:00:00, 12, 10
2023-1-1 16:00:00, 13, 20
2023-1-1 17:00:00, 15, 60
2023-1-1 18:00:00, 10, 80

I'd like to resample the data to a regular time step (00, 03, 06, 09, 12, 15, 18, 21) by taking the maximum Speed in each step, together with the Direction that corresponds to that maximum Speed. How can this be done? I tried something like the following, but it does not work:

df3h = df.resample('3H').agg({  # '3H' does not work if the time series does not start at 00:00
    'Speed': 'max',
    'Direction': lambda x: x.loc[x.Speed.idxmax(), 'Direction'],  # This won't work!
})
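The attempt fails because agg with a dict passes each column to its function as a separate Series, so the Direction lambda never has access to Speed. One way around that, sketched below on the sample data above (assuming Time is a datetime column), is to group by 3-hour bins and use groupby(...).apply, which receives each bin's whole sub-DataFrame. The bins come from dt.floor('3h'), which for tz-naive timestamps lands on 00, 03, 06, ... regardless of the first timestamp. The names bins, per_bin and full_range are illustrative only, and this is a plain alternative rather than necessarily the fastest one for thousands of rows.

```python
import pandas as pd

# Sample data from the question (the real dataset has more columns and rows).
hours = [1, 2, 3, 4, 6, 8, 11, 15, 16, 17, 18]
df = pd.DataFrame({
    "Time": pd.Timestamp("2023-01-01") + pd.to_timedelta(hours, unit="h"),
    "Speed": [6, 6, 9, 6, 2, 2, 3, 12, 13, 15, 10],
    "Direction": [90, 70, 70, 230, 320, 100, 140, 10, 20, 60, 80],
})

# 3-hour bin label for every row; floor() puts the bin edges at 00, 03, 06, ...
bins = df["Time"].dt.floor("3h")

# agg() would hand each column to its function separately; apply() hands over
# the whole sub-DataFrame of a bin, so Speed and Direction are both visible.
# Only bins that contain rows are passed in, so idxmax() never sees an empty group.
per_bin = df.groupby(bins).apply(
    lambda g: g.loc[g["Speed"].idxmax(), ["Speed", "Direction"]]
)

# Add the 3-hour slots that have no data (here 12:00) back as NaN rows.
full_range = pd.date_range(bins.min(), bins.max(), freq="3h", name="Time")
df3h = per_bin.reindex(full_range)
print(df3h)
```

On ties (for example the two 6 m/s readings before 03:00), idxmax keeps the first occurrence, so the earlier Direction wins.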

Answer 1

You can do it with groupby and idxmax, then build a new DataFrame from the result. First use floor on the Time column to get the 3-hour groups.

import pandas as pd

# create 3h bins (assumes df['Time'] is already a datetime column)
_3h = df['Time'].dt.floor('3h')
print(_3h)  # if Time is the index, use df.index.floor('3h') instead
# 0    2023-01-01 00:00:00
# 1    2023-01-01 00:00:00
# 2    2023-01-01 03:00:00
# 3    2023-01-01 03:00:00
# 4    2023-01-01 06:00:00
# ...

# use it to group by, and get the index of the max Speed in each bin with idxmax (on ties, idxmax returns the first occurrence)
_idxmax = df.groupby(_3h)['Speed'].idxmax()
print(_idxmax)
# Time
# 2023-01-01 00:00:00     0
# 2023-01-01 03:00:00     2
# 2023-01-01 06:00:00     4
# ...

# create the result dataframe
new_df = (
    df.loc[_idxmax, ['Speed','Direction']]
      .set_index(_idxmax.index)
      # add back 3h bins that have no data, like 12:00 in your example
      .reindex(pd.Index(
          pd.date_range(_3h.min(), _3h.max(), freq='3h'), 
          name='Time'))
      # if you want Time as column
      .reset_index()
)
print(new_df)
#                  Time  Speed  Direction
# 0 2023-01-01 00:00:00    6.0       90.0
# 1 2023-01-01 03:00:00    9.0       70.0
# 2 2023-01-01 06:00:00    2.0      320.0
# 3 2023-01-01 09:00:00    3.0      140.0
# 4 2023-01-01 12:00:00    NaN        NaN
# 5 2023-01-01 15:00:00   15.0       60.0
# 6 2023-01-01 18:00:00   10.0       80.0
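The comment in the answer's code notes that if Time is the index rather than a column, the bins come from df.index.floor instead. A minimal sketch of that variant on the same sample data, assuming the timestamps are unique: idxmax then returns Timestamp labels, and duplicate labels would make df.loc return extra rows and break the set_index step. The names bins and idxmax are illustrative.

```python
import pandas as pd

# Same sample as the question, but with Time stored as the index.
# Assumption: timestamps are unique, because idxmax() returns index labels here.
hours = [1, 2, 3, 4, 6, 8, 11, 15, 16, 17, 18]
df = pd.DataFrame(
    {
        "Speed": [6, 6, 9, 6, 2, 2, 3, 12, 13, 15, 10],
        "Direction": [90, 70, 70, 230, 320, 100, 140, 10, 20, 60, 80],
    },
    index=pd.Index(
        pd.Timestamp("2023-01-01") + pd.to_timedelta(hours, unit="h"), name="Time"
    ),
)

# With a DatetimeIndex the bins come from df.index.floor() instead of .dt.floor().
bins = df.index.floor("3h")

# idxmax() now returns the Timestamp label of the fastest row in each bin.
idxmax = df.groupby(bins)["Speed"].idxmax()

new_df = (
    df.loc[idxmax, ["Speed", "Direction"]]
    .set_index(idxmax.index)  # relabel each selected row with its bin start
    .reindex(pd.date_range(bins.min(), bins.max(), freq="3h", name="Time"))
)
print(new_df)
```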

Answer 1
4 2023-01-06 14:59:32