How to resample a column based on the maximum value of another column?

Say I have a surface wind dataset with an irregular time step, like the one below. The actual dataset has many other columns and thousands of rows.

Time, Speed, Direction
2023-1-1 01:00:00, 6, 90
2023-1-1 02:00:00, 6, 70
2023-1-1 03:00:00, 9, 70
2023-1-1 04:00:00, 6, 230
2023-1-1 06:00:00, 2, 320
2023-1-1 08:00:00, 2, 100
2023-1-1 11:00:00, 3, 140
2023-1-1 15:00:00, 12, 10
2023-1-1 16:00:00, 13, 20
2023-1-1 17:00:00, 15, 60
2023-1-1 18:00:00, 10, 80
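To reproduce the examples, the sample above can be loaded straight from the text. This is a minimal sketch, assuming the rows are pasted as-is into a string (`raw` is just an illustrative name). Because month and day are not zero-padded, the Time column is parsed with an explicit format.

```python
from io import StringIO

import pandas as pd

# The sample from the question, pasted verbatim (the real data has more columns).
raw = """Time, Speed, Direction
2023-1-1 01:00:00, 6, 90
2023-1-1 02:00:00, 6, 70
2023-1-1 03:00:00, 9, 70
2023-1-1 04:00:00, 6, 230
2023-1-1 06:00:00, 2, 320
2023-1-1 08:00:00, 2, 100
2023-1-1 11:00:00, 3, 140
2023-1-1 15:00:00, 12, 10
2023-1-1 16:00:00, 13, 20
2023-1-1 17:00:00, 15, 60
2023-1-1 18:00:00, 10, 80
"""

# skipinitialspace drops the blank after each comma in the data rows;
# stripping the column names guards against the same blank in the header.
df = pd.read_csv(StringIO(raw), skipinitialspace=True)
df.columns = df.columns.str.strip()

# Turn the Time strings into datetime64 values so .dt.floor / resample work.
df["Time"] = pd.to_datetime(df["Time"], format="%Y-%m-%d %H:%M:%S")
print(df.dtypes)
```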

I'd like to resample the data to a regular 3-hour step (00, 03, 06, 09, 12, 15, 18, 21), taking the maximum Speed in each interval and the Direction that corresponds to that maximum speed. How can this be done? I tried something like the following, but it does not work.

df3h = df.resample('3h').agg({  # does '3h' still work if the series doesn't start at 00:00?
    'Speed': 'max',
    # This won't work: agg passes only the Direction column here, so x has no Speed
    'Direction': lambda x: x.loc[x.Speed.idxmax(), 'Direction'],
})
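Two things are going on in this attempt. First, `resample()` anchors 3-hour bins at midnight of the first day by default (`origin='start_day'`, pandas 1.1 and later), so a series that starts at 01:00 still gets a 00:00 bin. Second, when a dict is passed to `agg()`, each function receives only its own column as a Series, so `x` inside the Direction lambda has no `Speed`. A minimal sketch that keeps `resample` for Speed and uses a plain loop over the bins to pick Direction; it rebuilds the sample inline, and `speed_max` and `direction` are illustrative names. The loop is fine for thousands of rows but slower than a vectorized `idxmax` approach.

```python
import pandas as pd

# Sample data from the question (01:00 ... 18:00 on 2023-01-01).
hours = [1, 2, 3, 4, 6, 8, 11, 15, 16, 17, 18]
df = pd.DataFrame({
    "Time": pd.Timestamp("2023-01-01") + pd.to_timedelta(hours, unit="h"),
    "Speed": [6, 6, 9, 6, 2, 2, 3, 12, 13, 15, 10],
    "Direction": [90, 70, 70, 230, 320, 100, 140, 10, 20, 60, 80],
})

# Bins start at 00:00 even though the first reading is at 01:00;
# the empty 12:00 bin comes out as NaN.
speed_max = df.set_index("Time").resample("3h")["Speed"].max()

# Each group here is the whole sub-frame of a bin, so Speed and Direction
# are both visible (unlike inside agg()).
direction = {}
for start, group in df.groupby(df["Time"].dt.floor("3h")):
    direction[start] = group.loc[group["Speed"].idxmax(), "Direction"]

# Left join keeps every resampled bin; Direction is NaN where Speed is NaN.
result = speed_max.to_frame().join(pd.Series(direction, name="Direction"))
print(result)
```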

You can do this with groupby and idxmax, then build a new DataFrame from the selected rows. First, use dt.floor on the Time column to assign each row to its 3-hour bin.

import pandas as pd

# create 3h bins (Time must be a datetime64 column)
_3h = df['Time'].dt.floor('3h')
print(_3h)  # if Time is the index, use df.index.floor('3h') instead
# 0    2023-01-01 00:00:00
# 1    2023-01-01 00:00:00
# 2    2023-01-01 03:00:00
# 3    2023-01-01 03:00:00
# 4    2023-01-01 06:00:00
# ...

# group by bin and get the row label of the max Speed in each bin with idxmax
_idxmax = df.groupby(_3h)['Speed'].idxmax()
print(_idxmax)
# Time
# 2023-01-01 00:00:00     0
# 2023-01-01 03:00:00     2
# 2023-01-01 06:00:00     4
# ...

# create the result dataframe
new_df = (
    df.loc[_idxmax, ['Speed','Direction']]
      .set_index(_idxmax.index)
      # in case a 3h bin is missing like 12:00 in your example
      .reindex(pd.Index(
          pd.date_range(_3h.min(), _3h.max(), freq='3h'), 
          name='Time'))
      # if you want Time as column
      .reset_index()
)
print(new_df)
#                  Time  Speed  Direction
# 0 2023-01-01 00:00:00    6.0       90.0
# 1 2023-01-01 03:00:00    9.0       70.0
# 2 2023-01-01 06:00:00    2.0      320.0
# 3 2023-01-01 09:00:00    3.0      140.0
# 4 2023-01-01 12:00:00    NaN        NaN
# 5 2023-01-01 15:00:00   15.0       60.0
# 6 2023-01-01 18:00:00   10.0       80.0
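The steps above can be wrapped in a reusable function. This is a sketch, assuming `freq` divides 24 hours evenly and that you want every other column taken from the max-speed row; `max_speed_rows` is a hypothetical name. Unlike the output above, it reindexes over whole days, so the 21:00 slot requested in the question also appears (as NaN for this sample).

```python
import pandas as pd


def max_speed_rows(df, freq="3h", time_col="Time", speed_col="Speed"):
    """Hypothetical helper: for each `freq` bin, keep the row with the highest
    `speed_col` (Direction and any other columns come along) and list every
    bin of each covered day, e.g. 00, 03, ..., 21 for freq="3h"."""
    # Assign each row to the bin that starts at or before it.
    bins = df[time_col].dt.floor(freq)
    # Row label of the maximum speed in each bin (first one wins on ties).
    idx = df.groupby(bins)[speed_col].idxmax()
    picked = df.loc[idx].drop(columns=time_col).set_index(idx.index)
    # Whole-day grid so that empty bins show up as NaN rows.
    grid = pd.date_range(
        bins.min().floor("D"),
        bins.max().floor("D") + pd.Timedelta(days=1),
        freq=freq,
        inclusive="left",
        name=time_col,
    )
    return picked.reindex(grid).reset_index()


# Usage with the sample from the question.
hours = [1, 2, 3, 4, 6, 8, 11, 15, 16, 17, 18]
df = pd.DataFrame({
    "Time": pd.Timestamp("2023-01-01") + pd.to_timedelta(hours, unit="h"),
    "Speed": [6, 6, 9, 6, 2, 2, 3, 12, 13, 15, 10],
    "Direction": [90, 70, 70, 230, 320, 100, 140, 10, 20, 60, 80],
})
out = max_speed_rows(df)
print(out)
```

When several readings in one bin share the maximum Speed, idxmax keeps the first one, which is why the 00:00 bin reports Direction 90 (from 01:00) rather than 70 (from 02:00).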
