How to resample a column based on the maximum value of another column?
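For example, suppose I have a surface wind dataset with irregular time steps, like the one below. The real dataset has many more columns and thousands of rows.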
Time, Speed, Direction
2023-1-1 01:00:00, 6, 90
2023-1-1 02:00:00, 6, 70
2023-1-1 03:00:00, 9, 70
2023-1-1 04:00:00, 6, 230
2023-1-1 06:00:00, 2, 320
2023-1-1 08:00:00, 2, 100
2023-1-1 11:00:00, 3, 140
2023-1-1 15:00:00, 12, 10
2023-1-1 16:00:00, 13, 20
2023-1-1 17:00:00, 15, 60
2023-1-1 18:00:00, 10, 80
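To reproduce the examples, the sample above can be loaded into pandas. This is a minimal sketch that assumes the rows are pasted verbatim as CSV text (the variable name `raw` is hypothetical). The answer below calls `.dt.floor(...)`, which only works on a datetime column, so Time is parsed explicitly.

```python
import io

import pandas as pd

# The sample rows from the question, pasted verbatim as CSV text.
raw = """Time, Speed, Direction
2023-1-1 01:00:00, 6, 90
2023-1-1 02:00:00, 6, 70
2023-1-1 03:00:00, 9, 70
2023-1-1 04:00:00, 6, 230
2023-1-1 06:00:00, 2, 320
2023-1-1 08:00:00, 2, 100
2023-1-1 11:00:00, 3, 140
2023-1-1 15:00:00, 12, 10
2023-1-1 16:00:00, 13, 20
2023-1-1 17:00:00, 15, 60
2023-1-1 18:00:00, 10, 80
"""

# skipinitialspace drops the blank after each comma (header included),
# so the columns come out as "Time", "Speed" and "Direction".
df = pd.read_csv(io.StringIO(raw), skipinitialspace=True)

# Parse Time into datetimes so that .dt accessors such as .dt.floor work.
df["Time"] = pd.to_datetime(df["Time"])
print(df.dtypes)
```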
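I want to resample the data to fixed time steps (00, 03, 06, 09, 12, 15, 18, 21) by taking the maximum speed in each step together with the direction recorded at that maximum speed. How can I do this? I tried something like the following, but it doesn't work.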
df3h = df.resample('3H').agg({  # 3H does not work if the time series does not start at 00:00
    'Speed': 'max',
    'Direction': lambda x: x.loc[x.Speed.idxmax(), 'Direction']  # This won't work!
})
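A small sketch of why the attempt fails, assuming the sample rows above with Time as the index (the name `checks` is hypothetical). When `agg` receives a dict, each function is applied to one column at a time and gets that column as a Series, so the Direction lambda never sees the Speed column. The sketch also shows that `resample('3h')` with default settings labels its bins from midnight of the first day (the default `origin='start_day'` in recent pandas), so the first bin is 00:00 even though the data start at 01:00.

```python
import pandas as pd

# Sample rows from the question, with Time parsed and used as the index.
df = pd.DataFrame({
    "Time": pd.to_datetime([
        "2023-01-01 01:00", "2023-01-01 02:00", "2023-01-01 03:00",
        "2023-01-01 04:00", "2023-01-01 06:00", "2023-01-01 08:00",
        "2023-01-01 11:00", "2023-01-01 15:00", "2023-01-01 16:00",
        "2023-01-01 17:00", "2023-01-01 18:00",
    ]),
    "Speed": [6, 6, 9, 6, 2, 2, 3, 12, 13, 15, 10],
    "Direction": [90, 70, 70, 230, 320, 100, 140, 10, 20, 60, 80],
}).set_index("Time")

# With a dict, each function is applied per column: the Direction lambda
# receives only the Direction values of a bin, as a Series.
checks = df.resample("3h").agg({
    "Speed": "max",
    "Direction": lambda x: isinstance(x, pd.Series),
})
print(checks)
```

Because the lambda only ever gets a single column, picking "the Direction at max Speed" needs the row positions of the maxima first, which is what the answer below does with `idxmax`.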
You can do this with groupby and idxmax before creating the new dataframe. First, use floor on the Time column to get the 3-hour groups.
import pandas as pd

# create 3h bins (Time must be a datetime column, e.g. via pd.to_datetime)
_3h = df['Time'].dt.floor('3h')
print(_3h)  # if Time is the index, use df.index.floor('3h') instead
# 0 2023-01-01 00:00:00
# 1 2023-01-01 00:00:00
# 2 2023-01-01 03:00:00
# 3 2023-01-01 03:00:00
# 4 2023-01-01 06:00:00
# ...
# use it to group by, and get the index of the max Speed in each group with idxmax
_idxmax = df.groupby(_3h)['Speed'].idxmax()
print(_idxmax)
# Time
# 2023-01-01 00:00:00 0
# 2023-01-01 03:00:00 2
# 2023-01-01 06:00:00 4
# ...
# create the result dataframe
new_df = (
df.loc[_idxmax, ['Speed','Direction']]
.set_index(_idxmax.index)
# in case a 3h bin is missing like 12:00 in your example
.reindex(pd.Index(
pd.date_range(_3h.min(), _3h.max(), freq='3h'),
name='Time'))
# if you want Time as column
.reset_index()
)
print(new_df)
# Time Speed Direction
# 0 2023-01-01 00:00:00 6.0 90.0
# 1 2023-01-01 03:00:00 9.0 70.0
# 2 2023-01-01 06:00:00 2.0 320.0
# 3 2023-01-01 09:00:00 3.0 140.0
# 4 2023-01-01 12:00:00 NaN NaN
# 5 2023-01-01 15:00:00 15.0 60.0
# 6 2023-01-01 18:00:00 10.0 80.0
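The comment in the first step mentions the case where Time is the index. A minimal sketch of that variant, assuming the index is unique (the names `ts` and `freq` are hypothetical): the only real difference is that `idxmax` now returns index labels (timestamps) instead of row numbers, and `.loc` looks those labels up directly.

```python
import pandas as pd

# Sample rows from the question, this time with Time as the (unique) index.
ts = pd.DataFrame({
    "Time": pd.to_datetime([
        "2023-01-01 01:00", "2023-01-01 02:00", "2023-01-01 03:00",
        "2023-01-01 04:00", "2023-01-01 06:00", "2023-01-01 08:00",
        "2023-01-01 11:00", "2023-01-01 15:00", "2023-01-01 16:00",
        "2023-01-01 17:00", "2023-01-01 18:00",
    ]),
    "Speed": [6, 6, 9, 6, 2, 2, 3, 12, 13, 15, 10],
    "Direction": [90, 70, 70, 230, 320, 100, 140, 10, 20, 60, 80],
}).set_index("Time")

freq = "3h"
# Floor the index itself to get the 3h bin of every row.
bins = ts.index.floor(freq)

# For each bin, the timestamp (index label) of the first row with max Speed.
idx = ts["Speed"].groupby(bins).idxmax()

new_df = (
    ts.loc[idx.to_numpy(), ["Speed", "Direction"]]
    .set_index(idx.index)
    # add bins that have no data at all, such as 12:00
    .reindex(pd.date_range(bins.min(), bins.max(), freq=freq, name="Time"))
    .reset_index()
)
print(new_df)
```

As in the column-based version, ties (for example the two 2.0 readings in the 06:00 bin) resolve to the first occurrence, because `idxmax` returns the first maximum.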