Resample as frequency not losing data within a short timedelta
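I have measurement data taken at 10-minute intervals. Sometimes the interval is 9 minutes 59 seconds or 10 minutes 01 second, and sometimes values are missing, so the interval is 20 minutes.
I want the code to resample the values to 10min, which I have already achieved. The problem is that measurements taken at intervals other than exactly 10:00 minutes (9 min 59 s or 10 min 01 s) are lost, and I want to keep this data.
Here is my test code: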
import pandas as pd
import numpy as np

df = pd.DataFrame(columns=('Datetime', 'V_L1', 'V_H3_L1', 'V_H3_L1_in_P'))
df['Datetime'] = ['01.01.2012 00:00:00', '01.01.2012 00:10:01', '01.01.2012 00:29:59', '01.01.2012 00:50:00']
df['V_L1'] = [219, 219.7, np.nan, 220.3]
df['V_H3_L1'] = [3, 1, 2.5, np.nan]
df['Datetime'] = pd.to_datetime(df['Datetime'])
# Put the rows on a regular 600-second grid; asfreq() keeps only rows
# whose timestamp falls exactly on a grid point
df = df.set_index('Datetime').resample('600s').asfreq()
Output:
                      V_L1  V_H3_L1 V_H3_L1_in_P
Datetime
2012-01-01 00:00:00  219.0      3.0          NaN
2012-01-01 00:10:00    NaN      NaN          NaN
2012-01-01 00:20:00    NaN      NaN          NaN
2012-01-01 00:30:00    NaN      NaN          NaN
2012-01-01 00:40:00    NaN      NaN          NaN
2012-01-01 00:50:00  220.3      NaN          NaN
Desired output:
                      V_L1  V_H3_L1 V_H3_L1_in_P
Datetime
2012-01-01 00:00:00  219.0      3.0          NaN
2012-01-01 00:10:00  219.7      1.0          NaN
2012-01-01 00:20:00    NaN      NaN          NaN
2012-01-01 00:30:00    NaN      2.5          NaN
2012-01-01 00:40:00    NaN      NaN          NaN
2012-01-01 00:50:00  220.3      NaN          NaN
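So I want to keep the data, that is, accept a measurement when its timestamp deviates from the frequency grid (10min, 600s) by only a few seconds, say + or - 5 seconds.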
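To make the ±5 second requirement concrete, here is a minimal sketch that measures how far each timestamp sits from its nearest 10-minute slot. The names `stamps`, `nearest_slot`, `offset`, `tolerance` and `within` are illustrative, and the 5-second tolerance is taken from the wording above rather than from any pandas default.

```python
import pandas as pd

# Timestamps from the test data in the question
stamps = pd.Series(pd.to_datetime([
    "2012-01-01 00:00:00", "2012-01-01 00:10:01",
    "2012-01-01 00:29:59", "2012-01-01 00:50:00",
]))

# Nearest point on the 10-minute grid for every timestamp
nearest_slot = stamps.dt.round("10min")

# Absolute distance between each timestamp and its grid point
offset = (stamps - nearest_slot).abs()

# Assumed tolerance: the question asks for roughly +/- 5 seconds
tolerance = pd.Timedelta(seconds=5)
within = offset <= tolerance

print(pd.DataFrame({"stamp": stamps, "slot": nearest_slot,
                    "offset": offset, "within": within}))
```

Rows where `within` is False would be the ones to inspect or drop before snapping anything to the grid.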
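# Run this after pd.to_datetime and before set_index, while 'Datetime' is still a column
df['Datetime'] = df['Datetime'].dt.round('min')
df = df.set_index('Datetime').resample('600s').asfreq()
Round the datetimes to the nearest minute; after that you can set_index and resample.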
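If the tolerance itself matters (accept a measurement only when it lies within about 5 seconds of a grid point), one possible alternative is to build the 10-minute grid explicitly and reindex onto it with `method="nearest"` and a `tolerance`. This is only a sketch under assumptions: `grid` and `snapped` are illustrative names, the 5-second tolerance comes from the question, the empty `V_H3_L1_in_P` column is left out for brevity, and the index is assumed to be sorted and free of duplicates.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Datetime": pd.to_datetime([
        "2012-01-01 00:00:00", "2012-01-01 00:10:01",
        "2012-01-01 00:29:59", "2012-01-01 00:50:00",
    ]),
    "V_L1": [219, 219.7, np.nan, 220.3],
    "V_H3_L1": [3, 1, 2.5, np.nan],
}).set_index("Datetime")

# Regular 10-minute grid covering the whole measurement period
grid = pd.date_range(df.index.min().floor("10min"),
                     df.index.max().ceil("10min"),
                     freq="10min")

# For each grid point, take the nearest measured row, but only if it lies
# within the assumed 5-second tolerance; otherwise the row is all NaN
snapped = df.reindex(grid, method="nearest", tolerance=pd.Timedelta("5s"))

print(snapped)
```

Unlike rounding to the minute, this leaves a slot empty when the closest measurement is further away than the tolerance, instead of moving it onto the grid.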
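Well, I wrote a function that is not very pretty (I have to admit), but it does what I want. Since I am working with a large amount of data, I think this may be a safe approach. Basically, using an if/elif structure, the function checks the minute part of the timestamp and decides from its value whether to round up or down. I am pretty sure there is a better solution, so please share it if you have one.
So the code is: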
import datetime

def round_time(time):
    # Start of the current hour (minutes and seconds removed)
    hour_start = time - datetime.timedelta(minutes=time.minute, seconds=time.second)
    if time.minute >= 55:
        # Round up to the next full hour; adding one hour also rolls
        # 23:55 and later over to 00:00 of the next day
        rounded = hour_start + datetime.timedelta(hours=1)
    elif time.minute >= 45:
        rounded = hour_start + datetime.timedelta(minutes=50)
    elif time.minute >= 35:
        rounded = hour_start + datetime.timedelta(minutes=40)
    elif time.minute >= 25:
        rounded = hour_start + datetime.timedelta(minutes=30)
    elif time.minute >= 15:
        rounded = hour_start + datetime.timedelta(minutes=20)
    elif time.minute >= 5:
        rounded = hour_start + datetime.timedelta(minutes=10)
    else:
        rounded = hour_start
    return rounded

# 'Datetime' must still be a datetime column here (before set_index)
df['Datetime'] = df['Datetime'].apply(round_time)
df = df.set_index('Datetime').resample('600s').asfreq()
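I got the idea from "How to round a datetime column to the nearest quarter hour".
Although the solution in that thread does not work for 10-minute values, it is still a good reference! (29 minutes was still rounded to 20 instead of the 30 I wanted.)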
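As a possibly more compact take on the same idea, the datetime column can be rounded straight to the 10-minute grid with `Series.dt.round("10min")` before resampling. The sketch below reuses the test data from the question; `out` and `rollover` are illustrative names, and the empty `V_H3_L1_in_P` column is left out for brevity. Timestamps that sit exactly halfway between two grid points may be resolved differently from the `>= 5` rule of the hand-written function, so treat this as an approximation of it rather than a drop-in replacement.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Datetime": pd.to_datetime([
        "2012-01-01 00:00:00", "2012-01-01 00:10:01",
        "2012-01-01 00:29:59", "2012-01-01 00:50:00",
    ]),
    "V_L1": [219, 219.7, np.nan, 220.3],
    "V_H3_L1": [3, 1, 2.5, np.nan],
})

# Vectorized rounding to the nearest 10-minute mark (00:29:59 becomes 00:30:00)
df["Datetime"] = df["Datetime"].dt.round("10min")

# With every timestamp on the grid, asfreq() no longer drops the measurements
out = df.set_index("Datetime").resample("10min").asfreq()
print(out)

# Rounding also carries over into the next day without special handling
rollover = pd.Timestamp("2012-01-01 23:55:30").round("10min")
print(rollover)
```

If two measurements could round to the same slot, the index would contain duplicates; in that case an aggregation such as `.first()` in place of `.asfreq()` is probably the safer choice.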