How to merge two time series dataframes with different end dates and keep the longer end date
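I have two time series with the same sampling frequency but different end dates. I want to merge them into one and keep the total time range rather than the intersection. Values outside the intersection should be NaN.
I tried: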
df_to_merge= [df1, df2]
df_merged = reduce(lambda left,right: pd.merge(left,right, on='timestamp'), df_to_merge)
Data:
df1
timestamp col1
2010-10-10 00:00 10
2010-10-10 00:01 15
...
2010-10-15 00:00 10
df2
timestamp col2
2010-10-07 00:00 20
2010-10-10 00:01 25
...
2010-10-18 00:00 20
Expected result:
timestamp col1 col2
2010-10-07 00:00 NaN 20
2010-10-07 00:01 NaN 25
...
2010-10-10 00:01 10 30
2010-10-15 00:00 10 40
...
2010-10-18 00:00 NaN 20
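You can do this with a join:
df_merged = df1.join(df2, how='right')
With how='right', all rows of the right-hand frame (the longer one) are kept. This assumes both frames are indexed by timestamp and that df2 spans the whole range of df1; if neither frame covers the other, use how='outer' to keep the union of both indexes.
For example: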
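import pandas as pd

df1 = pd.DataFrame({'timestamp': pd.to_datetime(pd.Series(['2020-10-10 23:32',
                                                           '2020-10-13 23:28'])),
                    'col1': [5, 8]})
# Index by timestamp, resample to daily frequency and forward-fill.
# .ffill() replaces .fillna(method='ffill'), which newer pandas versions deprecate.
df1 = df1.set_index('timestamp').resample('1d').ffill()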
col1
timestamp
2020-10-10 NaN
2020-10-11 5.0
2020-10-12 5.0
2020-10-13 5.0
and
df2 = pd.DataFrame({'timestamp': pd.to_datetime(pd.Series(['2020-10-08 23:32',
                                                           '2020-10-15 23:28'])),
                    'col2': [50, 80]})
# Same preparation as df1: timestamp index, daily frequency, forward-fill.
df2 = df2.set_index('timestamp').resample('1d').ffill()
col2
timestamp
2020-10-08 NaN
2020-10-09 50.0
2020-10-10 50.0
2020-10-11 50.0
2020-10-12 50.0
2020-10-13 50.0
2020-10-14 50.0
2020-10-15 50.0
The join then returns:
col1 col2
timestamp
2020-10-08 NaN NaN
2020-10-09 NaN 50.0
2020-10-10 NaN 50.0
2020-10-11 5.0 50.0
2020-10-12 5.0 50.0
2020-10-13 5.0 50.0
2020-10-14 NaN 50.0
2020-10-15 NaN 50.0
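If neither frame fully covers the other, a right join would drop rows that exist only in df1. A minimal sketch of the outer-merge variant follows, reusing the reduce approach from the question. The sample values are hypothetical and chosen only so that each frame has a timestamp the other lacks; it assumes timestamp is a regular column, as in the question.

```python
from functools import reduce

import pandas as pd

# Hypothetical sample data: each frame has one timestamp the other lacks.
df1 = pd.DataFrame({
    "timestamp": pd.to_datetime(["2010-10-10", "2010-10-11", "2010-10-12"]),
    "col1": [10, 15, 10],
})
df2 = pd.DataFrame({
    "timestamp": pd.to_datetime(["2010-10-11", "2010-10-12", "2010-10-13"]),
    "col2": [20, 25, 20],
})

dfs_to_merge = [df1, df2]

# pd.merge defaults to how="inner" (the intersection);
# how="outer" keeps the union of timestamps and fills the gaps with NaN.
df_merged = reduce(
    lambda left, right: pd.merge(left, right, on="timestamp", how="outer"),
    dfs_to_merge,
)

# Make the chronological order explicit.
df_merged = df_merged.sort_values("timestamp").reset_index(drop=True)
print(df_merged)
```

The same idea works on timestamp-indexed frames with df1.join(df2, how='outer').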