Python/Pandas merge issue with NaN data
I am trying to combine two DataFrames (df and df2) into a new DataFrame (df3) in pandas using pd.concat, with the following code:
df3 = pd.concat([df,df2])
This almost works the way I want, but it causes a problem.
df contains data for the current date, and its index is a time series. It looks like this:
Facility Servers PUE
2016-10-31 00:00:00 6.0 5.0 1.2
2016-10-31 00:30:00 7.0 5.0 1.4
2016-10-31 01:00:00 6.0 5.0 1.2
2016-10-31 01:30:00 6.0 5.0 1.2
2016-10-31 02:00:00 6.0 5.0 1.2
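df2 contains only NaN data. Its index is a time series in the same format as the one in df, but it starts at an earlier date and runs for a full year (i.e. 17520 rows, corresponding to 365 * 48 thirty-minute intervals). It basically looks like this: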
Facility Servers PUE
2016-10-01 00:00:00 NaN NaN NaN
2016-10-01 00:30:00 NaN NaN NaN
2016-10-01 01:00:00 NaN NaN NaN
2016-10-01 01:30:00 NaN NaN NaN
2016-10-01 02:00:00 NaN NaN NaN
2016-10-01 02:30:00 NaN NaN NaN
<continues to 17520 rows, i.e. one year of 30 minute time intervals>
When I apply: df3 = pd.concat([df,df2])
and then run df3.head(), I get the following:
Facility Servers PUE
2016-10-31 00:00:00 6.0 5.0 1.2
2016-10-31 00:30:00 7.0 5.0 1.4
2016-10-31 01:00:00 6.0 5.0 1.2
2016-10-31 01:30:00 6.0 5.0 1.2
2016-10-31 02:00:00 6.0 5.0 1.2
2016-10-31 02:30:00 NaN NaN NaN
2016-10-31 03:00:00 NaN NaN NaN
2016-10-31 03:30:00 NaN NaN NaN
<continues to the end of the year>
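In other words, the code appears to drop all of the NaN rows for the time intervals that occur before the data in df. Can anyone suggest how to keep all of the rows from df2 and only replace them with data for the corresponding time intervals in df?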
I think you need to reindex df using the union of both indexes:
print (df2.index.union(df.index))
DatetimeIndex(['2016-10-01 00:00:00', '2016-10-01 00:30:00',
'2016-10-01 01:00:00', '2016-10-01 01:30:00',
'2016-10-01 02:00:00', '2016-10-01 02:30:00',
'2016-10-31 00:00:00', '2016-10-31 00:30:00',
'2016-10-31 01:00:00', '2016-10-31 01:30:00',
'2016-10-31 02:00:00'],
dtype='datetime64[ns]', freq=None)
df = df.reindex(df2.index.union(df.index))
print (df)
Facility Servers PUE
2016-10-01 00:00:00 NaN NaN NaN
2016-10-01 00:30:00 NaN NaN NaN
2016-10-01 01:00:00 NaN NaN NaN
2016-10-01 01:30:00 NaN NaN NaN
2016-10-01 02:00:00 NaN NaN NaN
2016-10-01 02:30:00 NaN NaN NaN
2016-10-31 00:00:00 6.0 5.0 1.2
2016-10-31 00:30:00 7.0 5.0 1.4
2016-10-31 01:00:00 6.0 5.0 1.2
2016-10-31 01:30:00 6.0 5.0 1.2
2016-10-31 02:00:00 6.0 5.0 1.2
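For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same reindex approach. The frames are small stand-ins I made up to mirror the question (three rows for df, and one month of 30-minute slots for df2 rather than a full year), and the names cols, full_index and df3 are only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data shaped like the frames in the question.
cols = ["Facility", "Servers", "PUE"]
df = pd.DataFrame(
    [[6.0, 5.0, 1.2], [7.0, 5.0, 1.4], [6.0, 5.0, 1.2]],
    index=pd.date_range("2016-10-31 00:00", periods=3, freq="30min"),
    columns=cols,
)

# All-NaN placeholder frame: one month of 30-minute slots (assumed, not a full year).
df2 = pd.DataFrame(
    np.nan,
    index=pd.date_range("2016-10-01 00:00", periods=31 * 48, freq="30min"),
    columns=cols,
)

# Build the sorted union of both indexes, then align df to it.
# Timestamps that are not present in df are filled with NaN.
full_index = df2.index.union(df.index)
df3 = df.reindex(full_index)

print(df3.head())
print(df3.dropna())
```

Note that reindex only carries over the values from df. That is fine here because df2 holds nothing but NaN, but it would discard any real values stored in df2.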
Use combine_first:
result = df.combine_first(df2)
The result takes values from the right-hand DataFrame (df2) only where they are missing in the left-hand DataFrame (df).
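A short sketch of how combine_first behaves, again with made-up stand-in data. To make the priority rule visible, I have assumed df2 contains two real values (9.9 and 1.1), which is not the case in the question where df2 is entirely NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data shaped like the frames in the question.
cols = ["Facility", "Servers", "PUE"]
df = pd.DataFrame(
    [[6.0, 5.0, 1.2], [7.0, 5.0, 1.4], [6.0, 5.0, 1.2]],
    index=pd.date_range("2016-10-31 00:00", periods=3, freq="30min"),
    columns=cols,
)
df2 = pd.DataFrame(
    np.nan,
    index=pd.date_range("2016-10-01 00:00", periods=31 * 48, freq="30min"),
    columns=cols,
)

# Assumed extra values in df2, only to show which frame wins.
df2.loc[pd.Timestamp("2016-10-31 00:00"), "PUE"] = 9.9  # overlaps a row in df
df2.loc[pd.Timestamp("2016-10-01 00:00"), "PUE"] = 1.1  # no matching row in df

# df has priority; df2 only fills positions that are missing in df.
result = df.combine_first(df2)

print(result.loc[pd.Timestamp("2016-10-31 00:00"), "PUE"])  # value from df
print(result.loc[pd.Timestamp("2016-10-01 00:00"), "PUE"])  # value from df2
```

The result is indexed by the union of both indexes, so every row of df2 is kept, and unlike the plain reindex it would also preserve any real values df2 happened to hold outside the range covered by df.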