I am trying to use pd.concat to merge two data frames ( df and df2 ) into one new dataframe ( df3 ) in Pandas using the following code:
df3 = pd.concat([df, df2])
This almost works the way I want it to, but it creates an issue.
df contains data for the current date, and the index is a time series. It looks like this:
Facility Servers PUE
2016-10-31 00:00:00 6.0 5.0 1.2
2016-10-31 00:30:00 7.0 5.0 1.4
2016-10-31 01:00:00 6.0 5.0 1.2
2016-10-31 01:30:00 6.0 5.0 1.2
2016-10-31 02:00:00 6.0 5.0 1.2
df2 contains NaN data only and the index is a time series that corresponds in format to the one in df , but beginning on an earlier date and continuing for a full year (ie 17520 rows to correspond to 365 * 48 thirty minute intervals). It looks basically like this:
Facility Servers PUE
2016-10-01 00:00:00 NaN NaN NaN
2016-10-01 00:30:00 NaN NaN NaN
2016-10-01 01:00:00 NaN NaN NaN
2016-10-01 01:30:00 NaN NaN NaN
2016-10-01 02:00:00 NaN NaN NaN
2016-10-01 02:30:00 NaN NaN NaN
<continues to 17520 rows, i.e. one year of 30 minute time intervals>
When I apply: df3 = pd.concat([df, df2])
and then run df3.head() , I get the following:
Facility Servers PUE
2016-10-31 00:00:00 6.0 5.0 1.2
2016-10-31 00:30:00 7.0 5.0 1.4
2016-10-31 01:00:00 6.0 5.0 1.2
2016-10-31 01:30:00 6.0 5.0 1.2
2016-10-31 02:00:00 6.0 5.0 1.2
2016-10-31 02:30:00 NaN NaN NaN
2016-10-31 03:00:00 NaN NaN NaN
2016-10-31 03:30:00 NaN NaN NaN
<continues to the end of the year>
In other words, the code seems to remove all of the NaN data for the time intervals that occurred before the data in df . Can anyone advise on how to preserve all of the data in df2 , replacing it with data only for the corresponding time intervals from df ?
I think you need reindex
by union
of both indexes
:
print (df2.index.union(df.index))
DatetimeIndex(['2016-10-01 00:00:00', '2016-10-01 00:30:00',
'2016-10-01 01:00:00', '2016-10-01 01:30:00',
'2016-10-01 02:00:00', '2016-10-01 02:30:00',
'2016-10-31 00:00:00', '2016-10-31 00:30:00',
'2016-10-31 01:00:00', '2016-10-31 01:30:00',
'2016-10-31 02:00:00'],
dtype='datetime64[ns]', freq=None)
df = df.reindex(df2.index.union(df.index))
print (df)
Facility Servers PUE
2016-10-01 00:00:00 NaN NaN NaN
2016-10-01 00:30:00 NaN NaN NaN
2016-10-01 01:00:00 NaN NaN NaN
2016-10-01 01:30:00 NaN NaN NaN
2016-10-01 02:00:00 NaN NaN NaN
2016-10-01 02:30:00 NaN NaN NaN
2016-10-31 00:00:00 6.0 5.0 1.2
2016-10-31 00:30:00 7.0 5.0 1.4
2016-10-31 01:00:00 6.0 5.0 1.2
2016-10-31 01:30:00 6.0 5.0 1.2
2016-10-31 02:00:00 6.0 5.0 1.2
use combine_first
result = df1.combine_first(df2)
result will only takes values from the right DataFrame if they are missing in the left DataFrame
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.