I want to update and overwrite the values of one dataframe with the values of another, matching rows on the datetime index, where that index contains repeated timestamps. The code below illustrates my problem; I have given df1 crazy values for illustrative purposes:
#import packages
import pandas as pd
import numpy as np
#create dataframes and indices
df = pd.DataFrame(np.random.randint(0,30,size=(10, 3)), columns=(['MeanT', 'MaxT', 'MinT']))
df1 = pd.DataFrame(np.random.randint(900,1000,size=(10, 3)), columns=(['MeanT', 'MaxT', 'MinT']))
df['Location'] =[2,2,3,3,4,4,5,5,6,6]
df1['Location'] =[2,2,3,3,4,4,5,5,6,6]
df.index = ["2020-05-18 12:00:00","2020-05-19 12:00:00","2020-05-18 12:00:00","2020-05-19 12:00:00","2020-05-18 12:00:00","2020-05-19 12:00:00","2020-05-18 12:00:00","2020-05-19 12:00:00","2020-05-18 12:00:00","2020-05-19 12:00:00"]
df1.index = ["2020-05-19 12:00:00", "2020-05-20 12:00:00", "2020-05-19 12:00:00", "2020-05-20 12:00:00", "2020-05-19 12:00:00", "2020-05-20 12:00:00", "2020-05-19 12:00:00", "2020-05-20 12:00:00", "2020-05-19 12:00:00", "2020-05-20 12:00:00"]
df.index = pd.to_datetime(df.index)
df1.index = pd.to_datetime(df1.index)
Take a look at both dataframes: df has dates for the 18th and 19th, and df1 has dates for the 19th and 20th.
print(df)
MeanT MaxT MinT Location
2020-05-18 12:00:00 28 0 9 2
2020-05-19 12:00:00 22 7 11 2
2020-05-18 12:00:00 2 7 7 3
2020-05-19 12:00:00 10 24 18 3
2020-05-18 12:00:00 10 12 25 4
2020-05-19 12:00:00 25 7 20 4
2020-05-18 12:00:00 1 8 11 5
2020-05-19 12:00:00 27 19 12 5
2020-05-18 12:00:00 25 10 26 6
2020-05-19 12:00:00 29 11 27 6
print(df1)
MeanT MaxT MinT Location
2020-05-19 12:00:00 912 991 915 2
2020-05-20 12:00:00 936 917 965 2
2020-05-19 12:00:00 918 977 901 3
2020-05-20 12:00:00 974 971 927 3
2020-05-19 12:00:00 979 929 953 4
2020-05-20 12:00:00 988 955 939 4
2020-05-19 12:00:00 969 983 940 5
2020-05-20 12:00:00 902 904 916 5
2020-05-19 12:00:00 983 942 965 6
2020-05-20 12:00:00 928 994 933 6
I want to create a new dataframe that updates df with the values from df1, so that the new dataframe takes its values for the 18th from df, and its values for the 19th and 20th from df1.
I have tried using combine_first like so:
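# Add a per-timestamp counter as a second index level so the repeated timestamps become unique keys
df = df.set_index(df.groupby(level=0).cumcount(), append=True)
df1 = df1.set_index(df1.groupby(level=0).cumcount(), append=True)
# Combine, sort by counter then timestamp, and drop the counter level again
df3 = df.combine_first(df1).sort_index(level=[1,0]).reset_index(level=1, drop=True)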
This adds the rows for the 20th, but doesn't overwrite the data for the 19th with the values in df1. It produces this output:
print(df3)
MeanT MaxT MinT Location
2020-05-18 12:00:00 28.0 0.0 9.0 2.0
2020-05-19 12:00:00 22.0 7.0 11.0 2.0
2020-05-20 12:00:00 936.0 917.0 965.0 2.0
2020-05-18 12:00:00 2.0 7.0 7.0 3.0
2020-05-19 12:00:00 10.0 24.0 18.0 3.0
2020-05-20 12:00:00 974.0 971.0 927.0 3.0
2020-05-18 12:00:00 10.0 12.0 25.0 4.0
2020-05-19 12:00:00 25.0 7.0 20.0 4.0
2020-05-20 12:00:00 988.0 955.0 939.0 4.0
2020-05-18 12:00:00 1.0 8.0 11.0 5.0
2020-05-19 12:00:00 27.0 19.0 12.0 5.0
2020-05-20 12:00:00 902.0 904.0 916.0 5.0
2020-05-18 12:00:00 25.0 10.0 26.0 6.0
2020-05-19 12:00:00 29.0 11.0 27.0 6.0
2020-05-20 12:00:00 928.0 994.0 933.0 6.0
So the values for the 18th and the 20th are correct, but the values for the 19th still come from df. I want them to be overwritten with the values from df1. Please help!
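The result above follows from the precedence rule of DataFrame.combine_first: the calling frame's non-null values always win, and the argument only fills in missing values and labels the caller lacks. A minimal sketch on two tiny frames (the names `a`, `b`, `keep_a` and `keep_b` are illustrative only, not from the question):

```python
import pandas as pd

# Two one-column frames that overlap only on 2020-05-19 (illustrative data).
a = pd.DataFrame({"T": [1, 2]},
                 index=pd.to_datetime(["2020-05-18", "2020-05-19"]))
b = pd.DataFrame({"T": [99, 100]},
                 index=pd.to_datetime(["2020-05-19", "2020-05-20"]))

# The caller's non-null values win; the argument only fills missing labels.
keep_a = a.combine_first(b)  # the 19th comes from a
keep_b = b.combine_first(a)  # the 19th comes from b

print(keep_a)
print(keep_b)
```

Depending on the pandas version, the values may print as floats, because aligning the two frames introduces NaN before the gaps are filled.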
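You just need to call combine_first the other way round: df1.combine_first(df) keeps df1's values wherever both frames have a row, and only takes rows from df that df1 lacks. You can also use 'Location' as the second index level instead of groupby.cumcount, which makes the row matching explicit: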
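# Start from the original df and df1 (without the cumcount level added in the question)
df3 = (df1.set_index('Location', append=True)
       .combine_first(df.set_index('Location', append=True))  # df1 wins on overlaps
       .sort_index(level=[1, 0])          # order by Location, then by timestamp
       .reset_index(level='Location'))    # move Location back into a column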
print(df3)
Location MeanT MaxT MinT
2020-05-18 12:00:00 2 28.0 0.0 9.0
2020-05-19 12:00:00 2 912.0 991.0 915.0
2020-05-20 12:00:00 2 936.0 917.0 965.0
2020-05-18 12:00:00 3 2.0 7.0 7.0
2020-05-19 12:00:00 3 918.0 977.0 901.0
2020-05-20 12:00:00 3 974.0 971.0 927.0
2020-05-18 12:00:00 4 10.0 12.0 25.0
2020-05-19 12:00:00 4 979.0 929.0 953.0
2020-05-20 12:00:00 4 988.0 955.0 939.0
2020-05-18 12:00:00 5 1.0 8.0 11.0
2020-05-19 12:00:00 5 969.0 983.0 940.0
2020-05-20 12:00:00 5 902.0 904.0 916.0
2020-05-18 12:00:00 6 25.0 10.0 26.0
2020-05-19 12:00:00 6 983.0 942.0 965.0
2020-05-20 12:00:00 6 928.0 994.0 933.0
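A different technique can also produce the requested result; it is not the one used in the answer above. Stack df and df1 with pd.concat, df first, then drop every row whose (timestamp, Location) pair appears again later, so df1's row wins on overlaps. The sketch rebuilds the question's frames (random values, same layout) so it runs on its own; `stacked`, `key` and `result` are illustrative names. Because no NaN is introduced, the value columns stay integers.

```python
import numpy as np
import pandas as pd

cols = ["MeanT", "MaxT", "MinT"]
locations = [2, 2, 3, 3, 4, 4, 5, 5, 6, 6]

# Rebuild the example frames from the question (random values, same layout).
df = pd.DataFrame(np.random.randint(0, 30, size=(10, 3)), columns=cols)
df1 = pd.DataFrame(np.random.randint(900, 1000, size=(10, 3)), columns=cols)
df["Location"] = locations
df1["Location"] = locations
df.index = pd.to_datetime(["2020-05-18 12:00:00", "2020-05-19 12:00:00"] * 5)
df1.index = pd.to_datetime(["2020-05-19 12:00:00", "2020-05-20 12:00:00"] * 5)

# Stack df first and df1 last, so df1's version of a row comes later.
stacked = pd.concat([df, df1])

# Identify each row by its (timestamp, Location) pair and keep the last
# occurrence, which is df1's whenever both frames contain that pair.
key = pd.MultiIndex.from_arrays([stacked.index, stacked["Location"]])
result = stacked[~key.duplicated(keep="last")]

# Sort by timestamp, then stably (mergesort) by Location, giving
# Location-major, time-minor order like the answer's output.
result = result.sort_index().sort_values("Location", kind="mergesort")
print(result)
```

The combine_first version is shorter; this one makes the "later frame wins" rule explicit through keep="last" and avoids the float upcast.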