[英]Overwrite one dataframe with values from another dataframe, based on repeated datetime index
I want to update and overwrite the values of one dataframe with the values in another, based on the datetime index, for a repeated datetime index.我想根据日期时间索引更新和覆盖一个数据帧的值,并使用另一个数据帧中的值来重复日期时间索引。 This code illustrates my problem, I have given df1 crazy values for illustrative purposes:这段代码说明了我的问题,为了说明目的,我给出了 df1 疯狂值:
#import packages
import pandas as pd
import numpy as np
#create dataframes and indices
df = pd.DataFrame(np.random.randint(0,30,size=(10, 3)), columns=(['MeanT', 'MaxT', 'MinT']))
df1 = pd.DataFrame(np.random.randint(900,1000,size=(10, 3)), columns=(['MeanT', 'MaxT', 'MinT']))
df['Location'] =[2,2,3,3,4,4,5,5,6,6]
df1['Location'] =[2,2,3,3,4,4,5,5,6,6]
df.index = ["2020-05-18 12:00:00","2020-05-19 12:00:00","2020-05-18 12:00:00","2020-05-19 12:00:00","2020-05-18 12:00:00","2020-05-19 12:00:00","2020-05-18 12:00:00","2020-05-19 12:00:00","2020-05-18 12:00:00","2020-05-19 12:00:00"]
df1.index = ["2020-05-19 12:00:00", "2020-05-20 12:00:00", "2020-05-19 12:00:00", "2020-05-20 12:00:00", "2020-05-19 12:00:00", "2020-05-20 12:00:00", "2020-05-19 12:00:00", "2020-05-20 12:00:00", "2020-05-19 12:00:00", "2020-05-20 12:00:00"]
df.index = pd.to_datetime(df.index)
df1.index = pd.to_datetime(df1.index)
Take a look at both dataframes, which shows dates 18th and 19th for df, and 19th and 20th for df1.看看这两个数据框,它显示了 df 的日期 18 日和 19 日,以及 df1 的 19 日和 20 日。
print(df)
MeanT MaxT MinT Location
2020-05-18 12:00:00 28 0 9 2
2020-05-19 12:00:00 22 7 11 2
2020-05-18 12:00:00 2 7 7 3
2020-05-19 12:00:00 10 24 18 3
2020-05-18 12:00:00 10 12 25 4
2020-05-19 12:00:00 25 7 20 4
2020-05-18 12:00:00 1 8 11 5
2020-05-19 12:00:00 27 19 12 5
2020-05-18 12:00:00 25 10 26 6
2020-05-19 12:00:00 29 11 27 6
print(df1)
MeanT MaxT MinT Location
2020-05-19 12:00:00 912 991 915 2
2020-05-20 12:00:00 936 917 965 2
2020-05-19 12:00:00 918 977 901 3
2020-05-20 12:00:00 974 971 927 3
2020-05-19 12:00:00 979 929 953 4
2020-05-20 12:00:00 988 955 939 4
2020-05-19 12:00:00 969 983 940 5
2020-05-20 12:00:00 902 904 916 5
2020-05-19 12:00:00 983 942 965 6
2020-05-20 12:00:00 928 994 933 6
I want to create a new dataframe which updates df with the values from df1, so the new df has values for the 18th from df, and the 19th and 20th from df1.我想创建一个新的数据框,它使用来自 df1 的值更新 df,因此新的 df 具有来自 df 的第 18 个值以及来自 df1 的第 19 个和第 20 个值。
I have tried using combine_first like so:我试过像这样使用 combine_first :
df = df.set_index(df.groupby(level=0).cumcount(), append=True)
df1 = df1.set_index(df1.groupby(level=0).cumcount(), append=True)
df3 = df.combine_first(df1).sort_index(level=[1,0]).reset_index(level=1, drop=True)
which updates the dataframe, but doesn't overwrite the data for the 19th with values in df1.它更新数据帧,但不会用 df1 中的值覆盖第 19 个的数据。 It produces this output:它产生这个输出:
print(df3)
MeanT MaxT MinT Location
2020-05-18 12:00:00 28.0 0.0 9.0 2.0
2020-05-19 12:00:00 22.0 7.0 11.0 2.0
2020-05-20 12:00:00 936.0 917.0 965.0 2.0
2020-05-18 12:00:00 2.0 7.0 7.0 3.0
2020-05-19 12:00:00 10.0 24.0 18.0 3.0
2020-05-20 12:00:00 974.0 971.0 927.0 3.0
2020-05-18 12:00:00 10.0 12.0 25.0 4.0
2020-05-19 12:00:00 25.0 7.0 20.0 4.0
2020-05-20 12:00:00 988.0 955.0 939.0 4.0
2020-05-18 12:00:00 1.0 8.0 11.0 5.0
2020-05-19 12:00:00 27.0 19.0 12.0 5.0
2020-05-20 12:00:00 902.0 904.0 916.0 5.0
2020-05-18 12:00:00 25.0 10.0 26.0 6.0
2020-05-19 12:00:00 29.0 11.0 27.0 6.0
2020-05-20 12:00:00 928.0 994.0 933.0 6.0
So the values for the 18th and the 20th are correct, but the values for the 19th are still from df.所以第 18 个和第 20 个的值是正确的,但第 19 个的值仍然来自 df。 I want the values from df to be overwritten with those in df1.我希望 df 中的值被 df1 中的值覆盖。 Please help!请帮忙!
you just need to use combine_first
backwards.你只需要向后使用combine_first
。 We can also use 'Location'
as index instead groupby.cumcount
我们也可以使用'Location'
作为索引而不是groupby.cumcount
df3 = (df1.set_index('Location', append=True)
.combine_first(df.set_index('Location', append=True))
.reset_index(level='Location')
.reindex(columns=df.columns)
.sort_values('Location'))
print(df3)
Location MeanT MaxT MinT
2020-05-18-12:00:00 2 28.0 0.0 9.0
2020-05-19-12:00:00 2 912.0 991.0 915.0
2020-05-20-12:00:00 2 936.0 917.0 965.0
2020-05-18-12:00:00 3 2.0 7.0 7.0
2020-05-19-12:00:00 3 918.0 977.0 901.0
2020-05-20-12:00:00 3 974.0 971.0 927.0
2020-05-18-12:00:00 4 10.0 12.0 25.0
2020-05-19-12:00:00 4 979.0 929.0 953.0
2020-05-20-12:00:00 4 988.0 955.0 939.0
2020-05-18-12:00:00 5 1.0 8.0 11.0
2020-05-19-12:00:00 5 969.0 983.0 940.0
2020-05-20-12:00:00 5 902.0 904.0 916.0
2020-05-18-12:00:00 6 25.0 10.0 26.0
2020-05-19-12:00:00 6 983.0 942.0 965.0
2020-05-20-12:00:00 6 928.0 994.0 933.0
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.