Overwrite one dataframe with values from another based on a repeated datetime index

Question

I want to update and overwrite the values of one dataframe with the values from another, matching rows on the datetime index, where that index contains repeated timestamps. This code illustrates my problem. For illustrative purposes I have given df1 obviously out-of-range values:

#import packages
import pandas as pd
import numpy as np

#create dataframes and indices
df = pd.DataFrame(np.random.randint(0,30,size=(10, 3)), columns=(['MeanT', 'MaxT', 'MinT']))
df1 = pd.DataFrame(np.random.randint(900,1000,size=(10, 3)), columns=(['MeanT', 'MaxT', 'MinT']))

df['Location'] =[2,2,3,3,4,4,5,5,6,6]
df1['Location'] =[2,2,3,3,4,4,5,5,6,6]

df.index = ["2020-05-18 12:00:00","2020-05-19 12:00:00","2020-05-18 12:00:00","2020-05-19 12:00:00","2020-05-18 12:00:00","2020-05-19 12:00:00","2020-05-18 12:00:00","2020-05-19 12:00:00","2020-05-18 12:00:00","2020-05-19 12:00:00"]
df1.index = ["2020-05-19 12:00:00", "2020-05-20 12:00:00", "2020-05-19 12:00:00", "2020-05-20 12:00:00", "2020-05-19 12:00:00", "2020-05-20 12:00:00", "2020-05-19 12:00:00", "2020-05-20 12:00:00", "2020-05-19 12:00:00", "2020-05-20 12:00:00"]

df.index = pd.to_datetime(df.index)
df1.index = pd.to_datetime(df1.index)

Take a look at both dataframes: df has dates for the 18th and 19th, and df1 has dates for the 19th and 20th.

print(df)
                     MeanT  MaxT  MinT  Location
2020-05-18 12:00:00     28     0     9         2
2020-05-19 12:00:00     22     7    11         2
2020-05-18 12:00:00      2     7     7         3
2020-05-19 12:00:00     10    24    18         3
2020-05-18 12:00:00     10    12    25         4
2020-05-19 12:00:00     25     7    20         4
2020-05-18 12:00:00      1     8    11         5
2020-05-19 12:00:00     27    19    12         5
2020-05-18 12:00:00     25    10    26         6
2020-05-19 12:00:00     29    11    27         6

print(df1)
                     MeanT  MaxT  MinT  Location
2020-05-19 12:00:00    912   991   915         2
2020-05-20 12:00:00    936   917   965         2
2020-05-19 12:00:00    918   977   901         3
2020-05-20 12:00:00    974   971   927         3
2020-05-19 12:00:00    979   929   953         4
2020-05-20 12:00:00    988   955   939         4
2020-05-19 12:00:00    969   983   940         5
2020-05-20 12:00:00    902   904   916         5
2020-05-19 12:00:00    983   942   965         6
2020-05-20 12:00:00    928   994   933         6
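In both frames each timestamp appears once per Location. So although the datetime index repeats, the pair (timestamp, Location) can identify a row. A quick check of that assumption on a small hypothetical frame with the same layout:

```python
import pandas as pd

# Hypothetical miniature of the question's layout: the timestamp repeats,
# once per Location.
dates = pd.to_datetime(["2020-05-18 12:00:00", "2020-05-19 12:00:00"] * 2)
df = pd.DataFrame({"MeanT": [1, 2, 3, 4], "Location": [2, 2, 3, 3]}, index=dates)

# The datetime index alone is not a usable row key...
print(df.index.is_unique)

# ...but appending Location as a second index level makes every row key unique.
keyed = df.set_index("Location", append=True)
print(keyed.index.is_unique)
```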

I want to create a new dataframe that updates df with the values from df1, so that the new dataframe takes the 18th from df and the 19th and 20th from df1.
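One way to express that target, as a sketch assuming each (timestamp, Location) pair is unique within each frame: stack both frames keyed on (timestamp, Location) and, wherever a key appears in both, keep the df1 row. The helper name `overlay` and the toy values are hypothetical, not part of the question.

```python
import pandas as pd

def overlay(base, newer, key="Location"):
    """Hypothetical helper: rows of `newer` replace rows of `base` that share
    the same (timestamp, key) pair; rows found in only one frame are kept.
    Assumes each (timestamp, key) pair is unique within each frame."""
    stacked = pd.concat([base.set_index(key, append=True),
                         newer.set_index(key, append=True)])
    # For keys present in both frames, keep the later copy, i.e. the one from `newer`.
    result = stacked[~stacked.index.duplicated(keep="last")]
    # Sort by key, then timestamp, and turn the key level back into a column.
    return result.sort_index(level=[1, 0]).reset_index(level=key)

t = pd.to_datetime
df = pd.DataFrame({"MeanT": [10, 11, 20, 21], "Location": [2, 2, 3, 3]},
                  index=t(["2020-05-18", "2020-05-19"] * 2))
df1 = pd.DataFrame({"MeanT": [910, 920, 930, 940], "Location": [2, 2, 3, 3]},
                   index=t(["2020-05-19", "2020-05-20"] * 2))
out = overlay(df, df1)
print(out)
```

Unlike combine_first, this replaces whole rows: a NaN inside a df1 row is kept rather than filled in from df.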

I have tried using combine_first like this:

df = df.set_index(df.groupby(level=0).cumcount(), append=True)
df1 = df1.set_index(df1.groupby(level=0).cumcount(), append=True)
 
df3 = df.combine_first(df1).sort_index(level=[1,0]).reset_index(level=1, drop=True)

This updates the dataframe, but it doesn't overwrite the data for the 19th with the values from df1. It produces this output:

print(df3)
                     MeanT   MaxT   MinT  Location
2020-05-18 12:00:00   28.0    0.0    9.0       2.0
2020-05-19 12:00:00   22.0    7.0   11.0       2.0
2020-05-20 12:00:00  936.0  917.0  965.0       2.0
2020-05-18 12:00:00    2.0    7.0    7.0       3.0
2020-05-19 12:00:00   10.0   24.0   18.0       3.0
2020-05-20 12:00:00  974.0  971.0  927.0       3.0
2020-05-18 12:00:00   10.0   12.0   25.0       4.0
2020-05-19 12:00:00   25.0    7.0   20.0       4.0
2020-05-20 12:00:00  988.0  955.0  939.0       4.0
2020-05-18 12:00:00    1.0    8.0   11.0       5.0
2020-05-19 12:00:00   27.0   19.0   12.0       5.0
2020-05-20 12:00:00  902.0  904.0  916.0       5.0
2020-05-18 12:00:00   25.0   10.0   26.0       6.0
2020-05-19 12:00:00   29.0   11.0   27.0       6.0
2020-05-20 12:00:00  928.0  994.0  933.0       6.0

So the values for the 18th and the 20th are correct, but the values for the 19th still come from df. I want the values from df to be overwritten with those from df1. Please help!
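The behaviour above follows from how `DataFrame.combine_first` works: it keeps the calling frame's values and only takes values from `other` for labels (or NaN cells) the caller lacks. A tiny illustration with throwaway hypothetical frames `a` and `b`:

```python
import pandas as pd

a = pd.DataFrame({"v": [1, 2]}, index=["x", "y"])
b = pd.DataFrame({"v": [100, 200]}, index=["y", "z"])

# Caller first: on the shared label "y" the caller's value (2) is kept;
# "z" is only in b, so it is filled from b.
print(a.combine_first(b))

# Swapping the roles lets b win on "y" (100), while "x" is filled from a.
print(b.combine_first(a))
```

Depending on the pandas version the result column may be float or int; the values are the same either way.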

Answer 1

You just need to use combine_first the other way round, calling it on df1 so that df1's values take priority: df1.combine_first(df). We can also use 'Location' as the second index level instead of groupby.cumcount:

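df3 = (df1.set_index('Location', append=True)                  # key rows by (timestamp, Location)
          .combine_first(df.set_index('Location', append=True))  # df1 wins; df fills keys df1 lacks
          .reset_index(level='Location')                       # turn Location back into a column
          .reindex(columns=df.columns)                         # restore df's column order
          .sort_values('Location', kind='mergesort'))          # stable sort keeps dates in order per Location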

print(df3)

                     Location  MeanT   MaxT   MinT
2020-05-18-12:00:00         2   28.0    0.0    9.0
2020-05-19-12:00:00         2  912.0  991.0  915.0
2020-05-20-12:00:00         2  936.0  917.0  965.0
2020-05-18-12:00:00         3    2.0    7.0    7.0
2020-05-19-12:00:00         3  918.0  977.0  901.0
2020-05-20-12:00:00         3  974.0  971.0  927.0
2020-05-18-12:00:00         4   10.0   12.0   25.0
2020-05-19-12:00:00         4  979.0  929.0  953.0
2020-05-20-12:00:00         4  988.0  955.0  939.0
2020-05-18-12:00:00         5    1.0    8.0   11.0
2020-05-19-12:00:00         5  969.0  983.0  940.0
2020-05-20-12:00:00         5  902.0  904.0  916.0
2020-05-18-12:00:00         6   25.0   10.0   26.0
2020-05-19-12:00:00         6  983.0  942.0  965.0
2020-05-20-12:00:00         6  928.0  994.0  933.0
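Keying on 'Location' rather than groupby.cumcount matters when the two frames don't list the same Locations in the same order, because cumcount only numbers rows by their position within each timestamp. A sketch with hypothetical frames in which df1 has no rows for Location 2:

```python
import pandas as pd

t = pd.to_datetime
# Hypothetical frames: df has Locations 2 and 3, but df1 only has Location 3.
df = pd.DataFrame({"MeanT": [10, 11, 20, 21], "Location": [2, 2, 3, 3]},
                  index=t(["2020-05-18", "2020-05-19"] * 2))
df1 = pd.DataFrame({"MeanT": [930, 940], "Location": [3, 3]},
                   index=t(["2020-05-19", "2020-05-20"]))

# Keyed on (timestamp, Location): rows are matched by Location.
by_location = (df1.set_index("Location", append=True)
                  .combine_first(df.set_index("Location", append=True)))

# Keyed on (timestamp, occurrence number): the first 19 May row of df1
# (Location 3) is matched with the first 19 May row of df (Location 2).
by_count = (df1.set_index(df1.groupby(level=0).cumcount(), append=True)
               .combine_first(df.set_index(df.groupby(level=0).cumcount(),
                                           append=True)))

# 19 May: Location 2 keeps 11 from df, Location 3 gets 930 from df1.
print(by_location.xs(t("2020-05-19"), level=0))
# 19 May: two Location 3 rows (930 from df1, stale 21 from df), no Location 2.
print(by_count.xs(t("2020-05-19"), level=0))
```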

Answer 1: accepted, 2020-10-29 13:07:10

根据重复的日期时间索引，用另一个数据帧中的值覆盖一个数据帧

问题描述

1 个解决方案

解决方案1 1 已采纳 2020-10-29 13:07:10

解决方案1
1 已采纳 2020-10-29 13:07:10