
Overwrite one dataframe with values from another dataframe, based on repeated datetime index

I want to update and overwrite the values of one dataframe with the values from another, matching rows on a datetime index that contains repeated timestamps. The code below illustrates my problem; I have given df1 unrealistically large values so they are easy to spot:

#import packages
import pandas as pd
import numpy as np

#create dataframes and indices
df = pd.DataFrame(np.random.randint(0,30,size=(10, 3)), columns=(['MeanT', 'MaxT', 'MinT']))
df1 = pd.DataFrame(np.random.randint(900,1000,size=(10, 3)), columns=(['MeanT', 'MaxT', 'MinT']))

df['Location'] =[2,2,3,3,4,4,5,5,6,6]
df1['Location'] =[2,2,3,3,4,4,5,5,6,6]

df.index = ["2020-05-18 12:00:00","2020-05-19 12:00:00","2020-05-18 12:00:00","2020-05-19 12:00:00","2020-05-18 12:00:00","2020-05-19 12:00:00","2020-05-18 12:00:00","2020-05-19 12:00:00","2020-05-18 12:00:00","2020-05-19 12:00:00"]
df1.index = ["2020-05-19 12:00:00", "2020-05-20 12:00:00", "2020-05-19 12:00:00", "2020-05-20 12:00:00", "2020-05-19 12:00:00", "2020-05-20 12:00:00", "2020-05-19 12:00:00", "2020-05-20 12:00:00", "2020-05-19 12:00:00", "2020-05-20 12:00:00"]

df.index = pd.to_datetime(df.index)
df1.index = pd.to_datetime(df1.index)

Here are both dataframes: df covers the 18th and 19th, and df1 covers the 19th and 20th.

print(df)
                     MeanT  MaxT  MinT  Location
2020-05-18 12:00:00     28     0     9         2
2020-05-19 12:00:00     22     7    11         2
2020-05-18 12:00:00      2     7     7         3
2020-05-19 12:00:00     10    24    18         3
2020-05-18 12:00:00     10    12    25         4
2020-05-19 12:00:00     25     7    20         4
2020-05-18 12:00:00      1     8    11         5
2020-05-19 12:00:00     27    19    12         5
2020-05-18 12:00:00     25    10    26         6
2020-05-19 12:00:00     29    11    27         6

print(df1)
                     MeanT  MaxT  MinT  Location
2020-05-19 12:00:00    912   991   915         2
2020-05-20 12:00:00    936   917   965         2
2020-05-19 12:00:00    918   977   901         3
2020-05-20 12:00:00    974   971   927         3
2020-05-19 12:00:00    979   929   953         4
2020-05-20 12:00:00    988   955   939         4
2020-05-19 12:00:00    969   983   940         5
2020-05-20 12:00:00    902   904   916         5
2020-05-19 12:00:00    983   942   965         6
2020-05-20 12:00:00    928   994   933         6

I want to create a new dataframe that updates df with the values from df1, so that the result takes the 18th from df and the 19th and 20th from df1.

I have tried using combine_first like so:

df = df.set_index(df.groupby(level=0).cumcount(), append=True)
df1 = df1.set_index(df1.groupby(level=0).cumcount(), append=True)
 
df3 = df.combine_first(df1).sort_index(level=[1,0]).reset_index(level=1, drop=True)

This adds the rows for the 20th, but it doesn't overwrite the data for the 19th with the values in df1. It produces this output:

print(df3)
                     MeanT   MaxT   MinT  Location
2020-05-18 12:00:00   28.0    0.0    9.0       2.0
2020-05-19 12:00:00   22.0    7.0   11.0       2.0
2020-05-20 12:00:00  936.0  917.0  965.0       2.0
2020-05-18 12:00:00    2.0    7.0    7.0       3.0
2020-05-19 12:00:00   10.0   24.0   18.0       3.0
2020-05-20 12:00:00  974.0  971.0  927.0       3.0
2020-05-18 12:00:00   10.0   12.0   25.0       4.0
2020-05-19 12:00:00   25.0    7.0   20.0       4.0
2020-05-20 12:00:00  988.0  955.0  939.0       4.0
2020-05-18 12:00:00    1.0    8.0   11.0       5.0
2020-05-19 12:00:00   27.0   19.0   12.0       5.0
2020-05-20 12:00:00  902.0  904.0  916.0       5.0
2020-05-18 12:00:00   25.0   10.0   26.0       6.0
2020-05-19 12:00:00   29.0   11.0   27.0       6.0
2020-05-20 12:00:00  928.0  994.0  933.0       6.0

So the values for the 18th and the 20th are correct, but the values for the 19th still come from df. I want those values from df to be overwritten with the ones in df1. Please help!
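The behaviour described above follows from how combine_first assigns priority: the calling dataframe keeps every non-null value it already has, and the other dataframe only fills gaps. A minimal sketch of that rule, using two small made-up frames (the names old and new and their values are illustrative assumptions, not taken from the post):

```python
import pandas as pd

# Two tiny frames that overlap on the 19th (hypothetical data)
idx_old = pd.to_datetime(["2020-05-18 12:00:00", "2020-05-19 12:00:00"])
idx_new = pd.to_datetime(["2020-05-19 12:00:00", "2020-05-20 12:00:00"])

old = pd.DataFrame({"MeanT": [10, 11]}, index=idx_old)
new = pd.DataFrame({"MeanT": [900, 901]}, index=idx_new)

# The caller wins wherever it has a non-null value;
# the argument is only used to fill missing rows or NaNs.
old_first = old.combine_first(new)  # the 19th stays at old's value
new_first = new.combine_first(old)  # the 19th takes new's value

print(old_first)
print(new_first)
```

In both results the 18th comes from old and the 20th from new; only the overlapping 19th depends on which frame is the caller.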

You just need to use combine_first the other way round: call it on df1 and pass df, because the calling dataframe's non-null values take priority. We can also use 'Location' as an index level instead of groupby.cumcount:

df3 = (df1.set_index('Location', append=True)
          .combine_first(df.set_index('Location', append=True))
          .reset_index(level='Location')
          .reindex(columns=df.columns)
          .sort_values('Location'))

print(df3)

                     MeanT   MaxT   MinT  Location
2020-05-18 12:00:00   28.0    0.0    9.0         2
2020-05-19 12:00:00  912.0  991.0  915.0         2
2020-05-20 12:00:00  936.0  917.0  965.0         2
2020-05-18 12:00:00    2.0    7.0    7.0         3
2020-05-19 12:00:00  918.0  977.0  901.0         3
2020-05-20 12:00:00  974.0  971.0  927.0         3
2020-05-18 12:00:00   10.0   12.0   25.0         4
2020-05-19 12:00:00  979.0  929.0  953.0         4
2020-05-20 12:00:00  988.0  955.0  939.0         4
2020-05-18 12:00:00    1.0    8.0   11.0         5
2020-05-19 12:00:00  969.0  983.0  940.0         5
2020-05-20 12:00:00  902.0  904.0  916.0         5
2020-05-18 12:00:00   25.0   10.0   26.0         6
2020-05-19 12:00:00  983.0  942.0  965.0         6
2020-05-20 12:00:00  928.0  994.0  933.0         6
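Because the question builds its frames with np.random, the numbers above can't be reproduced exactly. As a rough check, here is the same chain run on small fixed data. The values and the reduced column set are assumptions chosen for illustration, and kind="stable" is an addition here (not part of the original answer) so that rows sharing a Location keep their relative order after sorting:

```python
import pandas as pd

# Fixed, made-up data: two locations, df covers the 18th-19th, df1 the 19th-20th
dates_old = pd.to_datetime(["2020-05-18 12:00:00", "2020-05-19 12:00:00"] * 2)
dates_new = pd.to_datetime(["2020-05-19 12:00:00", "2020-05-20 12:00:00"] * 2)

df = pd.DataFrame({"MeanT": [1, 2, 3, 4], "Location": [2, 2, 3, 3]},
                  index=dates_old)
df1 = pd.DataFrame({"MeanT": [901, 902, 903, 904], "Location": [2, 2, 3, 3]},
                   index=dates_new)

# (timestamp, Location) identifies a row uniquely, so it can serve as the key.
# df1 is the caller, so its values win on the overlapping 19th.
df3 = (df1.set_index("Location", append=True)
          .combine_first(df.set_index("Location", append=True))
          .reset_index(level="Location")
          .reindex(columns=df.columns)          # restore df's column order
          .sort_values("Location", kind="stable"))

print(df3)
```

Note that this relies on each (timestamp, Location) pair appearing at most once per frame; if a pair can repeat, the groupby.cumcount level from the question would still be needed to make the keys unique.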
