Fill values from one dataframe into another dataframe based on index of the two
Overwrite one dataframe with values from another dataframe, based on repeated datetime index
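I want to update and overwrite the values of one dataframe, matched on a datetime index, with values from another dataframe whose datetime index contains repeated entries. This code illustrates my problem; for illustration purposes I have given df1 deliberately extreme values: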
#import packages
import pandas as pd
import numpy as np
#create dataframes and indices
df = pd.DataFrame(np.random.randint(0,30,size=(10, 3)), columns=(['MeanT', 'MaxT', 'MinT']))
df1 = pd.DataFrame(np.random.randint(900,1000,size=(10, 3)), columns=(['MeanT', 'MaxT', 'MinT']))
df['Location'] =[2,2,3,3,4,4,5,5,6,6]
df1['Location'] =[2,2,3,3,4,4,5,5,6,6]
df.index = ["2020-05-18 12:00:00","2020-05-19 12:00:00","2020-05-18 12:00:00","2020-05-19 12:00:00","2020-05-18 12:00:00","2020-05-19 12:00:00","2020-05-18 12:00:00","2020-05-19 12:00:00","2020-05-18 12:00:00","2020-05-19 12:00:00"]
df1.index = ["2020-05-19 12:00:00", "2020-05-20 12:00:00", "2020-05-19 12:00:00", "2020-05-20 12:00:00", "2020-05-19 12:00:00", "2020-05-20 12:00:00", "2020-05-19 12:00:00", "2020-05-20 12:00:00", "2020-05-19 12:00:00", "2020-05-20 12:00:00"]
df.index = pd.to_datetime(df.index)
df1.index = pd.to_datetime(df1.index)
Looking at the two dataframes, df has dates for the 18th and 19th, and df1 has dates for the 19th and 20th.
print(df)
MeanT MaxT MinT Location
2020-05-18 12:00:00 28 0 9 2
2020-05-19 12:00:00 22 7 11 2
2020-05-18 12:00:00 2 7 7 3
2020-05-19 12:00:00 10 24 18 3
2020-05-18 12:00:00 10 12 25 4
2020-05-19 12:00:00 25 7 20 4
2020-05-18 12:00:00 1 8 11 5
2020-05-19 12:00:00 27 19 12 5
2020-05-18 12:00:00 25 10 26 6
2020-05-19 12:00:00 29 11 27 6
print(df1)
MeanT MaxT MinT Location
2020-05-19 12:00:00 912 991 915 2
2020-05-20 12:00:00 936 917 965 2
2020-05-19 12:00:00 918 977 901 3
2020-05-20 12:00:00 974 971 927 3
2020-05-19 12:00:00 979 929 953 4
2020-05-20 12:00:00 988 955 939 4
2020-05-19 12:00:00 969 983 940 5
2020-05-20 12:00:00 902 904 916 5
2020-05-19 12:00:00 983 942 965 6
2020-05-20 12:00:00 928 994 933 6
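I would like to create a new dataframe that updates df with the values from df1, so that the new dataframe has the values for the 18th from df, and the values for the 19th and 20th from df1.
I have tried using combine_first like this: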
df = df.set_index(df.groupby(level=0).cumcount(), append=True)
df1 = df1.set_index(df1.groupby(level=0).cumcount(), append=True)
df3 = df.combine_first(df1).sort_index(level=[1,0]).reset_index(level=1, drop=True)
It updates the dataframe, but it does not overwrite the data for the 19th with the values from df1. It produces this output:
print(df3)
MeanT MaxT MinT Location
2020-05-18 12:00:00 28.0 0.0 9.0 2.0
2020-05-19 12:00:00 22.0 7.0 11.0 2.0
2020-05-20 12:00:00 936.0 917.0 965.0 2.0
2020-05-18 12:00:00 2.0 7.0 7.0 3.0
2020-05-19 12:00:00 10.0 24.0 18.0 3.0
2020-05-20 12:00:00 974.0 971.0 927.0 3.0
2020-05-18 12:00:00 10.0 12.0 25.0 4.0
2020-05-19 12:00:00 25.0 7.0 20.0 4.0
2020-05-20 12:00:00 988.0 955.0 939.0 4.0
2020-05-18 12:00:00 1.0 8.0 11.0 5.0
2020-05-19 12:00:00 27.0 19.0 12.0 5.0
2020-05-20 12:00:00 902.0 904.0 916.0 5.0
2020-05-18 12:00:00 25.0 10.0 26.0 6.0
2020-05-19 12:00:00 29.0 11.0 27.0 6.0
2020-05-20 12:00:00 928.0 994.0 933.0 6.0
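So the values for the 18th and 20th are correct, but the values for the 19th still come from df. I would like those values in df to be overwritten by the values in df1. Please help!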
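You just need to use combine_first the other way around: it keeps the caller's values and only fills gaps from the argument, so df1 should be the caller. We can also use 'Location' as an index level instead of groupby.cumcount: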
df3 = (df1.set_index('Location', append=True)
          .combine_first(df.set_index('Location', append=True))
          .reset_index(level='Location')
          .reindex(columns=df.columns)
          .sort_values('Location'))
print(df3)
Location MeanT MaxT MinT
2020-05-18 12:00:00 2 28.0 0.0 9.0
2020-05-19 12:00:00 2 912.0 991.0 915.0
2020-05-20 12:00:00 2 936.0 917.0 965.0
2020-05-18 12:00:00 3 2.0 7.0 7.0
2020-05-19 12:00:00 3 918.0 977.0 901.0
2020-05-20 12:00:00 3 974.0 971.0 927.0
2020-05-18 12:00:00 4 10.0 12.0 25.0
2020-05-19 12:00:00 4 979.0 929.0 953.0
2020-05-20 12:00:00 4 988.0 955.0 939.0
2020-05-18 12:00:00 5 1.0 8.0 11.0
2020-05-19 12:00:00 5 969.0 983.0 940.0
2020-05-20 12:00:00 5 902.0 904.0 916.0
2020-05-18 12:00:00 6 25.0 10.0 26.0
2020-05-19 12:00:00 6 983.0 942.0 965.0
2020-05-20 12:00:00 6 928.0 994.0 933.0
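To see why the order of the call matters, here is a minimal sketch on a two-row miniature of the data. The names `old`, `new`, `old_first` and `new_first` are hypothetical and only for illustration; the values are taken from Location 2 in the question. It assumes, as above, that the timestamp plus 'Location' uniquely identifies a row.

```python
import pandas as pd

# Hypothetical miniature of df (18th, 19th) and df1 (19th, 20th) for one location.
old = pd.DataFrame(
    {"MeanT": [28, 22], "Location": [2, 2]},
    index=pd.to_datetime(["2020-05-18 12:00:00", "2020-05-19 12:00:00"]),
)
new = pd.DataFrame(
    {"MeanT": [912, 936], "Location": [2, 2]},
    index=pd.to_datetime(["2020-05-19 12:00:00", "2020-05-20 12:00:00"]),
)

# Make (timestamp, Location) the index so rows can be aligned unambiguously.
old_i = old.set_index("Location", append=True)
new_i = new.set_index("Location", append=True)

# The caller's values win wherever both frames have a non-null value.
old_first = old_i.combine_first(new_i)  # the 19th keeps the value from `old`
new_first = new_i.combine_first(old_i)  # the 19th takes the value from `new`

overlap = (pd.Timestamp("2020-05-19 12:00:00"), 2)
print(old_first.loc[overlap, "MeanT"])
print(new_first.loc[overlap, "MeanT"])
```

Both results contain all three dates; only the overlapping row for the 19th differs, which is why swapping the caller and the argument is enough here.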