
Replace values in a DataFrame based on another DataFrame's filter

I have 2 DataFrames, and I want to replace the values in one DataFrame with the values from the other, based on the columns of the first one. The layouts below should clarify what I mean.

DF1:

            A   B   C   D   E
Date
01/01/2019  1   2   3   4   5
02/01/2019  1   2   3   4   5
03/01/2019  1   2   3   4   5

DF2:

            name1   name2   name3
Date
01/01/2019  A       B       D
02/01/2019  B       C       E
03/01/2019  A       D       E

The result I want:

            name1   name2   name3
Date
01/01/2019  1       2       4
02/01/2019  2       3       5
03/01/2019  1       4       5

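Try:

# Assumes each DataFrame holds the dates in a regular column named "index"
# (for example after calling reset_index() on an unnamed index).
result = df2.melt(id_vars="index").merge(
    df1.melt(id_vars="index"),
    left_on=["index", "value"],
    right_on=["index", "variable"],
).drop(columns=["value_x", "variable_y"]).pivot(
    index="index", columns="variable_x", values="value_y"
)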

print(result)

The two melt calls reshape your DataFrames into long format, so that all the values sit in a single column, with an additional column holding the original column names:

df1.melt(id_vars='index')

         index variable  value
0   01/01/2019        A      1
1   02/01/2019        A      1
2   03/01/2019        A      1
3   01/01/2019        B      2
4   02/01/2019        B      2
5   03/01/2019        B      2
...

You can now join these on index and on value / variable (the value column of the melted df2 against the variable column of the melted df1). The last part just removes a couple of columns and then reshapes the table back into the desired form.

The result is:

variable_x  name1  name2  name3
index                          
01/01/2019      1      2      4
02/01/2019      2      3      5
03/01/2019      1      4      5
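
To make the reshaping approach easier to try out, here is a minimal self-contained sketch that rebuilds the question's data and runs the same melt / merge / pivot idea. It assumes the dates live in an index named Date, as in the question, rather than in a column called index. The names target, source, names_long, values_long and merged are illustrative only, chosen so that no columns need to be dropped afterwards.

```python
import pandas as pd

dates = ["01/01/2019", "02/01/2019", "03/01/2019"]
df1 = pd.DataFrame(
    {"A": [1, 1, 1], "B": [2, 2, 2], "C": [3, 3, 3], "D": [4, 4, 4], "E": [5, 5, 5]},
    index=pd.Index(dates, name="Date"),
)
df2 = pd.DataFrame(
    {"name1": ["A", "B", "A"], "name2": ["B", "C", "D"], "name3": ["D", "E", "E"]},
    index=pd.Index(dates, name="Date"),
)

# Long format: one row per (Date, column) pair in each frame.
names_long = df2.reset_index().melt(id_vars="Date", var_name="target", value_name="source")
values_long = df1.reset_index().melt(id_vars="Date", var_name="source", value_name="value")

# Match every requested (Date, source column) pair with its value in df1.
merged = names_long.merge(values_long, on=["Date", "source"], how="left")

# Back to wide format: one column per original df2 column.
result = merged.pivot(index="Date", columns="target", values="value")
print(result)
```

Using how="left" keeps every cell of df2 in the result, so a name that does not exist in df1 would show up as NaN instead of silently disappearing.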

Use DataFrame.lookup for each column separately (note that lookup was deprecated in pandas 1.2.0 and removed in pandas 2.0.0, so this needs an older pandas version):

for c in df2.columns:
    df2[c] = df1.lookup(df1.index, df2[c])
print (df2)
            name1  name2  name3
01/01/2019      1      2      4
02/01/2019      2      3      5
03/01/2019      1      4      5
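
Since DataFrame.lookup is no longer available from pandas 2.0 onward, the same per-column loop can be approximated with NumPy integer indexing. This is only a sketch under the same assumption as the loop above: df1 and df2 have the same number of rows in the same order, and every name in df2 is a column of df1. The variable names values, rows and result are illustrative.

```python
import numpy as np
import pandas as pd

dates = ["01/01/2019", "02/01/2019", "03/01/2019"]
df1 = pd.DataFrame([[1, 2, 3, 4, 5]] * 3, columns=list("ABCDE"), index=dates)
df2 = pd.DataFrame(
    [["A", "B", "D"], ["B", "C", "E"], ["A", "D", "E"]],
    columns=["name1", "name2", "name3"],
    index=dates,
)

values = df1.to_numpy()
rows = np.arange(len(df1))  # row i of df2 reads from row i of df1

result = df2.copy()
for c in df2.columns:
    # Translate the column labels stored in df2[c] into positions in df1.
    cols = df1.columns.get_indexer(df2[c])
    result[c] = values[rows, cols]

print(result)
```

Index.get_indexer returns -1 for a label it cannot find, which NumPy would read as "last column", so this version should only be used when all names are known to exist.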

A more general solution also handles index and column names that do not match between the two DataFrames:

print (df1)
            A  B  C  D  G
01/01/2019  1  2  3  4  5
02/01/2019  1  2  3  4  5
05/01/2019  1  2  3  4  5

print (df2)
           name1 name2 name3
01/01/2019     A     B     D
02/01/2019     B     C     E
08/01/2019     A     D     E

df1.index = pd.to_datetime(df1.index, dayfirst=True)
df2.index = pd.to_datetime(df2.index, dayfirst=True)

cols = df2.stack().unique()
idx = df2.index
df11 = df1.reindex(columns=cols, index=idx)
print (df11)
              A    B    D    C   E
2019-01-01  1.0  2.0  4.0  3.0 NaN
2019-01-02  1.0  2.0  4.0  3.0 NaN
2019-01-08  NaN  NaN  NaN  NaN NaN

for c in df2.columns:
    df2[c] = df11.lookup(df11.index, df2[c])
print (df2)
            name1  name2  name3
2019-01-01    1.0    2.0    4.0
2019-01-02    2.0    3.0    NaN
2019-01-08    NaN    NaN    NaN
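
On pandas versions without DataFrame.lookup, the general case can be sketched with Index.get_indexer, masking every position whose date or column name is missing. The helper name lookup_by_label is hypothetical and not a pandas function. The sketch assumes that df1 holds numeric data, has at least one row and one column, and has unique index and column labels.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    [[1, 2, 3, 4, 5]] * 3,
    columns=list("ABCDG"),
    index=pd.to_datetime(["01/01/2019", "02/01/2019", "05/01/2019"], dayfirst=True),
)
df2 = pd.DataFrame(
    [["A", "B", "D"], ["B", "C", "E"], ["A", "D", "E"]],
    columns=["name1", "name2", "name3"],
    index=pd.to_datetime(["01/01/2019", "02/01/2019", "08/01/2019"], dayfirst=True),
)

def lookup_by_label(values, names):
    """Hypothetical helper: for each cell of `names`, fetch values.loc[row, cell]."""
    # Positions of the requested rows and columns; -1 marks a missing label.
    rows = values.index.get_indexer(names.index)[:, None]
    cols = values.columns.get_indexer(names.to_numpy().ravel()).reshape(names.shape)
    missing = (rows < 0) | (cols < 0)
    # Clip -1 to 0 so the indexing is valid, then overwrite those cells with NaN.
    picked = values.to_numpy(dtype=float)[np.clip(rows, 0, None), np.clip(cols, 0, None)]
    picked[missing] = np.nan
    return pd.DataFrame(picked, index=names.index, columns=names.columns)

result = lookup_by_label(df1, df2)
print(result)
```

The result is a float DataFrame because NaN marks the cells that could not be resolved, which matches the behaviour of the reindex-based version above.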
