
Replace values in a DataFrame based on a lookup in another DataFrame

I have two DataFrames, and I want to replace each value in one of them (DF2) with the value from the other (DF1), using the DF2 value as the DF1 column name for the same date. The examples below should clarify.

DF1:

            A  B  C  D  E
Date
01/01/2019  1  2  3  4  5
02/01/2019  1  2  3  4  5
03/01/2019  1  2  3  4  5

DF2:

           name1 name2 name3
Date
01/01/2019     A     B     D
02/01/2019     B     C     E
03/01/2019     A     D     E

THE RESULT I WANT:

            name1  name2  name3
Date
01/01/2019      1      2      4
02/01/2019      2      3      5
03/01/2019      1      4      5

Try:

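# Assumes the dates are in a regular column named "index" in both frames.
# If they are the index instead, run df.rename_axis("index").reset_index() first.
result = df2.melt(id_vars="index").merge(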
    df1.melt(id_vars="index"),
    left_on=["index", "value"],
    right_on=["index", "variable"],
).drop(columns=["value_x", "variable_y"]).pivot(
    index="index", columns="variable_x", values="value_y"
)

print(result)

The two melt calls reshape each DataFrame into long format: all values end up in a single column, with an additional column holding the original column names:

df1.melt(id_vars='index')

         index variable  value
0   01/01/2019        A      1
1   02/01/2019        A      1
2   03/01/2019        A      1
3   01/01/2019        B      2
4   02/01/2019        B      2
5   03/01/2019        B      2
...

You can now join these on index and on value / variable (the value column of df2 against the variable column of df1). The last part just drops a couple of columns and reshapes the table back into the desired form.

The result is

variable_x  name1  name2  name3
index                          
01/01/2019      1      2      4
02/01/2019      2      3      5
03/01/2019      1      4      5
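
For anyone who wants to run this end to end, here is a minimal sketch of the same melt / merge / pivot idea. It assumes the dates are the index (named Date, as in the question) and builds the two frames by hand. The names labels, values and merged, and the renamed melt columns, are illustrative choices and not part of the answer above.

```python
import pandas as pd

dates = ["01/01/2019", "02/01/2019", "03/01/2019"]

# Frames from the question, with the dates as an index named "Date".
df1 = pd.DataFrame(
    {"A": [1, 1, 1], "B": [2, 2, 2], "C": [3, 3, 3], "D": [4, 4, 4], "E": [5, 5, 5]},
    index=pd.Index(dates, name="Date"),
)
df2 = pd.DataFrame(
    {"name1": ["A", "B", "A"], "name2": ["B", "C", "D"], "name3": ["D", "E", "E"]},
    index=pd.Index(dates, name="Date"),
)

# Long format: one row per (Date, column) pair.
labels = df2.reset_index().melt(id_vars="Date", var_name="name", value_name="label")
values = df1.reset_index().melt(id_vars="Date", var_name="label", value_name="value")

# Match each label in df2 to the df1 value with the same date and column name.
merged = labels.merge(values, on=["Date", "label"], how="left")

# Back to wide format, one column per original df2 column.
result = merged.pivot(index="Date", columns="name", values="value")
result.columns.name = None
print(result)
```

Naming the melt output columns up front means the merge can use a plain on=[...] and no suffixed columns need to be dropped afterwards.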

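Use DataFrame.lookup for each column separately (note that DataFrame.lookup was deprecated in pandas 1.2.0 and removed in pandas 2.0, so this only works on older versions):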

for c in df2.columns:
    df2[c] = df1.lookup(df1.index, df2[c])
print (df2)
            name1  name2  name3
01/01/2019      1      2      4
02/01/2019      2      3      5
03/01/2019      1      4      5
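
On pandas versions without DataFrame.lookup, one possible substitute is plain NumPy positional indexing. The sketch below is not the method from the answer itself. It assumes that df1 and df2 have the same rows in the same order and that every label in df2 is a column of df1. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

dates = ["01/01/2019", "02/01/2019", "03/01/2019"]
df1 = pd.DataFrame(
    {"A": [1, 1, 1], "B": [2, 2, 2], "C": [3, 3, 3], "D": [4, 4, 4], "E": [5, 5, 5]},
    index=dates,
)
df2 = pd.DataFrame(
    {"name1": ["A", "B", "A"], "name2": ["B", "C", "D"], "name3": ["D", "E", "E"]},
    index=dates,
)

values = df1.to_numpy()
rows = np.arange(len(df1))  # row positions; assumes df1 and df2 share row order

result = df2.copy()
for c in df2.columns:
    # Column position in df1 of each label stored in df2[c].
    cols = df1.columns.get_indexer(df2[c])
    # Pick one value per row: values[row_i, col_i].
    result[c] = values[rows, cols]
print(result)
```

Note that get_indexer returns -1 for a label that is missing from df1.columns, which would silently select the last column here, so this version is only safe when all labels are known to exist.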

A more general solution also handles index values and column names that do not match between the two DataFrames:

print (df1)
            A  B  C  D  G
01/01/2019  1  2  3  4  5
02/01/2019  1  2  3  4  5
05/01/2019  1  2  3  4  5

print (df2)
           name1 name2 name3
01/01/2019     A     B     D
02/01/2019     B     C     E
08/01/2019     A     D     E

df1.index = pd.to_datetime(df1.index, dayfirst=True)
df2.index = pd.to_datetime(df2.index, dayfirst=True)

cols = df2.stack().unique()
idx = df2.index
df11 = df1.reindex(columns=cols, index=idx)
print (df11)
              A    B    D    C   E
2019-01-01  1.0  2.0  4.0  3.0 NaN
2019-01-02  1.0  2.0  4.0  3.0 NaN
2019-01-08  NaN  NaN  NaN  NaN NaN

for c in df2.columns:
    df2[c] = df11.lookup(df11.index, df2[c])
print (df2)
            name1  name2  name3
2019-01-01    1.0    2.0    4.0
2019-01-02    2.0    3.0    NaN
2019-01-08    NaN    NaN    NaN
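
The same mismatch handling can be sketched without DataFrame.lookup, which newer pandas versions no longer provide. The helper name lookup_by_label is hypothetical, and the approach (get_indexer plus masking of the -1 results) is a substitute for the loop above rather than the original method. It assumes numeric values in df1 and unique index and column labels.

```python
import numpy as np
import pandas as pd


def lookup_by_label(values_df, labels_df):
    """For each cell of labels_df, fetch values_df[label] on the same row label.

    Missing rows or columns produce NaN. (Hypothetical helper for illustration.)
    """
    values = values_df.to_numpy(dtype=float)
    rows = values_df.index.get_indexer(labels_df.index)  # -1 where the row is missing
    out = {}
    for c in labels_df.columns:
        cols = values_df.columns.get_indexer(labels_df[c])  # -1 where the column is missing
        picked = values[rows, cols]
        picked[(rows == -1) | (cols == -1)] = np.nan  # blank out the invalid picks
        out[c] = picked
    return pd.DataFrame(out, index=labels_df.index)


fmt = "%d/%m/%Y"
df1 = pd.DataFrame(
    {"A": [1, 1, 1], "B": [2, 2, 2], "C": [3, 3, 3], "D": [4, 4, 4], "G": [5, 5, 5]},
    index=pd.to_datetime(["01/01/2019", "02/01/2019", "05/01/2019"], format=fmt),
)
df2 = pd.DataFrame(
    {"name1": ["A", "B", "A"], "name2": ["B", "C", "D"], "name3": ["D", "E", "E"]},
    index=pd.to_datetime(["01/01/2019", "02/01/2019", "08/01/2019"], format=fmt),
)

result = lookup_by_label(df1, df2)
print(result)
```

Casting to float up front keeps the NaN assignment valid even though df1 holds integers, which is also why the output above shows 1.0 rather than 1.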
