
Problems with combining columns from dataframes in pandas

I have two dataframes that I'm trying to merge.

               df1
    code  scale   R1    R2...
0   121     1     80    110
1   121     2     NaN   NaN
2   121     3     NaN   NaN
3   313     1     60    60
4   313     2     NaN   NaN
5   313     3     NaN   NaN
...
             df2
    code  scale   R1    R2...
0   121     2     30    20
3   313     2     15    10
...

I need to copy the values from df2 into df1 wherever the code and scale columns match.

The result should look like this:

               df1
    code  scale   R1    R2...
0   121     1     80    110
1   121     2     30    20
2   121     3     NaN   NaN
3   313     1     60    60
4   313     2     15    10
5   313     3     NaN   NaN
...

The problem is that there can be many columns like R1 and R2, so I cannot handle each one separately. I tried the approaches from the instructions I was following, but none of them gives the desired result. I'm doing something wrong, but I can't work out what. I would really appreciate some advice.

What do you want to happen if both dataframes have values for R1/R2? If you want to keep df1's values, you could do

df1.set_index(['code', 'scale']).fillna(df2.set_index(['code', 'scale'])).reset_index()

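To keep df2's values instead, swap the roles of the two frames. Note that fillna keeps only the rows of the calling frame, so if df2 has fewer rows than df1, combine_first is the safer way to do the swap. To combine in some other way, please clarify the question!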
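A minimal sketch of the two directions, assuming the sample rows from the question. The frames are rebuilt by hand, and the (121, 1) row in df2 with the value 999 is invented purely to create a conflict. It also shows why a literal swap of the fillna call loses rows.

```python
import numpy as np
import pandas as pd

keys = ["code", "scale"]

df1 = pd.DataFrame({
    "code":  [121, 121, 121, 313, 313, 313],
    "scale": [1, 2, 3, 1, 2, 3],
    "R1": [80, np.nan, np.nan, 60, np.nan, np.nan],
    "R2": [110, np.nan, np.nan, 60, np.nan, np.nan],
})
# Hypothetical df2: the (121, 1) row is made up so both frames hold a value there.
df2 = pd.DataFrame({
    "code":  [121, 121, 313],
    "scale": [1, 2, 2],
    "R1": [999, 30, 15],
    "R2": [999, 20, 10],
})

a = df1.set_index(keys)
b = df2.set_index(keys)

# df1 wins on conflicts: only the NaN cells of df1 are filled from df2.
keep_df1 = a.fillna(b).reset_index()

# Literal swap: the result has df2's index, so df1-only rows disappear.
swapped_fillna = b.fillna(a).reset_index()

# df2 wins on conflicts, and rows that exist only in df1 are kept.
keep_df2 = b.combine_first(a).reset_index()

print(keep_df1)
print(swapped_fillna)
print(keep_df2)
```

Both variants assume that each (code, scale) pair appears at most once per frame.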

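Try this: concatenate the two frames and, for each (code, scale) pair, keep the last row, which is the one from df2.

pd.concat([df1, df2], axis=0).sort_values(['code', 'scale']).drop_duplicates(['code', 'scale'], keep='last')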
Out[21]: 
    code  scale    R1     R2
0   121      1  80.0  110.0
0   121      2  30.0   20.0
2   121      3   NaN    NaN
3   313      1  60.0   60.0
3   313      2  15.0   10.0
5   313      3   NaN    NaN

This is a good situation for combine_first. It fills the nulls in the calling dataframe with values from the passed dataframe.

df1.set_index(['code', 'scale']).combine_first(df2.set_index(['code', 'scale'])).reset_index()

   code  scale    R1     R2
0   121      1  80.0  110.0
1   121      2  30.0   20.0
2   121      3   NaN    NaN
3   313      1  60.0   60.0
4   313      2  15.0   10.0
5   313      3   NaN    NaN
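For reference, here is a self-contained sketch that reproduces this output from the question's sample rows. The helper name fill_from is hypothetical (it is not a pandas function), and the sketch assumes each (code, scale) pair is unique within each frame. Because no R column is named anywhere, it should behave the same however many such columns there are.

```python
import numpy as np
import pandas as pd


def fill_from(target, source, keys):
    """Fill NaNs in `target` with values from `source` where `keys` match.

    Hypothetical helper for illustration; not part of pandas.
    """
    return (
        target.set_index(keys)
        .combine_first(source.set_index(keys))
        .reset_index()
    )


# Sample data rebuilt by hand from the question.
df1 = pd.DataFrame({
    "code":  [121, 121, 121, 313, 313, 313],
    "scale": [1, 2, 3, 1, 2, 3],
    "R1": [80, np.nan, np.nan, 60, np.nan, np.nan],
    "R2": [110, np.nan, np.nan, 60, np.nan, np.nan],
})
df2 = pd.DataFrame(
    {"code": [121, 313], "scale": [2, 2], "R1": [30, 15], "R2": [20, 10]},
    index=[0, 3],
)

result = fill_from(df1, df2, ["code", "scale"])
print(result)
```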

Other solutions

with fillna

df1.set_index(['code', 'scale']).fillna(df2.set_index(['code', 'scale'])).reset_index()

with add (a bit faster, but overlapping values are summed rather than kept)

df1.set_index(['code', 'scale']).add(df2.set_index(['code', 'scale']), fill_value=0).reset_index()
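One thing to watch with the add variant: it matches the other approaches only when the two frames never both hold a value in the same cell. A small sketch of what happens otherwise, where the (121, 1) row in df2 is invented for illustration:

```python
import numpy as np
import pandas as pd

keys = ["code", "scale"]

df1 = pd.DataFrame({
    "code":  [121, 121, 121],
    "scale": [1, 2, 3],
    "R1": [80, np.nan, np.nan],
})
# Hypothetical df2: the (121, 1) row is made up so that both frames hold a value.
df2 = pd.DataFrame({
    "code":  [121, 121],
    "scale": [1, 2],
    "R1": [5, 30],
})

summed = df1.set_index(keys).add(df2.set_index(keys), fill_value=0)
print(summed)
# (121, 1): 80 + 5 = 85   both frames have a value, so they are added
# (121, 2): 30            the NaN in df1 is treated as 0
# (121, 3): NaN           missing on both sides stays missing
```

If overlaps are possible and one frame should simply win, fillna or combine_first is the safer choice.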
