
Pandas: update column values from second dataframe

I've got a dataframe df1 with date and other values like below:

date      value1     value2     value3
20100101  1          2          3
20100102  1          2          3
20100103  1          2          3
20100104  1          3          4
20100105  1          3          4
20100106  1          3          5
20100107  1          3          6

Then I'd like to update some values from another dataframe df2:

date      value1      
20100102  2           
20100104  3        
20100105  4    
20100106  5       
20100107  6     

So the expected outcome will be:

date      value1     value2     value3
20100101  1          2          3
20100102  2          2          3
20100103  1          2          3
20100104  3          3          4
20100105  4          3          4
20100106  5          3          5
20100107  6          3          6  

As far as I know, I can't do this with a left join. Is there a fast and easy way to achieve this other than iterating through each date?


Update:

Thanks for all the answers!

I've got another case where df2 contains dates that are not in df1, e.g.:

date      value1      
20100102  2           
20100104  3        
20100105  4    
20100106  5       
20100107  6   
20100108  7

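Adding dropna(axis=0, how='any') to piRSquared's answer (Option 1, before the astype call) solves this case: it drops the row for 20100108, which would otherwise be left with NaN in value2 and value3.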
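For reference, here is a minimal sketch of that extra-date case. The frames are rebuilt from the tables above, assuming every column is an integer column; the name result is only for illustration.

```python
import pandas as pd

# df1 and df2 rebuilt from the tables in the question (integer dtypes assumed)
df1 = pd.DataFrame({
    "date": [20100101, 20100102, 20100103, 20100104,
             20100105, 20100106, 20100107],
    "value1": [1, 1, 1, 1, 1, 1, 1],
    "value2": [2, 2, 2, 3, 3, 3, 3],
    "value3": [3, 3, 3, 4, 4, 5, 6],
})
df2 = pd.DataFrame({
    "date": [20100102, 20100104, 20100105, 20100106, 20100107, 20100108],
    "value1": [2, 3, 4, 5, 6, 7],
})

result = (
    df2.set_index("date")
    .combine_first(df1.set_index("date"))  # df2 wins where it has a value
    .dropna(axis=0, how="any")             # drops 20100108: no value2/value3
    .reset_index()
    .astype(df1.dtypes)                    # restore the integer dtypes
)
print(result)
```

The dropna call has to come before astype, because a column that still contains NaN cannot be cast back to an integer dtype.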

Option 1 (here d1 and d2 are the question's df1 and df2)

d2.set_index('date').combine_first(
    d1.set_index('date')).reset_index().astype(d1.dtypes)

       date  value1  value2  value3
0  20100101       1       2       3
1  20100102       2       2       3
2  20100103       1       2       3
3  20100104       3       3       4
4  20100105       4       3       4
5  20100106       5       3       5
6  20100107       6       3       6

Option 2

d1[['date']].merge(d2, 'left').combine_first(d1).astype(d1.dtypes)

       date  value1  value2  value3
0  20100101       1       2       3
1  20100102       2       2       3
2  20100103       1       2       3
3  20100104       3       3       4
4  20100105       4       3       4
5  20100106       5       3       5
6  20100107       6       3       6
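To try both options yourself, something like the following should work. It is only a sketch: d1 and d2 are rebuilt from the question's tables with integer columns assumed, the names option1 and option2 are mine, and the merge arguments are spelled out as keywords (how="left", on="date"), which should match the positional 'left' used above.

```python
import pandas as pd

# d1 and d2 rebuilt from the question's tables (integer dtypes assumed)
d1 = pd.DataFrame({
    "date": [20100101, 20100102, 20100103, 20100104,
             20100105, 20100106, 20100107],
    "value1": [1, 1, 1, 1, 1, 1, 1],
    "value2": [2, 2, 2, 3, 3, 3, 3],
    "value3": [3, 3, 3, 4, 4, 5, 6],
})
d2 = pd.DataFrame({
    "date": [20100102, 20100104, 20100105, 20100106, 20100107],
    "value1": [2, 3, 4, 5, 6],
})

# Option 1: align both frames on date, prefer d2's values where present
option1 = (
    d2.set_index("date")
    .combine_first(d1.set_index("date"))
    .reset_index()
    .astype(d1.dtypes)
)

# Option 2: left-merge d2 onto d1's dates, then fill the gaps from d1
option2 = (
    d1[["date"]]
    .merge(d2, how="left", on="date")
    .combine_first(d1)
    .astype(d1.dtypes)
)

print(option1)
print(option2)
```

Because Option 2 starts from d1's dates, a date that exists only in d2 never enters its result.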

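I think this is faster (here df is df1 and sd is df2). Note that it assigns the values by position, so it assumes every date in sd also appears in df, in the same order and without duplicates: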

In [58]: df.loc[df[df.date.isin(sd.date)].index,'value1'] = sd.value1.values.tolist()

In [59]: df
Out[59]: 
       date  value1  value2  value3
0  20100101       1       2       3
1  20100102       2       2       3
2  20100103       1       2       3
3  20100104       3       3       4
4  20100105       4       3       4
5  20100106       5       3       5
6  20100107       6       3       6

In [61]: %timeit df.loc[df[df.date.isin(sd.date)].index,'value1'] = sd.value1.values.tolist()
1000 loops, best of 3: 703 µs per loop

In [62]: %timeit sd.set_index('date').combine_first(df.set_index('date')).reset_index().astype(df.dtypes)
100 loops, best of 3: 4.08 ms per loop
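If the rows of sd might be in a different order, or sd might contain dates missing from df, a lookup by date may be safer than a positional assignment. This is not from the answers above, just a sketch of one possible alternative using Series.map. The sd frame below is deliberately shuffled and given an extra date, and the names lookup and mapped are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "date": [20100101, 20100102, 20100103, 20100104,
             20100105, 20100106, 20100107],
    "value1": [1, 1, 1, 1, 1, 1, 1],
    "value2": [2, 2, 2, 3, 3, 3, 3],
    "value3": [3, 3, 3, 4, 4, 5, 6],
})
# Shuffled on purpose, with one date (20100108) that df does not have
sd = pd.DataFrame({
    "date": [20100107, 20100102, 20100108, 20100104, 20100105, 20100106],
    "value1": [6, 2, 7, 3, 4, 5],
})

# date -> new value1; this assumes the dates in sd are unique
lookup = sd.set_index("date")["value1"]

# NaN wherever df's date has no entry in sd
mapped = df["date"].map(lookup)

# Keep the old value where there is no update, then restore the dtype
df["value1"] = mapped.fillna(df["value1"]).astype(df["value1"].dtype)
print(df)
```

I have not timed this against the two approaches above, so I make no speed claim for it.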

