
Pandas merge two dataframes horizontally

I am dealing with a problem like this:

mydata_old
|  x  |  y   |
|-----|------|
| 1   | 10   |
| 2   | 15   |
| 3   | 12   |
| 4   | 21   |

mydata_new
|  x  |  y   |
|-----|------|
| 4   | 20   |
| 5   | 25   |
| 6   | 30   |

I would like to combine both tables to get:

mydata
|  x  |  y   |
|-----|------|
| 1   | 10   |
| 2   | 15   |
| 3   | 12   |
| 4   | 20   | # x = 4 does not show up twice
| 5   | 25   |
| 6   | 30   |

I tried pandas.merge, but I cannot get the desired result:

mydata = pd.merge(mydata_new, mydata_old, how="left")

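One way to do this is with an outer join (i.e., keep the rows from both dataframes). You can achieve this using pd.merge(mydata_new, mydata_old, how='outer') and then keep only the newest data for each x. Because no key is given, this merge matches on both shared columns (x and y), so x = 4 shows up twice (y = 20 and y = 21) and the old row still has to be dropped afterwards.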
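As a minimal sketch of the outer-join route, the snippet below merges on x only, so each x appears once with an old and a new y column, and then prefers the new value. The `on="x"` key, the `suffixes` names, and the `fillna` step are assumptions added for illustration; they are not part of the original answer.

```python
import pandas as pd

# Sample frames taken from the question
mydata_old = pd.DataFrame({"x": [1, 2, 3, 4], "y": [10, 15, 12, 21]})
mydata_new = pd.DataFrame({"x": [4, 5, 6], "y": [20, 25, 30]})

# Outer join on x only: one row per x, with y_new and y_old side by side
merged = pd.merge(
    mydata_new, mydata_old, on="x", how="outer", suffixes=("_new", "_old")
)

# Prefer the new value; fall back to the old one where no new value exists.
# The NaN introduced by the join makes the columns float, so cast back to int.
merged["y"] = merged["y_new"].fillna(merged["y_old"]).astype(int)

mydata = merged[["x", "y"]].sort_values("x").reset_index(drop=True)
print(mydata)
```

This takes more steps than the concat approach, which is why concat is usually the simpler choice here.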

However, this is more straightforward with pandas.concat.

Here, I am concatenating both dataframes. Note that I exclude rows from mydata_old if their x also appears in mydata_new. This way, the newest data is kept whenever an x is present in both dataframes (old and new).

# Concat frames and if data is available in both, keep data from mydata_new
mydata = pd.concat(
    [
        mydata_old.loc[~mydata_old['x'].isin(mydata_new['x'])],
        mydata_new
    ],
    axis=0)

The resulting mydata looks like this (index omitted):

x y
1 10
2 15
3 12
4 20
5 25
6 30
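To try this end to end, here is a self-contained sketch that builds the two frames from the question and runs the concat approach. It also shows an alternative that may be worth considering: concatenate everything and call drop_duplicates(subset="x", keep="last"). That alternative is not from the answer above, and it relies on mydata_new being placed last in the list. The names `filtered` and `deduped`, and the use of `ignore_index=True`, are illustrative choices.

```python
import pandas as pd

# Sample frames taken from the question
mydata_old = pd.DataFrame({"x": [1, 2, 3, 4], "y": [10, 15, 12, 21]})
mydata_new = pd.DataFrame({"x": [4, 5, 6], "y": [20, 25, 30]})

# Approach from the answer: drop old rows whose x also exists in the new frame
filtered = pd.concat(
    [
        mydata_old.loc[~mydata_old["x"].isin(mydata_new["x"])],
        mydata_new,
    ],
    axis=0,
    ignore_index=True,  # rebuild a clean 0..n-1 index
)

# Alternative: stack everything, then keep the last row seen for each x.
# The new frame comes last, so its rows win over the old ones.
deduped = (
    pd.concat([mydata_old, mydata_new], ignore_index=True)
    .drop_duplicates(subset="x", keep="last")
    .sort_values("x")
    .reset_index(drop=True)
)

print(filtered)
print(deduped)
```

Both variants give the same six rows for this data. The isin filter makes the "new overrides old" rule explicit, while drop_duplicates depends on the concatenation order.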


 