
Pandas: merge values from two tables into one column

I'd like to merge values from one table into a new column, and then merge any missing values from another table into that same column:

arr1 = pd.DataFrame([['a'],['b'],['c']])
arr2 = pd.DataFrame([['a',1],['b',2]])
arr3 = pd.DataFrame([['c',3]])


output = [['a',1],['b',2],['c',3]]

Joining arr2 and arr3 and then merging is not an option, because in my actual application they have a different number of columns.

You can use pd.concat to concatenate arr2 and arr3. It handles any extra columns by filling the missing data with NaN. Let's add an extra column to your data to show how this works:

arr2 = pd.DataFrame([['a',1,'extra column'],['b',2,'extra column']], index=None)
arr3 = pd.DataFrame([['c',3]], index=None)

arr2:

   0  1             2
0  a  1  extra column
1  b  2  extra column

arr3:

   0  1
0  c  3

Then concatenate:

new_df = pd.concat([arr2, arr3], ignore_index=True)

new_df:

   0  1             2
0  a  1  extra column
1  b  2  extra column
2  c  3           NaN
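To get from the concatenated frame to the single column asked for in the question, one option is a left merge of arr1 against the stacked key/value pairs. This is only a sketch: the column names key, value and extra are hypothetical (the frames above use default integer labels), and it assumes each key appears in only one of arr2 or arr3.

```python
import pandas as pd

# Hypothetical column names, introduced only to make the merge readable.
arr1 = pd.DataFrame({"key": ["a", "b", "c"]})
arr2 = pd.DataFrame(
    [["a", 1, "extra column"], ["b", 2, "extra column"]],
    columns=["key", "value", "extra"],
)
arr3 = pd.DataFrame([["c", 3]], columns=["key", "value"])

# Keep only the columns both tables share, then stack them into one lookup table.
lookup = pd.concat(
    [arr2[["key", "value"]], arr3[["key", "value"]]],
    ignore_index=True,
)

# A left merge keeps every row of arr1 and pulls in the matching value.
result = arr1.merge(lookup, on="key", how="left")
print(result.values.tolist())  # [['a', 1], ['b', 2], ['c', 3]]
```

If a key could appear in both tables, calling lookup.drop_duplicates(subset="key") before the merge would keep the first occurrence, which here is the arr2 value.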

Update: If the DataFrame does not fit in memory, you could use dask:

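import dask.dataframe as dd
import pandas as pd

arr2 = pd.DataFrame([['a',1,'extra column'],['b',2,'extra column']], index=None)
arr3 = pd.DataFrame([['c',3]], index=None)

# Wrap each pandas DataFrame in a single-partition dask DataFrame
ddf1 = dd.from_pandas(arr2, npartitions=1)
ddf2 = dd.from_pandas(arr3, npartitions=1)

# The concatenation is lazy; nothing is computed until .compute() is called
dd_final = dd.concat([ddf1, ddf2])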

Output of dd_final.compute():

   0  1             2
0  a  1  extra column
1  b  2  extra column
0  c  3           NaN
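The last row in the output above is labelled 0 because the concatenation keeps each frame's original row labels. pandas does the same when ignore_index is omitted, so the effect and one possible fix can be sketched with pandas alone. For the dask case, the assumption here is that the reset is applied to the pandas result of compute(), since resetting the index on the dask DataFrame itself restarts at 0 in every partition.

```python
import pandas as pd

arr2 = pd.DataFrame([["a", 1, "extra column"], ["b", 2, "extra column"]])
arr3 = pd.DataFrame([["c", 3]])

# Without ignore_index=True the original row labels are kept,
# which reproduces the duplicated label seen in the dask output.
kept = pd.concat([arr2, arr3])
print(kept.index.tolist())  # [0, 1, 0]

# Dropping the old labels gives a fresh 0..n-1 index.
clean = kept.reset_index(drop=True)
print(clean.index.tolist())  # [0, 1, 2]
```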


 