I'd like to merge values from one table into a new column, and then merge any missing values from another table into that same column:
arr1 = pd.DataFrame([['a'],['b'],['c']])
arr2 = pd.DataFrame([['a',1],['b',2]])
arr3 = pd.DataFrame([['c',3]])
output = [['a',1],['b',2],['c',3]]
Joining arr2 and arr3 first and then merging is not an option, because in my actual application they have a different number of columns.
You can use pd.concat to concatenate arr2 and arr3. It takes care of any extra columns by filling the missing data with NaN. Let's add an extra column to your data to show how this works:
arr2 = pd.DataFrame([['a',1,'extra column'],['b',2,'extra column']], index=None)
arr3 = pd.DataFrame([['c',3]], index=None)
arr2:
| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | a | 1 | extra column |
| 1 | b | 2 | extra column |
arr3:
| | 0 | 1 |
|---|---|---|
| 0 | c | 3 |
Then concatenate:
new_df = pd.concat([arr2, arr3], ignore_index=True)
| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | a | 1 | extra column |
| 1 | b | 2 | extra column |
| 2 | c | 3 | NaN |
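To get from the concatenated frame to the single new column asked for in the question, one option is to keep only the key and value columns and left-merge them onto arr1. This is a minimal sketch, assuming column 0 holds the key and column 1 holds the value to bring over; the column names `key` and `value` are made up for illustration and are not in the original data:

```python
import pandas as pd

arr1 = pd.DataFrame([['a'], ['b'], ['c']], columns=['key'])
arr2 = pd.DataFrame([['a', 1, 'extra column'], ['b', 2, 'extra column']])
arr3 = pd.DataFrame([['c', 3]])

# Stack both lookup tables; columns missing from arr3 are filled with NaN.
lookup = pd.concat([arr2, arr3], ignore_index=True)

# Keep only the key (column 0) and the value (column 1), using hypothetical names.
lookup = lookup[[0, 1]].rename(columns={0: 'key', 1: 'value'})

# Left-merge so that every row of arr1 is kept, whether or not it has a match.
output = arr1.merge(lookup, on='key', how='left')
print(output.values.tolist())
```

Any key in arr1 that appears in neither table would end up with NaN in the new column. If the same key could appear in both arr2 and arr3, the merge would produce duplicate rows, so you would probably want to call drop_duplicates on the key column of the lookup first.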
Update: If the DataFrame does not fit in memory, you could use dask:
import dask.dataframe as dd
import pandas as pd
arr2 = pd.DataFrame([['a',1,'extra column'],['b',2,'extra column']], index=None)
arr3 = pd.DataFrame([['c',3]], index=None)
ddf1 = dd.from_pandas(arr2, npartitions=1)
ddf2 = dd.from_pandas(arr3, npartitions=1)
dd_final = dd.concat([ddf1, ddf2])
Output of dd_final.compute():
| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | a | 1 | extra column |
| 1 | b | 2 | extra column |
| 0 | c | 3 | NaN |