
Pandas: How to concatenate dataframes with different columns?

I tried to find the answer in the official Pandas documentation, but found it more confusing than helpful. Basically, I have two dataframes with overlapping, but not identical, column lists:

df1:
   A   B
0  22  34
1  78  42

df2:
   B   C
0  76  29
1  11  67

I want to merge/concatenate/append them so that the result is

df3:
   A   B   C
0  22  34  nan
1  78  42  nan
2  nan 76  29
3  nan 11  67

Should be fairly simple, but I've tried several intuitive approaches and always got errors. Can anybody help me?

You need merge with the parameter how='outer':

df3 = df1.merge(df2, how='outer')

    A       B   C
0   22.0    34  NaN
1   78.0    42  NaN
2   NaN     76  29.0
3   NaN     11  67.0
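One caveat worth sketching: with no on argument, merge joins on the columns the two frames share (here B), so it only behaves like stacking because no B value appears in both frames. The example below rebuilds df1 and df2 from the question; df2b is a hypothetical variant, not from the original post, that reuses B = 34 to show the difference. Row order is deliberately not asserted, since it can differ between pandas versions.

```python
import pandas as pd

# The two frames from the question.
df1 = pd.DataFrame({"A": [22, 78], "B": [34, 42]})
df2 = pd.DataFrame({"B": [76, 11], "C": [29, 67]})

# No `on=` given, so merge joins on the shared column B.
# No B value occurs in both frames, so all four rows survive.
df3 = df1.merge(df2, how="outer")
print(df3)

# Hypothetical variant: B == 34 now occurs in both frames.
df2b = pd.DataFrame({"B": [34, 11], "C": [29, 67]})

merged = df1.merge(df2b, how="outer")                # B == 34 is matched into one row
stacked = pd.concat([df1, df2b], ignore_index=True)  # rows are appended, never matched

print(len(merged), len(stacked))
```

If the goal is purely to stack rows regardless of their values, concat is the safer choice; merge fits when rows with equal keys should be combined.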

If you just want to concatenate the dataframes, you can use pd.concat:

pd.concat([df1, df2])

Output:

      A   B     C
0  22.0  34   NaN
1  78.0  42   NaN
0   NaN  76  29.0
1   NaN  11  67.0
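As a minimal sketch of what this output implies, the snippet below rebuilds the two frames from the question and inspects the result. It shows that concat keeps each frame's original index labels, so they repeat, and that the columns containing NaN are upcast to float.

```python
import pandas as pd

# The two frames from the question.
df1 = pd.DataFrame({"A": [22, 78], "B": [34, 42]})
df2 = pd.DataFrame({"B": [76, 11], "C": [29, 67]})

combined = pd.concat([df1, df2])

# Index labels are carried over from the inputs, so 0 and 1 each appear twice.
print(combined.index.tolist())

# A label-based lookup for 0 therefore returns two rows, one from each frame.
print(combined.loc[0])

# A and C become float because NaN marks the missing values; B stays integer.
print(combined.dtypes)
```

Duplicate labels are legal in pandas, but they make .loc lookups ambiguous, which is why the index is usually rebuilt afterwards.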

Then you can reset_index to recreate a simple incrementing index.

pd.concat([df1, df2]).reset_index(drop=True)

Output:

      A   B     C
0  22.0  34   NaN
1  78.0  42   NaN
2   NaN  76  29.0
3   NaN  11  67.0

Both @vaishali's and @scott-boston's solutions work. Prefer merge when you need control over how rows are matched, since its how parameter gives you more flexibility over the result. However, concat can perform better when few columns are involved.

To optimize @scott-boston's answer, you can also use concat's ignore_index parameter, which renumbers the index automatically without calling another function. The code would look like this:

pd.concat([df1, df2], ignore_index=True)

Output:

      A   B     C
0  22.0  34   NaN
1  78.0  42   NaN
2   NaN  76  29.0
3   NaN  11  67.0
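A short, self-contained check, assuming the same df1 and df2 as in the question, that the ignore_index=True form and the reset_index(drop=True) form give the same frame:

```python
import pandas as pd

# The two frames from the question.
df1 = pd.DataFrame({"A": [22, 78], "B": [34, 42]})
df2 = pd.DataFrame({"B": [76, 11], "C": [29, 67]})

# Two-step form: concatenate, then discard the old labels.
via_reset = pd.concat([df1, df2]).reset_index(drop=True)

# One-step form: let concat renumber the rows itself.
via_flag = pd.concat([df1, df2], ignore_index=True)

print(via_flag)

# Raises AssertionError if the two results differ in values, dtypes, or labels.
pd.testing.assert_frame_equal(via_reset, via_flag)
```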

Python (version 3.8.5) | pandas (version 1.1.3)
