Union of two pandas DataFrames

Question

Say I have two data frames:

df1:

  A
0 a
1 b

df2:

  A
0 a
1 c

I want the result to be the union of the two frames with an extra column showing the source data frame that the row belongs to. In case of duplicates, duplicates should be removed and the respective extra column should show both sources:

  A  B
0 a  df1, df2
1 b  df1
2 c  df2

I can get the concatenated data frame (df3) without duplicates as follows:

import pandas as pd
df3=pd.concat([df1,df2],ignore_index=True).drop_duplicates().reset_index(drop=True)

I can't think of/find a method to have control over what element goes where. How can I add the extra column?

Thank you very much for any tips.

Answer 1

Merge with an indicator argument, and remap the result:

m = {'left_only': 'df1', 'right_only': 'df2', 'both': 'df1, df2'}

result = df1.merge(df2, on=['A'], how='outer', indicator='B')
result['B'] = result['B'].map(m)

result
   A         B
0  a  df1, df2
1  b       df1
2  c       df2

Answer 2

Use the command below:

df3 = pd.concat([df1.assign(source='df1'), df2.assign(source='df2')]) \
    .groupby('A') \
    .aggregate(list) \
    .reset_index()

The result will be:

   A      source
0  a  [df1, df2]
1  b       [df1]
2  c       [df2]

The assign will add a column named source with value df1 and df2 to your dataframes. groupby command groups rows with same A value to single row. aggregate command describes how to aggregate other columns ( source ) for each group of rows with same A . I have used list aggregate function so that the source column be the list of values with same A .

Answer 3

We use outer join to solve this -

df1 = pd.DataFrame({'A':['a','b']})
df2 = pd.DataFrame({'A':['a','c']})
df1['col1']='df1'
df2['col2']='df2'
df=pd.merge(df1, df2, on=['A'], how="outer").fillna('')
df['B']=df['col1']+','+df['col2']
df['B'] = df['B'].str.strip(',')
df=df[['A','B']]
df

   A        B
0  a  df1,df2
1  b      df1
2  c      df2

Union of two pandas DataFrames

Question

3 answers

solution1
12 ACCPTED 2019-01-22 19:37:39

solution2
2 2019-01-22 19:46:03

solution3
2 2019-01-22 19:52:33

Union of two pandas DataFrames

Question

3 answers

solution1 12 ACCPTED 2019-01-22 19:37:39

solution2 2 2019-01-22 19:46:03

solution3 2 2019-01-22 19:52:33

solution1
12 ACCPTED 2019-01-22 19:37:39

solution2
2 2019-01-22 19:46:03

solution3
2 2019-01-22 19:52:33