简体   繁体   中英

Union of two pandas DataFrames

Say I have two data frames:

df1:

  A
0 a
1 b

df2:

  A
0 a
1 c

I want the result to be the union of the two frames with an extra column showing the source data frame that the row belongs to. In case of duplicates, duplicates should be removed and the respective extra column should show both sources:

  A  B
0 a  df1, df2
1 b  df1
2 c  df2

I can get the concatenated data frame (df3) without duplicates as follows:

import pandas as pd
df3=pd.concat([df1,df2],ignore_index=True).drop_duplicates().reset_index(drop=True)

I can't think of/find a method to have control over what element goes where. How can I add the extra column?

Thank you very much for any tips.

Merge with an indicator argument, and remap the result:

m = {'left_only': 'df1', 'right_only': 'df2', 'both': 'df1, df2'}

result = df1.merge(df2, on=['A'], how='outer', indicator='B')
result['B'] = result['B'].map(m)

result
   A         B
0  a  df1, df2
1  b       df1
2  c       df2

Use the command below:

df3 = pd.concat([df1.assign(source='df1'), df2.assign(source='df2')]) \
    .groupby('A') \
    .aggregate(list) \
    .reset_index()

The result will be:

   A      source
0  a  [df1, df2]
1  b       [df1]
2  c       [df2]

The assign will add a column named source with value df1 and df2 to your dataframes. groupby command groups rows with same A value to single row. aggregate command describes how to aggregate other columns ( source ) for each group of rows with same A . I have used list aggregate function so that the source column be the list of values with same A .

We use outer join to solve this -

df1 = pd.DataFrame({'A':['a','b']})
df2 = pd.DataFrame({'A':['a','c']})
df1['col1']='df1'
df2['col2']='df2'
df=pd.merge(df1, df2, on=['A'], how="outer").fillna('')
df['B']=df['col1']+','+df['col2']
df['B'] = df['B'].str.strip(',')
df=df[['A','B']]
df

   A        B
0  a  df1,df2
1  b      df1
2  c      df2

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM