Pandas merge on first column

Question

I am trying to merge two pandas DataFrames whose merge keys contain duplicate rows (here, the rows where both 'a' and 'b' are 2). As a result, pandas takes the Cartesian product of the duplicate rows, as shown below:

In [8]: df1 = pd.DataFrame({'a' : [1, 2, 2], 'b' : [2, 2, 2], 'c' : [3, 6, 6]}) 

In [9]: df2 = pd.DataFrame({'a' : [2, 2], 'b' : [2, 2], 'd' : [2, 5]})          

In [10]: df1.merge(df2, how='outer', on=['a', 'b'])                             
Out[10]: 
   a  b  c    d
0  1  2  3  NaN
1  2  2  6  2.0
2  2  2  6  5.0
3  2  2  6  2.0
4  2  2  6  5.0
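
The row count above can be predicted from how often each key occurs in each frame. The following is a minimal sketch, assuming the same df1 and df2 as in the question; the names n1, n2, rows_per_key and expected_rows are introduced here for illustration only.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 2], 'b': [2, 2, 2], 'c': [3, 6, 6]})
df2 = pd.DataFrame({'a': [2, 2], 'b': [2, 2], 'd': [2, 5]})

# Number of rows per (a, b) key in each frame.
n1 = df1.groupby(['a', 'b']).size()
n2 = df2.groupby(['a', 'b']).size()

# A key present in both frames contributes n1 * n2 rows to the merge.
# fill_value=1 keeps keys that exist in only one frame (outer merge).
rows_per_key = n1.mul(n2, fill_value=1).astype(int)
expected_rows = int(rows_per_key.sum())

merged = df1.merge(df2, how='outer', on=['a', 'b'])
print(rows_per_key)
print(expected_rows, len(merged))
```

The key (2, 2) appears twice in each frame, so it alone accounts for four of the merged rows.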

What I want is for each duplicate row to be matched only once, in the order in which the rows appear (in this case, by index). So the output that I would like to have is:

In [12]: df_output = pd.DataFrame({'a' : [1, 2, 2], 'b' : [2, 2, 2], 'c' : [3, 6, 6], 'd' : [np.nan, 2, 5]})

In [13]: df_output                                                              
Out[13]: 
   a  b  c    d
0  1  2  3  NaN
1  2  2  6  2.0
2  2  2  6  5.0

How would I do this?

Answer 1

You need a helper column that numbers the rows within each group of duplicate keys, created with GroupBy.cumcount:

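import pandas as pd

df1 = pd.DataFrame({'a' : [1, 2, 2], 'b' : [2, 2, 2], 'c' : [3, 6, 6]})
df2 = pd.DataFrame({'a' : [2, 2], 'b' : [2, 2], 'd' : [2, 5]})

# g is the position of each row among the rows sharing the same (a, b) key
df1['g'] = df1.groupby(['a', 'b']).cumcount()
df2['g'] = df2.groupby(['a', 'b']).cumcount()

# merging on g as well pairs the n-th duplicate in df1 with the n-th in df2
df = df1.merge(df2, how='outer', on=['a', 'b', 'g'])
print(df)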
   a  b  c  g    d
0  1  2  3  0  NaN
1  2  2  6  0  2.0
2  2  2  6  1  5.0

Finally, remove the g column:

df = df1.merge(df2, how='outer', on=['a', 'b', 'g']).drop('g', axis=1)
print(df)
   a  b  c    d
0  1  2  3  NaN
1  2  2  6  2.0
2  2  2  6  5.0
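
The same steps can be wrapped in a small helper that leaves the input frames unmodified. This is only a sketch: the function name merge_in_order and the temporary column name '_occurrence' are hypothetical, and the helper assumes that neither frame already has a column with that name.

```python
import pandas as pd

def merge_in_order(left, right, on, how='outer'):
    """Pair the n-th duplicate of each key in `left` with the n-th in `right`."""
    key = '_occurrence'  # hypothetical temporary column; must not clash
    on = list(on)
    # assign() returns copies, so the caller's frames are not modified.
    left = left.assign(**{key: left.groupby(on).cumcount()})
    right = right.assign(**{key: right.groupby(on).cumcount()})
    return left.merge(right, how=how, on=on + [key]).drop(columns=key)

df1 = pd.DataFrame({'a': [1, 2, 2], 'b': [2, 2, 2], 'c': [3, 6, 6]})
df2 = pd.DataFrame({'a': [2, 2], 'b': [2, 2], 'd': [2, 5]})

result = merge_in_order(df1, df2, on=['a', 'b'])
print(result)
```

Using assign instead of df1['g'] = ... avoids adding the counter column to the original frames.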

Answer 2

Doesn't drop_duplicates solve your problem?

df = df1.merge(df2, how='outer', on=['a', 'b'])
df = df.drop_duplicates()
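
Note that this gives the desired output for the data in the question because the duplicate rows of df1 are identical in every column, so the extra merged rows are exact duplicates. A minimal sketch of a case where the two approaches differ, assuming a variant of df1 in which the last value of 'c' is 7 instead of 6 (the names deduped, paired, g1 and g2 are illustrative):

```python
import pandas as pd

# Variant of the question's data: the two (2, 2) rows in df1 now differ in 'c'.
df1 = pd.DataFrame({'a': [1, 2, 2], 'b': [2, 2, 2], 'c': [3, 6, 7]})
df2 = pd.DataFrame({'a': [2, 2], 'b': [2, 2], 'd': [2, 5]})

# Merge, then drop exact duplicate rows.
deduped = df1.merge(df2, how='outer', on=['a', 'b']).drop_duplicates()

# Pair rows by their position within each (a, b) group instead.
g1 = df1.assign(g=df1.groupby(['a', 'b']).cumcount())
g2 = df2.assign(g=df2.groupby(['a', 'b']).cumcount())
paired = g1.merge(g2, how='outer', on=['a', 'b', 'g']).drop(columns='g')

print(len(deduped), len(paired))
```

Here none of the merged rows are exact duplicates, so drop_duplicates removes nothing, while the cumcount approach still pairs each row once.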

Answer 3

I think this is enough:

df1.merge(df2, how = 'outer').drop_duplicates()
