Pandas: find duplicates in another dataframe based on a subset

Question

Assume DF 1:

And DF 2

I would like to add a column to DF 1 with a count of duplicates in DF 2 based on a subset of columns:

For example

Duplicate on

1
2

Result:

   A  B  C  Dupe
0  1  1  1   1
1  1  1  2   1
2  2  1  1   1
3  1  9  0   2
4  9  9  9   0

Answer 1

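Sounds like you should group df2 by the subset columns, count the rows in each group, and then left-merge those counts into df1. Rows of df1 with no match in df2 come back as NaN, so fill them with 0: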

df=df1.merge(df2.groupby(['A','B']).size().to_frame('DUP').reset_index(),how='left').fillna(0)
   A  B  C  DUP
0  1  1  1  1.0
1  1  1  2  1.0
2  2  1  1  1.0
3  1  9  0  2.0
4  9  9  9  0.0
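
For reference, here is a minimal self-contained sketch of the same groupby-then-merge approach. The contents of df1 are read off the expected result in the question; the contents of df2 are not shown in the thread, so the df2 below is a hypothetical stand-in chosen only so that the counts match. The explicit `on=keys` and the cast back to int are small additions, not part of the answer's one-liner.

```python
import pandas as pd

# df1 as implied by the expected result in the question.
df1 = pd.DataFrame({"A": [1, 1, 2, 1, 9],
                    "B": [1, 1, 1, 9, 9],
                    "C": [1, 2, 1, 0, 9]})

# Hypothetical df2 (the original data is not shown in the thread).
df2 = pd.DataFrame({"A": [1, 2, 1, 1],
                    "B": [1, 1, 9, 9],
                    "C": [5, 5, 5, 6]})

keys = ["A", "B"]  # subset of columns that defines a duplicate

# Count how many rows of df2 share each (A, B) combination.
counts = df2.groupby(keys).size().reset_index(name="Dupe")

# Left merge keeps every row of df1; unmatched rows get NaN, which becomes 0.
result = df1.merge(counts, on=keys, how="left")
result["Dupe"] = result["Dupe"].fillna(0).astype(int)

print(result)
```

Casting after `fillna(0)` keeps the count column as integers instead of the floats (1.0, 2.0, ...) that the NaN values from the left merge would otherwise force.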

solution1
2 ACCEPTED 2020-07-04 01:16:37
