
Pandas: find duplicates in another dataframe based on a subset

Assume DF 1:

   A  B  C
0  1  1  1
1  1  1  2
2  2  1  1
3  1  9  0
4  9  9  9

And DF 2:

   A  B  C
0  6  1  1
1  1  1  2
2  2  1  1
3  1  9  0
4  1  9  6

I would like to add a column to DF 1 that holds, for each row, the number of rows in DF 2 that match it on a subset of columns (a count of duplicates in DF 2):

For example, counting duplicates on the subset

  • A (column 1)
  • B (column 2)

gives:

   A  B  C  Dupe
0  1  1  1   1
1  1  1  2   1
2  2  1  1   1
3  1  9  0   2
4  9  9  9   0
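To pin down what "count of duplicates" means here, this is a minimal brute-force sketch that rebuilds the two example frames and, for each row of DF 1, counts the rows of DF 2 whose A and B values match. The helper name `count_dupes_naive` is hypothetical, and the row-by-row loop is only meant to make the definition explicit, not to be fast.

```python
import pandas as pd

# The two example frames from the question
df1 = pd.DataFrame({"A": [1, 1, 2, 1, 9],
                    "B": [1, 1, 1, 9, 9],
                    "C": [1, 2, 1, 0, 9]})
df2 = pd.DataFrame({"A": [6, 1, 2, 1, 1],
                    "B": [1, 1, 1, 9, 9],
                    "C": [1, 2, 1, 0, 6]})

def count_dupes_naive(left, right, subset):
    """Hypothetical helper: for each row of `left`, count rows of `right` equal on `subset`."""
    counts = []
    for _, row in left[subset].iterrows():
        # `row` is a Series indexed by the subset columns; comparing the
        # DataFrame to it aligns on column names, then .all(axis=1) keeps
        # only rows of `right` that match on every subset column
        mask = (right[subset] == row).all(axis=1)
        counts.append(int(mask.sum()))
    return counts

result = df1.copy()
result["Dupe"] = count_dupes_naive(df1, df2, ["A", "B"])
print(result)
```

This is quadratic in the number of rows, so it only serves as a reference for checking a vectorized solution on small data.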

Sounds like you should group df2 by the subset columns with groupby, count the rows in each group, then left-merge those counts onto df1:

df = df1.merge(df2.groupby(['A','B']).size().to_frame('DUP').reset_index(), on=['A','B'], how='left').fillna(0)
   A  B  C  DUP
0  1  1  1  1.0
1  1  1  2  1.0
2  2  1  1  1.0
3  1  9  0  2.0
4  9  9  9  0.0
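The same groupby-then-merge idea can be wrapped in a small reusable helper. This is a minimal sketch, assuming you want integer counts: the left merge produces NaN for keys that never occur in df2, which forces a float column, so after `fillna(0)` the column is cast back with `astype(int)`. The function name `add_dupe_count` and the default column name `Dupe` are illustrative choices, not part of pandas.

```python
import pandas as pd

def add_dupe_count(left, right, subset, name="Dupe"):
    """Hypothetical helper: append to `left` a column counting rows of `right` that match on `subset`."""
    # One row per distinct key combination in `right`, with its row count
    counts = right.groupby(subset).size().rename(name).reset_index()
    # A left merge keeps every row of `left`; keys absent from `right` get NaN
    out = left.merge(counts, on=subset, how="left")
    # No match means zero duplicates; cast the float column back to int
    out[name] = out[name].fillna(0).astype(int)
    return out

df1 = pd.DataFrame({"A": [1, 1, 2, 1, 9],
                    "B": [1, 1, 1, 9, 9],
                    "C": [1, 2, 1, 0, 9]})
df2 = pd.DataFrame({"A": [6, 1, 2, 1, 1],
                    "B": [1, 1, 1, 9, 9],
                    "C": [1, 2, 1, 0, 6]})

out = add_dupe_count(df1, df2, ["A", "B"])
print(out)
```

Only the count column is filled with 0 here, so any pre-existing NaNs in df1's own columns are left alone, unlike a frame-wide `fillna(0)`. The merge returns a new frame with a fresh default index; if df1 has a meaningful index, reattach it afterwards.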

The technical posts on this site follow the CC BY-SA 4.0 license. If you reprint them, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM