Check whether the rows of two columns in one dataframe exist in another dataframe | python

I have a dataframe called combined, which has two columns, c1 and c2.

combined:

 c1    c2
dr123 di878
dr987 di082
dr751 di715
dr156 di083

Another dataframe, called specific, has three columns: c1, c2 and c3.

specific:

 c1     c2    c3
dr987 di082 ekeodk
dr805 di827 sbdxdp
dr852 di737 pmzqde
dr751 di715 nedoas

For each row of combined, I want to check whether its (c1, c2) pair appears together in some row of specific. I then want to add a column called label to combined that holds 1 if the pair exists in specific and 0 if it does not.

So, the output dataframe will be this:

     c1    c2    label
    dr123 di878    0
    dr987 di082    1
    dr751 di715    1
    dr156 di083    0

I need an efficient way to do this because my combined dataframe has about 8 million rows. Any help, please?

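Use a left merge with indicator=True and check which rows matched on both sides (here df1 is combined and df2 is specific):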

In [312]: df1['label'] = df1.merge(df2[['c1', 'c2']], how='left', indicator=True
                                   )['_merge'].eq('both').astype(int)

In [313]: df1
Out[313]:
      c1     c2  label
0  dr123  di878      0
1  dr987  di082      1
2  dr751  di715      1
3  dr156  di083      0
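
One caveat with the merge above: if specific contains the same (c1, c2) pair more than once, the left merge returns more rows than combined has, and the assignment back can misalign. The following is a minimal sketch of a guarded variant, not the answerer's code. The helper name add_label is hypothetical, and the frames are rebuilt from the sample data in the question.

```python
import pandas as pd

combined = pd.DataFrame({
    "c1": ["dr123", "dr987", "dr751", "dr156"],
    "c2": ["di878", "di082", "di715", "di083"],
})
specific = pd.DataFrame({
    "c1": ["dr987", "dr805", "dr852", "dr751"],
    "c2": ["di082", "di827", "di737", "di715"],
    "c3": ["ekeodk", "sbdxdp", "pmzqde", "nedoas"],
})


def add_label(left, right, keys=("c1", "c2")):
    """Return a copy of `left` with a 0/1 `label` column (hypothetical helper)."""
    keys = list(keys)
    # Drop repeated key pairs so the left merge yields exactly one row per row of `left`.
    right_keys = right[keys].drop_duplicates()
    merged = left[keys].merge(right_keys, on=keys, how="left", indicator=True)
    out = left.copy()
    # to_numpy() assigns by position, so a non-default index on `left` does not matter.
    out["label"] = merged["_merge"].eq("both").astype(int).to_numpy()
    return out


labeled = add_label(combined, specific)
print(labeled)
```

The deduplication step costs one extra pass over specific, which is usually small next to the 8 million row merge itself.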

Alternatively, see whether a set lookup on (c1, c2) tuples helps:

In [88]: cols = ['c1', 'c2']

In [89]: mapper = {tuple(x[cols]) for _, x in df2.iterrows()}

In [90]: df1.apply(lambda x: tuple(x[['c1', 'c2']]) in mapper, axis=1).astype(int)
Out[90]:
0    0
1    1
2    1
3    0
dtype: int32
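
Both iterrows and a row-wise apply run Python code once per row, which may be slow at 8 million rows. As a sketch of the same membership idea without the per-row apply, the key columns can be turned into a MultiIndex and tested in bulk with isin, or zipped into plain tuples. This is an assumption about what will perform better, so it is worth timing on the real data. The frames below are rebuilt from the question's sample.

```python
import pandas as pd

combined = pd.DataFrame({
    "c1": ["dr123", "dr987", "dr751", "dr156"],
    "c2": ["di878", "di082", "di715", "di083"],
})
specific = pd.DataFrame({
    "c1": ["dr987", "dr805", "dr852", "dr751"],
    "c2": ["di082", "di827", "di737", "di715"],
    "c3": ["ekeodk", "sbdxdp", "pmzqde", "nedoas"],
})

cols = ["c1", "c2"]

# Variant 1: build a MultiIndex from the key columns and test membership in bulk.
left_keys = pd.MultiIndex.from_frame(combined[cols])
right_keys = pd.MultiIndex.from_frame(specific[cols])
combined["label"] = left_keys.isin(right_keys).astype(int)

# Variant 2: a plain Python set of tuples, built with zip instead of iterrows.
pairs = set(zip(specific["c1"], specific["c2"]))
label_zip = [int(pair in pairs) for pair in zip(combined["c1"], combined["c2"])]

print(combined)
```

Both variants assign by position, so they do not depend on the index of combined.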

You can do:

x = pd.merge(combine, specific, how = 'left', on = ['c1', 'c2'], indicator = 'Exist')
x['Exist'] = x['Exist'].map({'left_only': 0, 'both': 1})
combine['label'] = x['Exist']

Out:

      c1     c2  label
0  dr123  di878      0
1  dr987  di082      1
2  dr751  di715      1
3  dr156  di083      0
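
Because this version merges against all of specific, a (c1, c2) pair that appears with several c3 values would produce extra rows. A hedged sketch of one way to catch that early is to pass validate="many_to_one" to pd.merge, which raises an error when a key pair repeats on the right side. The helper name label_with_check is hypothetical, and comparing the indicator column with "both" is used here in place of map so the result is a plain integer column.

```python
import pandas as pd

combine = pd.DataFrame({
    "c1": ["dr123", "dr987", "dr751", "dr156"],
    "c2": ["di878", "di082", "di715", "di083"],
})
specific = pd.DataFrame({
    "c1": ["dr987", "dr805", "dr852", "dr751"],
    "c2": ["di082", "di827", "di737", "di715"],
    "c3": ["ekeodk", "sbdxdp", "pmzqde", "nedoas"],
})


def label_with_check(left, right, keys=("c1", "c2")):
    """Return a labeled copy of `left`; fail if `right` repeats a key pair (hypothetical helper)."""
    keys = list(keys)
    # validate="many_to_one" makes pandas raise MergeError when a key pair repeats in `right`.
    merged = pd.merge(left, right[keys], how="left", on=keys,
                      indicator="Exist", validate="many_to_one")
    out = left.copy()
    # "both" marks rows whose key pair was found on the right side.
    out["label"] = (merged["Exist"] == "both").astype(int).to_numpy()
    return out


result = label_with_check(combine, specific)
print(result)
```

If the check fails on real data, deduplicating the key columns of specific before the merge is one way to proceed.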
