I have a dataframe called combined,
which has two columns, c1 and c2:
combined:
c1 c2
dr123 di878
dr987 di082
dr751 di715
dr156 di083
Another dataframe, called specific,
has three columns, c1, c2 and c3:
specific:
c1 c2 c3
dr987 di082 ekeodk
dr805 di827 sbdxdp
dr852 di737 pmzqde
dr751 di715 nedoas
I want to check whether each pair of c1, c2 values
that occurs together in combined
also exists in specific.
I want to add a column to combined
called label,
and put 1 in it if the pair exists in specific, otherwise 0.
So, the output dataframe will be this:
c1 c2 label
dr123 di878 0
dr987 di082 1
dr751 di715 1
dr156 di083 0
I need an efficient way to do this because my combined dataframe has ~8 million rows. Any help, please?
Use a left merge with indicator=True (here df1 is combined and df2 is specific):
In [312]: df1['label'] = df1.merge(df2[['c1', 'c2']], how='left', indicator=True)['_merge'].eq('both').astype(int)
In [313]: df1
Out[313]:
c1 c2 label
0 dr123 di878 0
1 dr987 di082 1
2 dr751 di715 1
3 dr156 di083 0
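The merge approach above can be sketched as a self-contained script using the sample data from the question. The `drop_duplicates()` call and the `.to_numpy()` assignment are additions that are not in the original answer: they assume specific might contain repeated (c1, c2) pairs, or that combined might not have a default index, either of which could misalign the labels.

```python
import pandas as pd

# Sample data copied from the question.
combined = pd.DataFrame({
    "c1": ["dr123", "dr987", "dr751", "dr156"],
    "c2": ["di878", "di082", "di715", "di083"],
})
specific = pd.DataFrame({
    "c1": ["dr987", "dr805", "dr852", "dr751"],
    "c2": ["di082", "di827", "di737", "di715"],
    "c3": ["ekeodk", "sbdxdp", "pmzqde", "nedoas"],
})

# Keep only the key columns and drop repeated pairs (an added safeguard),
# so the left merge cannot return more rows than `combined` has.
keys = specific[["c1", "c2"]].drop_duplicates()

# indicator=True adds a `_merge` column: 'both' means the pair was found.
merged = combined.merge(keys, on=["c1", "c2"], how="left", indicator=True)

# A left merge keeps the row order of `combined`; assigning the raw array
# matches rows by position rather than by index label.
combined["label"] = merged["_merge"].eq("both").astype(int).to_numpy()

print(combined)
```

Whether this is fast enough for 8 million rows depends on the machine and the size of specific, so it is worth timing on a sample first.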
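Alternatively, see if set hashing helps (note that iterrows and apply with axis=1 run Python code once per row, so this may be slow on 8 million rows):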
In [88]: cols = ['c1', 'c2']
In [89]: mapper = {tuple(x[cols]) for _, x in df2.iterrows()}
In [90]: df1.apply(lambda x: tuple(x[['c1', 'c2']]) in mapper, axis=1).astype(int)
Out[90]:
0 0
1 1
2 1
3 0
dtype: int32
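The same membership idea can also be expressed without a per-row Python function. The sketch below swaps in `pd.MultiIndex.from_frame` and `MultiIndex.isin`, which is a different technique from the set-of-tuples code above, and assumes a pandas version that provides `from_frame` (0.24 or later). The variable names `pairs_combined` and `pairs_specific` are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question.
combined = pd.DataFrame({
    "c1": ["dr123", "dr987", "dr751", "dr156"],
    "c2": ["di878", "di082", "di715", "di083"],
})
specific = pd.DataFrame({
    "c1": ["dr987", "dr805", "dr852", "dr751"],
    "c2": ["di082", "di827", "di737", "di715"],
    "c3": ["ekeodk", "sbdxdp", "pmzqde", "nedoas"],
})

cols = ["c1", "c2"]

# Represent each (c1, c2) pair as one entry of a MultiIndex.
pairs_combined = pd.MultiIndex.from_frame(combined[cols])
pairs_specific = pd.MultiIndex.from_frame(specific[cols])

# isin returns a boolean array, one element per row of `combined`,
# in the same order, so it can be assigned directly as a column.
combined["label"] = pairs_combined.isin(pairs_specific).astype(int)

print(combined)
```

Duplicate pairs in specific do not matter here, since this is a pure membership test and never changes the number of rows in combined.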
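You can do:
x = pd.merge(combined, specific, how='left', on=['c1', 'c2'], indicator='Exist')
# astype(int) keeps the label as an integer column regardless of how map handles the categorical
x['Exist'] = x['Exist'].map({'left_only': 0, 'both': 1}).astype(int)
combined['label'] = x['Exist']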
Out:
c1 c2 label
0 dr123 di878 0
1 dr987 di082 1
2 dr751 di715 1
3 dr156 di083 0