I have two DataFrames with two columns each. I need to match the values of column A in df1 against column C in df2 and keep only the matched values.
For example:
df1
A                B
ABC PVT Ltd      1FWE23
Auxil Solutions  22354
Cambridge        32684
Stacking Ltd     45368
Ciscovt Ltd      46485
Samsung Ltd      45346
Nokia Ltd        58446
df2
C                   D
BTD                 AAVV
Auxil Company       ASDC
Cambridge Univers   DECVD
The Stacking Pvt    DVVCA
Ciscovt brand       VDKMN
The Samsung Mobile  VDAVV
The Nokia Mobile    VFAD
I tried converting column C of df2 into a list and comparing it with column A of df1, but I'm not sure how to extract the values when the columns match only partially.
The code I tried:
dd= (df2['C'].str.upper()).unique().tolist()
df1['New'] = (df1['A'].str.upper()).apply(lambda x: ''.join([part for part in dd if part in x]))
The expected Output should be:
A                B       New
ABC PVT Ltd      1FWE23
Auxil Solutions  22354   Auxil Company
Cambridge        32684   Cambridge Univers
Stacking Ltd     45368   The Stacking Pvt
Ciscovt Ltd      46485   Ciscovt brand
Samsung Ltd      45346   The Samsung Mobile
Nokia Ltd        58446   The Nokia Mobile
One option is to use get_close_matches from difflib (part of Python's standard library) together with pandas.merge:
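from difflib import get_close_matches

def match(word, l):
    # Return the single best candidate from l, or None if nothing reaches the cutoff.
    m = get_close_matches(word, l, n=1, cutoff=0.4)
    if m:
        return m[0]
    return None

out = (
    df1
    # Look up the closest df2["C"] value for every df1["A"] value.
    .assign(New=[match(x, list(df2["C"])) for x in df1["A"]])
    .merge(df2, left_on="New", right_on="C", how="left")
    .drop(["C", "D"], axis=1)
)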
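NB: If needed, you can adjust the cutoff, a similarity threshold between 0 and 1. Candidates scoring below it are ignored, so values near 0 accept almost any string and 1 accepts only an exact match.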
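To get a feel for a sensible cutoff, it can help to look at the similarity scores directly. The sketch below assumes that get_close_matches ranks candidates by difflib.SequenceMatcher(...).ratio(), as the difflib documentation describes; the names choices and score are made up for the illustration.

```python
from difflib import SequenceMatcher, get_close_matches

# Values of df2["C"] from the question, written out as a plain list.
choices = [
    "BTD", "Auxil Company", "Cambridge Univers", "The Stacking Pvt",
    "Ciscovt brand", "The Samsung Mobile", "The Nokia Mobile",
]

def score(a, b):
    """Similarity in [0, 1]: 2 * matched characters / total characters."""
    return SequenceMatcher(None, a, b).ratio()

# "Cambridge" shares 9 characters with "Cambridge Univers" (9 + 17 = 26 in total).
print(round(score("Cambridge", "Cambridge Univers"), 3))  # 0.692

# A permissive cutoff keeps the best candidate ...
print(get_close_matches("Cambridge", choices, n=1, cutoff=0.4))  # ['Cambridge Univers']
# ... while cutoff=1 only accepts an identical string, so nothing is returned.
print(get_close_matches("Cambridge", choices, n=1, cutoff=1.0))  # []
```

Printing the scores for a handful of known pairs like this is one way to pick a cutoff that separates real matches from noise in your own data.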
Output:
print(out)
                 A       B                 New
0      ABC PVT Ltd  1FWE23                None
1  Auxil Solutions   22354       Auxil Company
2        Cambridge   32684   Cambridge Univers
3     Stacking Ltd   45368    The Stacking Pvt
4      Ciscovt Ltd   46485       Ciscovt brand
5      Samsung Ltd   45346  The Samsung Mobile
6        Nokia Ltd   58446    The Nokia Mobile
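As for the attempt in the question: `part in x` is a substring test, so it only succeeds when an entire df2 value occurs inside the df1 value. None of the sample values do, which is why every row ends up with an empty string. A minimal illustration with plain strings (a_values and c_values are hypothetical names holding a few of the upper-cased sample values):

```python
# Upper-cased sample values, as in the question's attempt.
a_values = ["AUXIL SOLUTIONS", "CAMBRIDGE", "STACKING LTD"]
c_values = ["AUXIL COMPANY", "CAMBRIDGE UNIVERS", "THE STACKING PVT"]

# `part in x` is a plain substring test: the whole of `part` must appear in `x`.
substring_hits = ["".join(part for part in c_values if part in x) for x in a_values]
print(substring_hits)  # ['', '', '']

# The reverse test succeeds only when the df1 value is contained verbatim in a df2 value.
reverse_hits = [[part for part in c_values if x in part] for x in a_values]
print(reverse_hits)  # [[], ['CAMBRIDGE UNIVERS'], []]
```

Neither direction of containment covers pairs such as "Stacking Ltd" and "The Stacking Pvt", which is why a similarity score fits this data better than a substring check.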