Extract values when part of a string in one column matches part of a string in another column

Question

I have two data frames with two columns each. I need to match the values in column A of df1 against column C of df2 and return only the matched values.

For example:

df1

   A               B     
ABC PVT Ltd      1FWE23   
Auxil Solutions  22354    
Cambridge        32684    
Stacking Ltd     45368    
Ciscovt Ltd      46485    
Samsung Ltd      45346    
Nokia Ltd        58446

df2

   C                     D     
BTD                    AAVV   
Auxil Company          ASDC  
Cambridge Univers      DECVD    
The Stacking Pvt       DVVCA  
Ciscovt brand          VDKMN
The Samsung Mobile     VDAVV    
The Nokia Mobile       VFAD

I tried converting column C of df2 into a list and comparing it with column A of df1, but I'm not sure how to extract the values when the columns match only partially.

The code I tried:

dd= (df2['C'].str.upper()).unique().tolist()
df1['New'] = (df1['A'].str.upper()).apply(lambda x: ''.join([part for part in dd if part in x]))

The expected output is:

   A               B       New
ABC PVT Ltd      1FWE23   
Auxil Solutions  22354    Auxil Company
Cambridge        32684    Cambridge Univers
Stacking Ltd     45368    The Stacking Pvt
Ciscovt Ltd      46485    Ciscovt brand
Samsung Ltd      45346    The Samsung Mobile
Nokia Ltd        58446    The Nokia Mobile

Answer 1

One option is to use get_close_matches from difflib (part of Python's standard library) together with pandas.merge:

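from difflib import get_close_matches

def match(word, candidates):
    # Return the closest candidate to `word`, or None if nothing reaches the cutoff.
    m = get_close_matches(word, candidates, n=1, cutoff=0.4)
    return m[0] if m else None

# Build the list of candidates once instead of once per row.
choices = list(df2["C"])

out = (
    df1
    .assign(New=[match(x, choices) for x in df1["A"]])
    .merge(df2, left_on="New", right_on="C", how="left")
    .drop(["C", "D"], axis=1)
)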

NB: If needed, you can adjust the cutoff anywhere between 0 (any string is accepted, however different) and 1 (only an exact match is accepted).
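
To get a feel for what a given cutoff lets through, it can help to look at the similarity score itself. The sketch below assumes the score is the one reported by difflib's SequenceMatcher.ratio(), which is 2*M/T for M matching characters out of T characters in both strings combined. The names similarity and candidates are only illustrative.

```python
from difflib import SequenceMatcher, get_close_matches

# A few of the df2["C"] values from the question, used as candidates.
candidates = ["Auxil Company", "Cambridge Univers", "The Stacking Pvt"]

def similarity(a, b):
    # Ratio in [0, 1]: 2 * matching characters / total characters in both strings.
    return SequenceMatcher(None, a, b).ratio()

# "Cambridge" shares 9 characters with "Cambridge Univers" (9 + 17 = 26 in total),
# so the score is 2 * 9 / 26, roughly 0.69.
score = similarity("Cambridge", "Cambridge Univers")

# A cutoff below that score keeps the candidate, a cutoff above it drops it.
loose = get_close_matches("Cambridge", candidates, n=1, cutoff=0.4)
strict = get_close_matches("Cambridge", candidates, n=1, cutoff=0.9)

print(round(score, 2), loose, strict)
```

Printing the scores for a few known pairs like this is one way to pick a cutoff that separates the real matches from the unwanted ones in your own data.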

Output:

print(out)
                 A       B                 New
0      ABC PVT Ltd  1FWE23                None
1  Auxil Solutions   22354       Auxil Company
2        Cambridge   32684   Cambridge Univers
3     Stacking Ltd   45368    The Stacking Pvt
4      Ciscovt Ltd   46485       Ciscovt brand
5      Samsung Ltd   45346  The Samsung Mobile
6        Nokia Ltd   58446    The Nokia Mobile
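
For anyone who wants to run this end to end, here is a self-contained sketch of the same approach. It assumes the two frames look like the tables in the question (rebuilt by hand here, with column B stored as strings) and that pandas is installed.

```python
from difflib import get_close_matches

import pandas as pd

# Sample frames rebuilt from the tables in the question (an assumption about the real data).
df1 = pd.DataFrame({
    "A": ["ABC PVT Ltd", "Auxil Solutions", "Cambridge", "Stacking Ltd",
          "Ciscovt Ltd", "Samsung Ltd", "Nokia Ltd"],
    "B": ["1FWE23", "22354", "32684", "45368", "46485", "45346", "58446"],
})
df2 = pd.DataFrame({
    "C": ["BTD", "Auxil Company", "Cambridge Univers", "The Stacking Pvt",
          "Ciscovt brand", "The Samsung Mobile", "The Nokia Mobile"],
    "D": ["AAVV", "ASDC", "DECVD", "DVVCA", "VDKMN", "VDAVV", "VFAD"],
})

def match(word, candidates):
    # Closest candidate to `word`, or None when nothing reaches the cutoff.
    m = get_close_matches(word, candidates, n=1, cutoff=0.4)
    return m[0] if m else None

choices = list(df2["C"])

out = (
    df1
    .assign(New=[match(x, choices) for x in df1["A"]])
    .merge(df2, left_on="New", right_on="C", how="left")
    .drop(["C", "D"], axis=1)
)

print(out)
```

Note that the comparison is case-sensitive, so strings that differ only in letter case score lower than they would after normalising the case on both sides.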
