Extract values when part of a string in one column matches part of a string in another column

I have two data frames with two columns each. I need to match the values of column A in df1 against column C in df2 and keep only the matched values.

For example:

df1

   A               B     
ABC PVT Ltd      1FWE23   
Auxil Solutions  22354    
Cambridge        32684    
Stacking Ltd     45368    
Ciscovt Ltd      46485    
Samsung Ltd      45346    
Nokia Ltd        58446    

df2

   C                     D     
BTD                    AAVV   
Auxil Company          ASDC  
Cambridge Univers      DECVD    
The Stacking Pvt       DVVCA  
Ciscovt brand          VDKMN
The Samsung Mobile     VDAVV    
The Nokia Mobile       VFAD    

I tried converting column C of df2 into a list and comparing it with column A of df1, but I'm not sure how to extract the values when the columns match only partially.

The code I tried:

dd= (df2['C'].str.upper()).unique().tolist()
df1['New'] = (df1['A'].str.upper()).apply(lambda x: ''.join([part for part in dd if part in x]))

The expected Output should be:

   A               B       New
ABC PVT Ltd      1FWE23   
Auxil Solutions  22354    Auxil Company
Cambridge        32684    Cambridge Univers
Stacking Ltd     45368    The Stacking Pvt
Ciscovt Ltd      46485    Ciscovt brand
Samsung Ltd      45346    The Samsung Mobile
Nokia Ltd        58446    The Nokia Mobile

One option is to use get_close_matches from difflib (a module in Python's standard library) together with pandas.merge:

from difflib import get_close_matches

def match(word, l):
    # Return the closest candidate in l, or None if nothing reaches the cutoff
    m = get_close_matches(word, l, n=1, cutoff=0.4)
    if m:
        return m[0]
    return None

out = (
    df1
    .assign(New=[match(x, list(df2["C"])) for x in df1["A"]])
    .merge(df2, left_on="New", right_on="C", how="left")
    .drop(["C", "D"], axis=1)
)

NB: If needed, you can adjust the cutoff, a float between 0 and 1. Candidates scoring below it are ignored, so 0 accepts any string, however different, while 1 requires an exact match.
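To see what the cutoff does in practice, here is a minimal sketch using one pair from the question. It assumes the score that get_close_matches compares against the cutoff is the SequenceMatcher ratio from difflib; the names candidates, score, loose and strict are only illustrative.

```python
from difflib import SequenceMatcher, get_close_matches

# Two candidate strings taken from column C of df2
candidates = ["BTD", "Cambridge Univers"]

# Similarity score: 2 * matched characters / total characters in both strings.
# "Cambridge" (9 chars) is fully contained in "Cambridge Univers" (17 chars),
# so the score is 2 * 9 / (9 + 17), roughly 0.69.
score = SequenceMatcher(None, "Cambridge Univers", "Cambridge").ratio()

# A cutoff below the score keeps the candidate ...
loose = get_close_matches("Cambridge", candidates, n=1, cutoff=0.4)

# ... while a cutoff above the score rejects it and returns an empty list.
strict = get_close_matches("Cambridge", candidates, n=1, cutoff=0.7)

print(round(score, 2), loose, strict)
```

Raising the cutoff therefore trades missed matches for fewer false positives, so it is worth checking a few borderline pairs from your own data before settling on a value.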

Output:

print(out)
                 A       B                 New
0      ABC PVT Ltd  1FWE23                None
1  Auxil Solutions   22354       Auxil Company
2        Cambridge   32684   Cambridge Univers
3     Stacking Ltd   45368    The Stacking Pvt
4      Ciscovt Ltd   46485       Ciscovt brand
5      Samsung Ltd   45346  The Samsung Mobile
6        Nokia Ltd   58446    The Nokia Mobile
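Note that get_close_matches compares characters case-sensitively, whereas the attempt in the question upper-cased both columns first. If case should be ignored, one possible variation is sketched below. The helper name match_ignore_case is hypothetical, and the sketch assumes that candidates differing only in case can be treated as the same string.

```python
from difflib import get_close_matches

def match_ignore_case(word, candidates, cutoff=0.4):
    """Hypothetical helper: closest candidate ignoring case, or None."""
    # Map the upper-cased form back to the original spelling.
    # If two candidates differ only in case, the last one wins.
    lookup = {c.upper(): c for c in candidates}
    hits = get_close_matches(word.upper(), list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None

candidates = ["BTD", "Cambridge Univers"]
found = match_ignore_case("CAMBRIDGE", candidates)
missing = match_ignore_case("zzz", candidates)
print(found, missing)
```

It can be dropped into the assign call in place of match, since it takes the same first two arguments.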
