Extract a value if part of a string in one column matches part of a string in another column

Question

I have 2 dataframes, each with 2 columns. I need to match the values in col A of df1 against col C of df2 and get only the values that match.

For example:

df1

   A               B     
ABC PVT Ltd      1FWE23   
Auxil Solutions  22354    
Cambridge        32684    
Stacking Ltd     45368    
Ciscovt Ltd      46485    
Samsung Ltd      45346    
Nokia Ltd        58446

df2

   C                     D     
BTD                    AAVV   
Auxil Company          ASDC  
Cambridge Univers      DECVD    
The Stacking Pvt       DVVCA  
Ciscovt brand          VDKMN
The Samsung Mobile     VDAVV    
The Nokia Mobile       VFAD

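I tried converting column C of df2 to a list and comparing it with column A of df1, but I'm not sure how to extract the value when the strings only partially match between the columns.

The code I tried: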

dd= (df2['C'].str.upper()).unique().tolist()
df1['New'] = (df1['A'].str.upper()).apply(lambda x: ''.join([part for part in dd if part in x]))

The expected output should be:

   A               B       New
ABC PVT Ltd      1FWE23   
Auxil Solutions  22354    Auxil Company
Cambridge        32684    Cambridge Univers
Stacking Ltd     45368    The Stacking Pvt
Ciscovt Ltd      46485    Ciscovt brand
Samsung Ltd      45346    The Samsung Mobile
Nokia Ltd        58446    The Nokia Mobile

Answer 1

One option is to combine get_close_matches from difflib (part of Python's standard library) with pandas.merge:

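from difflib import get_close_matches

def match(word, choices):
    # Return the single closest candidate, or None if nothing scores >= cutoff.
    m = get_close_matches(word, choices, n=1, cutoff=0.4)
    if m:
        return m[0]
    return None

# Build the candidate list once instead of once per row.
choices = list(df2["C"])

out = (
    df1
    .assign(New=[match(x, choices) for x in df1["A"]])
    .merge(df2, left_on="New", right_on="C", how="left")
    .drop(["C", "D"], axis=1)  # keep only A, B and New
)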

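Note: if needed, you can adjust the cutoff anywhere between 0 (completely different strings) and 1 (exact match). Candidates that score below the cutoff are ignored.

Output: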
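To make the effect of the cutoff concrete, here is a minimal sketch using only difflib. The helper name best_match and the plain candidates list are illustrative assumptions, not part of the answer above; the candidate strings are copied from df2["C"] in the question.

```python
from difflib import SequenceMatcher, get_close_matches

# Candidate strings copied from df2["C"] in the question.
candidates = [
    "BTD", "Auxil Company", "Cambridge Univers", "The Stacking Pvt",
    "Ciscovt brand", "The Samsung Mobile", "The Nokia Mobile",
]

def best_match(word, choices, cutoff):
    # Hypothetical helper: closest candidate, or None if none reaches the cutoff.
    hits = get_close_matches(word, choices, n=1, cutoff=cutoff)
    return hits[0] if hits else None

# Similarity score that get_close_matches compares against the cutoff.
score = SequenceMatcher(None, "Cambridge", "Cambridge Univers").ratio()

loose = best_match("Cambridge", candidates, cutoff=0.4)   # permissive threshold
strict = best_match("Cambridge", candidates, cutoff=0.9)  # near-exact only

print(round(score, 3), loose, strict)
```

Here the score is 2 * 9 / (9 + 17), roughly 0.69, so the pair is accepted at cutoff=0.4 but rejected at cutoff=0.9. Checking a few scores like this on your own data is one way to pick a threshold.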

print(out)
                 A       B                 New
0      ABC PVT Ltd  1FWE23                None
1  Auxil Solutions   22354       Auxil Company
2        Cambridge   32684   Cambridge Univers
3     Stacking Ltd   45368    The Stacking Pvt
4      Ciscovt Ltd   46485       Ciscovt brand
5      Samsung Ltd   45346  The Samsung Mobile
6        Nokia Ltd   58446    The Nokia Mobile
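One caveat: get_close_matches compares characters exactly, so it is case-sensitive. The attempt in the question upper-cased both columns first; a comparable effect can be sketched by matching on lower-cased strings and mapping the hit back to the original spelling. The function name match_ignore_case and the shortened choices list are assumptions made for illustration.

```python
from difflib import get_close_matches

def match_ignore_case(word, choices, cutoff=0.4):
    # Hypothetical helper: compare lower-cased strings, return the original spelling.
    lookup = {c.lower(): c for c in choices}
    hits = get_close_matches(word.lower(), list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None

# A shortened candidate list taken from df2["C"].
choices = ["Auxil Company", "Cambridge Univers", "The Nokia Mobile"]

case_sensitive = get_close_matches("CAMBRIDGE", choices, n=1, cutoff=0.4)
result = match_ignore_case("CAMBRIDGE", choices)

print(case_sensitive, result)
```

If two candidates differ only by case, the dictionary keeps the last one, which may or may not be what you want.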

Solution 1
2 2023-02-01 14:20:55
