[英]Iterating over 2 columns and comparing similarities in Python
我有一個看起來像這樣的 DF:
Row Account_Name_HGI company_name_Ignite
1 00150042 plc WAGON PLC
2 01 telecom, ltd. 01 TELECOM LTD
3 0404 investments limited 0404 Investments Ltd
我想要做的是遍歷Account_Name_HGI
和company_name_Ignite
列並比較第 1 行中的 2 個字符串並為我提供相似度分數。 我有提供分數的代碼:
from difflib import SequenceMatcher
def similar(a, b):
return SequenceMatcher(None, a, b).ratio()
這帶來了我想要的相似度分數,但我對如何創建將迭代 2 列並返回相似度分數的 for 循環的邏輯有疑問。 任何幫助將不勝感激。
使用列表理解zip
:
from difflib import SequenceMatcher
df['ratio'] = [similar(a, b) for a, b in
zip(df['Account_Name_HGI'], df['company_name_Ignite'])]
# or directly without your custom function
df['ratio'] = [SequenceMatcher(None, a, b).ratio() for a,b in
zip(df['Account_Name_HGI'], df['company_name_Ignite'])
]
Output:
Row Account_Name_HGI company_name_Ignite ratio
0 1 00150042 plc WAGON PLC 0.095238
1 2 01 telecom, ltd. 01 TELECOM LTD 0.266667
2 3 0404 investments limited 0404 Investments Ltd 0.818182
使用列表推導式壓縮兩列:
from difflib import SequenceMatcher
df['ratio'] = [SequenceMatcher(None, a, b).ratio()
for a, b
in zip(df['Account_Name_HGI'], df['company_name_Ignite'])]
print (df)
Row Account_Name_HGI company_name_Ignite ratio
0 1 00150042 plc WAGON PLC 0.095238
1 2 01 telecom, ltd. 01 TELECOM LTD 0.266667
2 3 0404 investments limited 0404 Investments Ltd 0.818182
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.