Find string matches between two DataFrames
I have a DataFrame as follows.
DF1:
A
Any Match
Credit
I need a debit card.
Logging
Awesome
I have another DataFrame, as follows:
DF2:
B
I did not find any match.
I want a credit card.
I need a debit card.
I do not know.
I am logging into credit portal.
I need my output to be:
B A
I did not find any match. Any Match
I want a credit card. Credit
I need a debit card. I need a debit card.
I am logging into credit portal. logging,credit
Here, if a phrase from DF1 occurs in any text in DF2, print the output as that text together with the matching phrase(s).
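A minimal sketch of the requirement as stated, assuming case-insensitive substring matching and using plain Python lists (the hypothetical names df1_A and df2_B) in place of the two DataFrame columns. find_phrases and match_table are illustrative helpers, not part of any library: each DF2 text keeps every DF1 phrase it contains, ordered by where the phrase appears in the text, and texts with no match are dropped.

```python
def find_phrases(text, phrases):
    # Case-insensitive substring test (an assumption about what "matching" means)
    low = text.lower()
    hits = [p for p in phrases if p.lower() in low]
    # Order hits by their first position in the text, e.g. "Logging,Credit"
    return sorted(hits, key=lambda p: low.find(p.lower()))


def match_table(texts, phrases):
    # Keep only texts with at least one hit; join multiple hits with commas
    rows = []
    for t in texts:
        hits = find_phrases(t, phrases)
        if hits:
            rows.append((t, ",".join(hits)))
    return rows


# Hypothetical stand-ins for df1['A'] and df2['B']
df1_A = ["Any Match", "Credit", "I need a debit card.", "Logging", "Awesome"]
df2_B = [
    "I did not find any match.",
    "I want a credit card.",
    "I need a debit card.",
    "I do not know.",
    "I am logging into credit portal.",
]

for text, found in match_table(df2_B, df1_A):
    print(f"{text} -> {found}")
```

With real DataFrames the same helper could be applied row by row, e.g. df2['A'] = df2['B'].apply(lambda t: ','.join(find_phrases(t, df1['A']))), which leaves an empty string where nothing matched.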
Try fuzzywuzzy:
import pandas as pd
from fuzzywuzzy import fuzz

matched_entities = []
for row in df1.index:
    name1 = df1.at[row, "A"]  # DataFrame.get_value was removed in pandas 1.0
    for col in df2.index:
        name2 = df2.at[col, "B"]
        # partial_ratio scores the best-matching substring on a 0-100 scale
        matched_token = fuzz.partial_ratio(name1, name2)
        if matched_token > 80:
            matched_entities.append([name1, name2])

df_partial_ratio = pd.DataFrame(columns=['A', 'B'], data=matched_entities)
If fuzz.partial_ratio does not work well on your data, try fuzz.ratio or fuzz.token_sort_ratio. Either can be done by changing one line of the code above to:
matched_token=fuzz.ratio(name1,name2)
or
matched_token=fuzz.token_sort_ratio(name1,name2)
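To see why the choice of scorer matters, here is a rough stdlib sketch of the three scorers built on difflib.SequenceMatcher (fuzzywuzzy itself falls back to difflib when python-Levenshtein is not installed). These are simplified approximations written for illustration, not fuzzywuzzy's own code: the real partial_ratio picks candidate windows from matching blocks rather than trying every window, and the real token_sort_ratio also strips punctuation.

```python
from difflib import SequenceMatcher


def ratio(a: str, b: str) -> int:
    # Whole-string similarity on a 0-100 scale (approximates fuzz.ratio)
    return round(100 * SequenceMatcher(None, a, b).ratio())


def partial_ratio(a: str, b: str) -> int:
    # Slide the shorter string over every same-length window of the longer
    # one and keep the best score (a simplification of fuzz.partial_ratio)
    short, long_ = sorted((a, b), key=len)
    if not short:
        return 0
    n = len(short)
    return max(ratio(short, long_[i:i + n]) for i in range(len(long_) - n + 1))


def token_sort_ratio(a: str, b: str) -> int:
    # Lowercase and sort the whitespace-separated tokens, then compare
    def norm(s: str) -> str:
        return " ".join(sorted(s.lower().split()))
    return ratio(norm(a), norm(b))


a, b = "credit", "i want a credit card."
print(ratio(a, b), partial_ratio(a, b))  # 44 100
print(token_sort_ratio("Card credit", "credit card"))  # 100
```

A short phrase inside a long sentence scores low on ratio but 100 on partial_ratio, which is why partial_ratio suits this question's phrase-in-sentence matching; token_sort_ratio helps when the same words appear in a different order.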
You can do something like this. First, define a lookup function that matches on "normalized" text, for example lowercased:
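def lookup(x, values):
    # Return the first value whose lowercase form occurs in the lowercase text x;
    # if nothing matches, the function falls through and returns None
    for value in values:
        if value.lower() in x.lower():
            return value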
Then apply this function to DF2:
df2['A'] = df2['B'].apply(lambda x: lookup(x, df1['A']))
This should give you:
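B A
0 I did not find any match. Any Match
1 I want a credit card. Credit
2 I need a debit card. I need a debit card.
3 I do not know. None
4 I am logging into credit portal. Credit
Because lookup returns only the first DF1 phrase it finds, row 4 gets Credit but not Logging.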
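A plain substring test also fires inside longer words (credit inside discredited). Here is a sketch of a stricter variant of lookup, assuming whole-word matching is what you want; lookup_word is a hypothetical name. The lookarounds (?<!\w) and (?!\w) are used instead of \b so that phrases ending in punctuation, such as "I need a debit card.", still match at the end of a string.

```python
import re


def lookup_word(x, values):
    # Return the first value that occurs in x as a whole word or phrase,
    # ignoring case; None if nothing matches (same contract as lookup)
    for value in values:
        # Not preceded or followed by a word character; re.escape keeps
        # punctuation such as "." literal
        pattern = r"(?<!\w)" + re.escape(value) + r"(?!\w)"
        if re.search(pattern, x, flags=re.IGNORECASE):
            return value
    return None


phrases = ["Any Match", "Credit", "I need a debit card.", "Logging", "Awesome"]
print(lookup_word("I am logging into credit portal.", phrases))  # Credit
print(lookup_word("My account was discredited.", phrases))  # None
```

It is applied to the DataFrame exactly like lookup, e.g. df2['B'].apply(lambda x: lookup_word(x, df1['A'])).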
Try this:
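# Assumes both frames use the default 0..n-1 index
df1['B'] = None  # object column; None means "no match found"
for i in range(len(df1)):
    for j in range(len(df2)):
        # Case-insensitive test: is the DF1 phrase contained in the DF2 text?
        if df1.loc[i, 'A'].lower() in df2.loc[j, 'B'].lower():
            df1.loc[i, 'B'] = df2.loc[j, 'B']  # write to row i, no chained indexing
            break  # keep only the first matching DF2 text
print(df1.dropna(axis=0))  # drop DF1 phrases that matched nothing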
Output
A B
0 Any Match I did not find any match.
1 Credit I want a credit card.
2 I need a debit card. I need a debit card.
3 Logging I am logging into credit portal.
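The nested loop tests every DF1 phrase against every DF2 text. Here is a sketch of an alternative under stated assumptions (plain lists instead of DataFrame columns, and a hypothetical helper named first_text_per_phrase): compile all phrases into one case-insensitive regex alternation, scan each text once, and keep the first text in which each phrase appears.

```python
import re


def first_text_per_phrase(phrases, texts):
    if not phrases:
        return []
    # Try longer phrases first so the longest alternative wins at each position
    ordered = sorted(phrases, key=len, reverse=True)
    # One capturing group per phrase; m.lastindex tells us which one matched
    pattern = re.compile(
        "|".join(f"({re.escape(p)})" for p in ordered), re.IGNORECASE
    )
    first = {}
    for text in texts:
        for m in pattern.finditer(text):
            phrase = ordered[m.lastindex - 1]
            # setdefault keeps only the first text in which a phrase appears
            first.setdefault(phrase, text)
    # Keep DF1 order and drop unmatched phrases (like dropna above)
    return [(p, first[p]) for p in phrases if p in first]


phrases = ["Any Match", "Credit", "I need a debit card.", "Logging", "Awesome"]
texts = [
    "I did not find any match.",
    "I want a credit card.",
    "I need a debit card.",
    "I do not know.",
    "I am logging into credit portal.",
]
for phrase, text in first_text_per_phrase(phrases, texts):
    print(phrase, "|", text)
```

One caveat of this design: finditer returns non-overlapping matches, so when a phrase occurs inside a longer phrase at the same spot (e.g. credit within credit card), only the longer one is reported for that text.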