Partial word match between two columns of different pandas dataframes
I have two data-frames like:
df1:
df2:
I am trying to match any term against the text.
My code:
import pandas as pd

# data
data1 = {'termID': [1, 55, 341, 41, 5685],
         'term': ['Cardic Arrest', 'Headache', 'Chest Pain', 'Muscle Pain', 'Knee Pain']}
data2 = {'textID': [25, 12, 52, 35],
         'text': ['Hello Mike, Good Morning!!',
                  'Oops!! My Knee pains!!',
                  'Stop Music!! my head pains',
                  'Arrest Innocent!!']}

# Dataframes
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

# Matching logic: keep a (text, term) pair when the whole term
# occurs as a substring of the text (case-insensitive)
matchList = []
for index_b, row_b in df2.iterrows():
    for index_a, row_a in df1.iterrows():
        if row_a.term.lower() in row_b.text.lower():
            matchList.append([row_b.textID, row_b.text, row_a.term, row_a.termID])

cols = ['textID', 'text', 'term', 'termID']
d = pd.DataFrame(matchList, columns=cols)
print(d)
Which gave me only a single row as output:
I have two issues to fix:
What is the best way to fix these two issues?
I have a quick fix for problem 1, but not an optimisation. You only get one match because "Knee Pain" is the only term from df1 that appears in full in one of the df2 texts. I've modified the if statement to split the text from df2 into words and check whether any of those words occurs inside the term. I agree with @jakub that there are libraries that will do this quicker.
# Matching logic: keep a (text, term) pair when any whitespace-separated
# word of the text occurs as a substring of the term (case-insensitive)
matchList = []
for index_b, row_b in df2.iterrows():
    for index_a, row_a in df1.iterrows():
        if any(word in row_a.term.lower() for word in row_b.text.lower().split()):
            matchList.append([row_b.textID, row_b.text, row_a.term, row_a.termID])

cols = ['textID', 'text', 'term', 'termID']
d = pd.DataFrame(matchList, columns=cols)
print(d)

Output:
   textID                        text           term  termID
0      12      Oops!! My Knee pains!!      Knee Pain    5685
1      52  Stop Music!! my head pains       Headache      55
2      35           Arrest Innocent!!  Cardic Arrest       1
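For what it's worth, the same word-in-term check can be written without the nested iterrows loops. The following is only a sketch, not a benchmarked optimisation: it assumes pandas 1.2 or later (for `how='cross'`), and the helper name `shares_word` is made up for illustration. It also strips punctuation with a regex, so a word like "pains!!" is compared as "pains".

```python
import re
import pandas as pd

df1 = pd.DataFrame({
    'termID': [1, 55, 341, 41, 5685],
    'term': ['Cardic Arrest', 'Headache', 'Chest Pain', 'Muscle Pain', 'Knee Pain'],
})
df2 = pd.DataFrame({
    'textID': [25, 12, 52, 35],
    'text': ['Hello Mike, Good Morning!!', 'Oops!! My Knee pains!!',
             'Stop Music!! my head pains', 'Arrest Innocent!!'],
})

def shares_word(text, term):
    """Hypothetical helper: True if any alphabetic word of `text`
    is a substring of `term` (case-insensitive)."""
    words = re.findall(r'[a-z]+', text.lower())
    term = term.lower()
    return any(word in term for word in words)

# Cross join: pair every text with every term (requires pandas >= 1.2).
pairs = df2.merge(df1, how='cross')

# Evaluate the check once per pair and keep only the matching rows.
mask = [shares_word(text, term) for text, term in zip(pairs['text'], pairs['term'])]
matches = pairs.loc[mask, ['textID', 'text', 'term', 'termID']].reset_index(drop=True)
print(matches)
```

One caveat with this kind of check, in either version: very short words in the text (for example "a" or "he") are substrings of many terms, so a minimum word length or a stop-word filter may be needed on real data.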