Check if a text string in one Excel file is contained in another Excel file
I've just started learning Python. I have two Excel files with different shapes: the first has 225 rows and the second has 500. The task is to take the text in a specific column (column 3) of file1 and compare it with column 3 of file2. If there is a match, show the highest matching percentage; if there is no match, show "No match".
Can anyone give me some advice on this?
import pandas as pd
import numpy as np
from fuzzywuzzy import fuzz,process
def match(x, y, min_score=0):
    # -1 in case we don't get any match
    max_score = -1
    max_text = ''
    for row2 in y:
        # finding fuzzy match score
        score = fuzz.ratio(x, row2)
        # checking if we are above our threshold and have a better score
        if score > min_score and score > max_score:
            max_score = score
            max_text = row2
    return (max_score, max_text)
# read the files
pd.options.display.max_columns = 10
# read only the 3rd column from both Excel files
wb1 = pd.read_excel('Excel1.xlsx', 'Sheet_name', na_values=['NA'], usecols=[2])
wb2 = pd.read_excel('Excel2.xlsx', 'Sheet_name', na_values=['NA'], usecols=[2])
diff = pd.concat((wb1, wb2), axis=1)
# add a new column to the DataFrame called "match"
diff['match'] = np.zeros((len(diff), ))
for i, row in enumerate(wb1['col_name']):
    score, text = match(row, wb2['col2_name'])
    print(score)
    diff.iloc[i, 1] = text
    diff.iloc[i, 2] = score
diff.to_excel("output.xlsx")
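The "No match" part of the task is not covered by the code above, so here is a minimal sketch of how it might look. It makes several assumptions: `difflib.SequenceMatcher` from the standard library stands in for `fuzz.ratio` (scores may differ slightly from fuzzywuzzy's), the threshold `min_score=60` is an arbitrary cut-off, comparing in lowercase is a design choice, and the two lists are hypothetical sample values in place of column 3 of each file.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-100 similarity score between two strings."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def best_match(text, candidates, min_score=60):
    """Return (score, candidate) for the closest candidate,
    or ("No match", "") when nothing reaches min_score."""
    best_score, best_text = -1, ""
    for candidate in candidates:
        # Compare case-insensitively (an assumption, not required by the task)
        score = similarity(text.lower(), candidate.lower())
        if score > best_score:
            best_score, best_text = score, candidate
    if best_score < min_score:
        return "No match", ""
    return best_score, best_text


# Hypothetical sample values standing in for column 3 of each file
file1_texts = ["apple juice", "orange soda", "zzzz"]
file2_texts = ["Apple Juice 1L", "orange soda", "green tea"]

# One result row per entry in file1: (text, score or "No match", matched text)
results = [(text, *best_match(text, file2_texts)) for text in file1_texts]
for text, score, matched in results:
    print(text, score, matched)
```

With real data, the two lists could be built from the DataFrames, for example with `wb1['col_name'].dropna().astype(str)`, so that empty cells are skipped and every value is compared as text.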