Check whether a text string in one Excel file is contained in another Excel file

Question

I've just started learning Python, and I have two Excel files of different shapes: the first has 225 rows and the second has 500. The task is to compare the text in a specific column (column 3) of file1 with the text in column 3 of file2. If there is a match, show the highest matching percentage; if there is no match, show "No match".

Can anyone give me some advice on this?


Answer 1

    import pandas as pd
    import numpy as np
    from fuzzywuzzy import fuzz

    def match(x, y, min_score=0):
        # -1 and '' are returned if no candidate scores above min_score
        max_score = -1
        max_text = ''
        for row2 in y:
            # fuzzy match score (0 to 100) between x and the candidate
            score = fuzz.ratio(x, row2)

            # keep the candidate if it is above the threshold and beats the best score so far
            if score > min_score and score > max_score:
                max_score = score
                max_text = row2

        return (max_score, max_text)

    pd.options.display.max_columns = 10

    # read only the 3rd column from both Excel files
    # ('Sheet_name', 'col_name' and 'col2_name' are placeholders for your own names)
    wb1 = pd.read_excel('Excel1.xlsx', sheet_name='Sheet_name', na_values=['NA'], usecols=[2])
    wb2 = pd.read_excel('Excel2.xlsx', sheet_name='Sheet_name', na_values=['NA'], usecols=[2])

    diff = pd.concat((wb1, wb2), axis=1)

    # add a new column to the DataFrame called "match"
    diff['match'] = np.zeros((len(diff), ))

    # for each row of file 1, store the best-matching text from file 2 and its score
    # (note: this overwrites the second column of diff with the best match)
    for i, row in enumerate(wb1['col_name']):
        score, text = match(row, wb2['col2_name'])
        print(score)
        diff.iloc[i, 1] = text
        diff.iloc[i, 2] = score

    diff.to_excel("output.xlsx")
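The question also asks for "No match" when nothing is similar enough, which the code above does not report. A minimal sketch of that step follows. It makes a few assumptions: the standard-library `difflib.SequenceMatcher` is swapped in for `fuzz.ratio` so the example runs without third-party packages, the names `similarity` and `best_match` are hypothetical, the threshold of 60 is an arbitrary cut-off to tune for your data, and the two lists stand in for column 3 of each file.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-100 similarity score (a stand-in for fuzz.ratio)."""
    return round(100 * SequenceMatcher(None, str(a), str(b)).ratio())


def best_match(text, candidates, threshold=60):
    """Return (score, candidate) for the best candidate, or "No match".

    threshold is a hypothetical cut-off, not a value from the original answer.
    """
    best_score, best_text = -1, None
    for candidate in candidates:
        score = similarity(text, candidate)
        if score > best_score:
            best_score, best_text = score, candidate
    # Report "No match" when even the best candidate is below the cut-off
    if best_score < threshold:
        return "No match"
    return best_score, best_text


# Small in-memory stand-ins for column 3 of each Excel file
file1_col = ["apple pie", "banana bread", "zzzz"]
file2_col = ["apple pies", "bananna bread", "cherry tart"]

results = [best_match(text, file2_col) for text in file1_col]
for text, result in zip(file1_col, results):
    print(text, "->", result)
```

With real data, the two lists could be replaced by the columns read with `pd.read_excel`, and the results written to a new column of a DataFrame built from file 1 only, so the output has one row per row of file 1.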

Solution 1
0 2019-07-29 13:45:26
