
Check if a text string in one Excel file is contained in another Excel file

I've just started learning Python, and I have two Excel files with different shapes: the first has 225 rows and the second has 500. The task is to take the text from a specific column (column 3) in file1 and compare it with column 3 in file2. If there is a match, show the highest matching percentage; if there is no match, show "No match".

Can anyone give me some advice on this?

Example

    import pandas as pd
    import numpy as np
    from fuzzywuzzy import fuzz

    def match(x, y, min_score=0):
        # -1 in case we don't get any match
        max_score = -1
        max_text = ''
        for row2 in y:
            # find the fuzzy match score
            score = fuzz.ratio(x, row2)

            # check whether we are above our threshold and have a better score
            if score > min_score and score > max_score:
                max_score = score
                max_text = row2

        return (max_score, max_text)

    # show up to 10 columns when printing a DataFrame
    pd.options.display.max_columns = 10

    # read only the 3rd column from both Excel files
    wb1 = pd.read_excel('Excel1.xlsx', 'Sheet_name', na_values=['NA'], usecols=[2])
    wb2 = pd.read_excel('Excel2.xlsx', 'Sheet_name', na_values=['NA'], usecols=[2])

    diff = pd.concat((wb1, wb2), axis=1)

    # add a new column to the DataFrame called "match"
    diff['match'] = np.zeros((len(diff), ))

    # for each row of file1, find the best match in file2
    for i, row in enumerate(wb1['col_name']):
        score, text = match(row, wb2['col2_name'])
        print(score)
        diff.iloc[i, 1] = text
        diff.iloc[i, 2] = score

    diff.to_excel("output.xlsx")
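To make the expected behaviour concrete, here is a minimal sketch of the "best match or No match" step on its own, without Excel. It is only an illustration under some assumptions: it uses the standard library's `difflib.SequenceMatcher` as a stand-in for `fuzz.ratio`, the names `similarity` and `best_match` are made up for this example, the sample lists are hypothetical data standing in for column 3 of each file, and the threshold of 50 is arbitrary.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-100 similarity score between two strings."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def best_match(text, candidates, min_score=0):
    """Return (score, candidate) for the closest candidate, or ("No match", "")."""
    best_score, best_text = -1, ""
    for candidate in candidates:
        # skip empty cells (None / NaN), which cannot be compared as text
        if not isinstance(candidate, str):
            continue
        score = similarity(text, candidate)
        # keep the candidate only if it beats both the threshold and the best so far
        if score > min_score and score > best_score:
            best_score, best_text = score, candidate
    if best_score < 0:
        return "No match", ""
    return best_score, best_text


# hypothetical sample data standing in for column 3 of each file
file1_values = ["apple pie", "banana", "zzz"]
file2_values = ["apple pies", "bananas", "cherry", None]

results = [best_match(value, file2_values, min_score=50) for value in file1_values]
for value, (score, text) in zip(file1_values, results):
    print(value, "->", score, text)
```

With real data, the two lists would presumably come from the single column of each DataFrame (for example via `.tolist()`), and the results could be written into a DataFrame that has one row per row of file1, so the differing row counts of the two files do not matter.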
