
How to check the similarity between two lists in two different Excel files using Python?

I have two lists containing customer names. The names can be similar or different. How can I find the similarity between these two lists using Python?

Once I have the similarity, I want to pull the corresponding data from one Excel file into the other.

Example:

List 1:

Customer Name       Unique ID
IBM                 2365
BOA                 5456
BMW AG              2456

List 2:

Customer Name     Unique ID
IBM Pvt Ltd        
BMW Group
Robert Bosch
BOA Ltd

This is just sample data. The actual data contains almost 300k lines.

I tried Jaccard similarity by passing the two lists separately, as Excel files, to the function, but the result (i.e. the Jaccard similarity) is always zero.
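The code that produced the zero is not shown, so the following is only one possible explanation, sketched on the sample data. If each whole name is treated as a set element, no name appears verbatim in both lists, so the intersection is empty and the score is zero. Comparing one pair of names at a time, split into lowercase word tokens, gives a non-zero score. The helper names `jaccard` and `tokens` are illustrative, not from the question.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def tokens(name):
    """Split a customer name into a set of lowercase words (an assumed tokenization)."""
    return set(name.lower().split())


list1 = ["IBM", "BOA", "BMW AG"]
list2 = ["IBM Pvt Ltd", "BMW Group", "Robert Bosch", "BOA Ltd"]

# Whole names as set elements: nothing matches exactly, so the score is 0.
whole_list_score = jaccard(list1, list2)

# One pair of names, compared word by word: {"ibm"} vs {"ibm", "pvt", "ltd"}.
pair_score = jaccard(tokens("IBM"), tokens("IBM Pvt Ltd"))

print(whole_list_score)  # 0.0
print(pair_score)        # one shared word out of three
```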

Edit: How do I iterate through both lists, compare each element with all the elements of the other list, and build a distance matrix?

Then I would like to sort each row of that matrix in descending order to find the closest match. Or is there a better method for finding the closest match once the matrix is built?
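A minimal sketch of that idea on the sample data, using `difflib.SequenceMatcher` from the standard library as the scoring function. This is an assumption: the question does not name a metric, and this one is a similarity (higher is closer), which is why sorting in descending order puts the closest match first. The cut-off `MIN_SCORE` and all variable names are hypothetical and would need tuning on real data.

```python
from difflib import SequenceMatcher

# Sample data from the question: list 1 has IDs, list 2 is missing them.
ids_by_name = {"IBM": 2365, "BOA": 5456, "BMW AG": 2456}
names_without_id = ["IBM Pvt Ltd", "BMW Group", "Robert Bosch", "BOA Ltd"]


def similarity(a, b):
    """Similarity in [0, 1]; 1.0 means identical, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


known_names = list(ids_by_name)

# One row per name in list 2, one column per name in list 1.
matrix = [[similarity(name, known) for known in known_names]
          for name in names_without_id]

MIN_SCORE = 0.4  # hypothetical cut-off below which a name counts as unmatched

matched_ids = {}
for name, row in zip(names_without_id, matrix):
    # Sort the row in descending order; the first entry is the closest match.
    ranked = sorted(zip(row, known_names), reverse=True)
    best_score, best_name = ranked[0]
    matched_ids[name] = ids_by_name[best_name] if best_score >= MIN_SCORE else None

for name, unique_id in matched_ids.items():
    print(name, unique_id)
```

If only the single closest match is needed, `max(zip(row, known_names))` avoids sorting the whole row. Note that a full matrix grows with the product of the two list lengths, so if both lists are close to 300k names it would hold on the order of 9 × 10^10 scores and is unlikely to be practical without first narrowing down the candidates.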

Could you elaborate and make your question a little clearer?

What do you mean by similarity between two lists?

When you say list, do you mean a CSV/Excel list or a Python list? If you are looking at the distance between strings, you might have to look at the Levenshtein algorithm: https://www.geeksforgeeks.org/edit-distance-dp-5/

A Pythonic version: https://www.python-course.eu/levenshtein_distance.php
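For reference, a minimal sketch of the Levenshtein (edit) distance using the standard dynamic-programming recurrence, keeping only two rows of the table in memory. It is not copied from either linked page, and the function name `levenshtein` is illustrative.

```python
def levenshtein(s, t):
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn s into t."""
    if len(s) < len(t):
        s, t = t, s  # keep the shorter string in the inner loop

    # previous[j] is the distance between the first i-1 chars of s
    # and the first j chars of t; it starts as the row for the empty prefix.
    previous = list(range(len(t) + 1))
    for i, char_s in enumerate(s, start=1):
        current = [i]
        for j, char_t in enumerate(t, start=1):
            cost = 0 if char_s == char_t else 1
            current.append(min(
                previous[j] + 1,         # delete char_s
                current[j - 1] + 1,      # insert char_t
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Unlike a similarity score, a smaller distance means a closer match, so the closest candidate is the one with the minimum value.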

Since your data size is humongous, also check the external merge sort strategy.
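The answer only names the strategy, so this is a rough sketch of what an external merge sort usually looks like, not a description of how the answerer would apply it to the matching problem. The input is sorted in fixed-size chunks, each chunk is written to a temporary file, and the sorted files are merged with `heapq.merge`. The function name `external_sort` and the `chunk_size` default are hypothetical, and the sketch assumes names contain no embedded newlines.

```python
import heapq
import os
import tempfile


def external_sort(lines, chunk_size=100_000):
    """Yield the strings in `lines` in sorted order, holding at most
    `chunk_size` of them in memory at a time."""
    chunk_paths = []
    chunk = []

    def flush():
        # Sort the current chunk in memory and spill it to a temp file.
        chunk.sort()
        with tempfile.NamedTemporaryFile(
            "w", delete=False, suffix=".txt", encoding="utf-8"
        ) as handle:
            handle.writelines(item + "\n" for item in chunk)
        chunk_paths.append(handle.name)
        chunk.clear()

    for line in lines:
        chunk.append(line.rstrip("\n"))
        if len(chunk) >= chunk_size:
            flush()
    if chunk:
        flush()

    files = [open(path, encoding="utf-8") for path in chunk_paths]
    try:
        # Each stream is already sorted, so a k-way merge yields sorted output.
        streams = [(line.rstrip("\n") for line in f) for f in files]
        yield from heapq.merge(*streams)
    finally:
        for f in files:
            f.close()
        for path in chunk_paths:
            os.remove(path)


names = ["BOA Ltd", "IBM Pvt Ltd", "BMW Group", "Robert Bosch"]
sorted_names = list(external_sort(names, chunk_size=2))
print(sorted_names)
```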

