How to check the similarity between two lists in two different Excel files using Python?
I have two lists containing customer names. The names can be similar or different. How can I find the similarity between these two lists using Python?
Once I have the similarity, I want to pull the corresponding data from one Excel file into the other.
Example:
List 1:
Customer Name Unique ID
IBM 2365
BOA 5456
BMW AG 2456
List 2:
Customer Name Unique ID
IBM Pvt Ltd
BMW Group
Robert Bosch
BOA Ltd
This is just sample data. The actual data contains almost 300k lines.
I tried Jaccard similarity by passing the two lists separately, as Excel files, to the function, but the result (i.e. the Jaccard similarity) is always zero.
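One possible explanation, assuming the score was computed over whole names as set elements: none of the names in List 1 appears verbatim in List 2, so the intersection is empty and the score is zero. The sketch below contrasts that with a word-level Jaccard score between two individual names. The `jaccard` helper and the variable names are illustrative, not taken from the original code, which is not shown.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the lowercase word sets of two names."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0  # two empty names are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


list1 = ["IBM", "BOA", "BMW AG"]
list2 = ["IBM Pvt Ltd", "BMW Group", "Robert Bosch", "BOA Ltd"]

# Treating each whole name as one set element: no name is shared,
# so the intersection is empty and the list-level score is zero.
whole_list_score = len(set(list1) & set(list2)) / len(set(list1) | set(list2))

# Comparing two names word by word gives a non-zero score,
# because "ibm" is shared between the two word sets.
pair_score = jaccard("IBM", "IBM Pvt Ltd")
```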
Edit: How can I iterate through both lists, compare each element with all the elements of the other list, and build a distance matrix?
Then I would like to sort each row of that matrix in descending order to find the closest match. Or is there a better method to find the closest match after the matrix is built?
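A minimal sketch of the row-by-row comparison described above, using the standard library's `difflib.SequenceMatcher` as the similarity measure (a swapped-in choice; any pairwise score could be used instead). Rather than storing and sorting the full matrix, it keeps only the best-scoring candidate for each name. The `similarity` and `best_matches` names are hypothetical. With roughly 300k names per list, comparing every pair this way would likely be far too slow, so this is meant for small samples only.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1]; 1.0 means identical."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_matches(left, right):
    """For each name in `left`, return (name, closest name in `right`, score)."""
    results = []
    for name in left:
        # One row of the similarity matrix: `name` against every candidate.
        scores = [similarity(name, candidate) for candidate in right]
        best = max(range(len(right)), key=scores.__getitem__)
        results.append((name, right[best], scores[best]))
    return results


list1 = ["IBM", "BOA", "BMW AG"]
list2 = ["IBM Pvt Ltd", "BMW Group", "Robert Bosch", "BOA Ltd"]

for name, match, score in best_matches(list1, list2):
    print(f"{name!r} -> {match!r} (score {score:.2f})")
```

The standard library also provides `difflib.get_close_matches`, which returns the top few candidates above a cutoff and may be enough when only the closest names are needed.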
Could you elaborate and make your question a little clearer?
What do you mean by similarity between two lists?
When you say list, do you mean a CSV/Excel list or a Python list? If you are looking for the distance between strings, you might want to look at the Levenshtein algorithm: https://www.geeksforgeeks.org/edit-distance-dp-5/
A Pythonic version: https://www.python-course.eu/levenshtein_distance.php
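For reference, a short sketch of the Levenshtein (edit) distance using the usual dynamic-programming recurrence, keeping only two rows of the table at a time. It is not copied from either linked page, and the function name is illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn `a` into `b`."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop

    # previous[j] is the distance between the processed prefix of `a` and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]
```

A lower distance means a closer match, so with this measure the closest candidate is the one with the smallest value; lowercasing both names first avoids counting case differences as edits.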
Since your data size is humongous, also check out an external merge sort strategy.