Fastest way to compare strings in Python

Question

Situation: I am comparing strings in fileA with predefined strings in fileB. Here is an example of that function in my code:

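# Compare every line of fileA with every line of fileB (nested loop)
with open('fileA', 'r') as string, open('fileB', 'r') as f:
    # Read fileB into a list once: a file object can only be iterated once,
    # so it could not be re-scanned for each line of fileA
    stringlist = [line.strip() for line in f]
    for i in string:
        i = i.strip()  # drop the trailing newline before comparing
        for j in stringlist:
            if i == j:
                print("Same String found! " + i + " " + j)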

Problem: In my actual program, string contains more than 200 strings, while stringlist is a file with more than 50,000 strings. From what I have read, a nested for loop is slow for this kind of comparison: here it performs about 200 × 50,000 = 10 million string comparisons.
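To make the cost of the nested loop concrete, here is a rough timing sketch. It is an illustration under assumptions, not a measurement from the question: the data are synthetic MD5 digests generated with hashlib, and the helper names (make_hashes, nested_loop_common, set_common, benchmark) are hypothetical. It compares the nested loop with a hash-based set lookup.

```python
import hashlib
import timeit
from typing import List, Set


def make_hashes(prefix: str, n: int) -> List[str]:
    # Synthetic 32-character MD5 hex digests standing in for the real data
    return [hashlib.md5(f"{prefix}{i}".encode()).hexdigest() for i in range(n)]


def nested_loop_common(a: List[str], b: List[str]) -> Set[str]:
    # Nested-loop approach: every item of a is compared with every item of b
    return {i for i in a for j in b if i == j}


def set_common(a: List[str], b: List[str]) -> Set[str]:
    # Set approach: build a hash set once, then each lookup is O(1) on average
    return set(a).intersection(b)


def benchmark(n_a: int = 200, n_b: int = 50_000) -> None:
    b = make_hashes("b", n_b)
    # Half of a is taken from b, so there are real matches to find
    a = b[: n_a // 2] + make_hashes("a", n_a - n_a // 2)
    for fn in (nested_loop_common, set_common):
        seconds = timeit.timeit(lambda: fn(a, b), number=1)
        print(f"{fn.__name__}: {seconds:.4f} s")
```

Calling benchmark() prints the elapsed time of each approach; the absolute numbers depend on the machine, so none are shown here.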

Question: What is the fastest way to compare the contents of the two files?

Additional information 1: Both files are CSV files and are opened in my program as CSV-delimited.

Additional information 2: The strings are MD5 hashes (32 hexadecimal characters).

Additional information 3: I am open to other ways of storing the strings, i.e. comparing the strings on the fly instead of saving them to fileA.

Additional information 4: I am also open to other methods or modules (e.g. threading/parallel processing); speed is the key here.

Answer 1

If you are okay with not printing duplicates (a set keeps each matching string only once), set.intersection should be really fast:

list1 = ["hello", "world", "foo"]
list2 = ["foo", "bar", "baz"]

set(list1).intersection(list2)
# {'foo'}
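The caveat about duplicates matters because a set stores each value only once. If every occurrence of a match in the smaller list should be reported, in its original order, one common alternative is to build the set from the larger list and loop over the smaller one. A minimal sketch; matches_in_order is a hypothetical helper name:

```python
from typing import Iterable, List


def matches_in_order(small: Iterable[str], large: Iterable[str]) -> List[str]:
    """Return every item of `small` that also appears in `large`.

    Unlike set.intersection, this keeps the order of `small`
    and any repeated entries in it.
    """
    lookup = set(large)  # built once; each `in` test is an average O(1) hash lookup
    return [item for item in small if item in lookup]


list1 = ["hello", "world", "foo", "foo"]
list2 = ["foo", "bar", "baz"]
print(set(list1).intersection(list2))  # {'foo'}
print(matches_in_order(list1, list2))  # ['foo', 'foo']
```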

Answer 2

You should use sets:

setA = set(listA)
setB = set(listB)
common = setA.intersection(setB)

common now holds all the strings that are present in both lists.
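The snippet assumes listA and listB already exist. Since both files in the question are CSV files, one way to build the sets directly is with the standard csv module. This is a sketch under assumptions: the hash is taken to sit in a single column (the column parameter, defaulting to the first), and load_hashes and common_hashes are hypothetical names.

```python
import csv
from typing import Iterable, Set


def load_hashes(lines: Iterable[str], column: int = 0) -> Set[str]:
    """Collect one CSV column (assumed to hold an MD5 hash) into a set."""
    hashes: Set[str] = set()
    for row in csv.reader(lines):
        if len(row) > column:  # skip blank or too-short rows
            hashes.add(row[column].strip())
    return hashes


def common_hashes(path_a: str, path_b: str, column: int = 0) -> Set[str]:
    """Return the hashes that appear in both CSV files."""
    # newline="" is how the csv module documentation recommends opening CSV files
    with open(path_a, newline="") as fa, open(path_b, newline="") as fb:
        return load_hashes(fa, column) & load_hashes(fb, column)
```

For example, common_hashes('fileA', 'fileB') would return the same set that the answer calls common, without building intermediate lists.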

You can also do this in one line:

common = set(listA).intersection(set(listB))
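The same result can be written with the & operator. One difference worth knowing: the intersection() method accepts any iterable, while & needs a set (or frozenset) on both sides, so the one-liner above could also drop the second set() call. A small sketch with made-up data:

```python
listA = ["a", "b", "c"]
listB = ["b", "c", "d"]

common_method = set(listA).intersection(listB)  # method: the argument can be any iterable
common_operator = set(listA) & set(listB)       # operator: both operands must be sets

try:
    set(listA) & listB  # a plain list on the right-hand side is rejected
except TypeError as exc:
    print("TypeError:", exc)
```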

If you can do this comparison "on the fly", that is of course better and faster than saving the lists to a file and then reading them back from that file; you gain nothing by doing that.
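Working on the fly can also mean never storing fileB at all: keep the roughly 200 hashes from fileA in a set and stream fileB through it one line at a time, so memory use stays proportional to the small side. A minimal sketch, assuming one hash per line; stream_matches and generated_hashes are hypothetical names.

```python
from typing import Iterable, Iterator, Set


def stream_matches(wanted: Set[str], lines: Iterable[str]) -> Iterator[str]:
    """Yield each stripped line of `lines` that is in `wanted`.

    Only `wanted` is held in memory; `lines` can be an open file
    that is read one line at a time.
    """
    for line in lines:
        candidate = line.strip()  # drop the trailing newline before comparing
        if candidate in wanted:
            yield candidate


# Hypothetical usage: generated_hashes is the set of hashes produced on the fly,
# and fileB is streamed once instead of being loaded in full.
# with open("fileB") as f:
#     for h in stream_matches(generated_hashes, f):
#         print(h)
```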

And of course, to print the strings found in both lists:

for x in common:
    print(x)


2 answers

Answer 1: score 3, posted 2017-06-07 04:37:52

Answer 2: score 2, accepted, posted 2017-06-07 04:40:17
