Fastest way to compare common items in two lists

Question

I have a list of lists like this:

listt = [["a","abc","zzz","xxx","abc","abc"],["yyy","ggg","abc","cccc"]]

I have another query list like this:

queryList = ["abc","cccc","abc","yyy"]

queryList and listt[0] have 2 "abc" in common.

queryList and listt[1] have 1 "abc", 1 "cccc" and 1 "yyy" in common.

So I want an output like this:

[2,3] #2 = Total common items between queryList & listt[0]
      #3 = Total common items between queryList & listt[1]

I am currently doing this with loops, but it seems slow. I will have millions of lists, each with thousands of items.

listt = [["a","abc","zzz","xxx","abc","abc"],["yyy","ggg","abc","cccc"]]
queryList = ["abc","cccc","abc","yyy"]

totalMatch = []
for hashtree in listt:
    matches = 0
    tempQueryHash = queryList.copy()
    for hash in hashtree:
        for i in range(len(tempQueryHash)):
            if tempQueryHash[i]==hash:
                matches +=1
                tempQueryHash[i] = "" #Don't Match the same block twice.
                break

    totalMatch.append(matches)
print(totalMatch)

Answer 1

Well, I'm still learning the tricks of Python. But according to this older post, the following should work:

from collections import Counter
listt = [["a","abc","zzz","xxx","abc","abc"],["yyy","ggg","abc","cccc"]]
queryList = ["abc","cccc","abc","yyy"]
OutputList = [len(list((Counter(x) & Counter(queryList)).elements())) for x in listt]
# [2, 3]

I'll keep an eye out for other approaches...
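
To make the one-liner above easier to follow, here is a minimal sketch of what the `&` operator does on two `Counter` objects: it keeps, for each key, the smaller of the two counts (a multiset intersection). The variable names `a`, `q` and `common` are only for illustration.

```python
from collections import Counter

# Count how often each item occurs in one sub-list and in the query.
a = Counter(["a", "abc", "zzz", "xxx", "abc", "abc"])  # 'abc' occurs 3 times
q = Counter(["abc", "cccc", "abc", "yyy"])             # 'abc' occurs 2 times

# '&' keeps the minimum count per key and drops keys with a zero count.
common = a & q
print(common)                # Counter({'abc': 2})
print(sum(common.values()))  # 2
```

Because each item is matched at most as many times as it appears in both lists, this gives the same "don't match the same block twice" behaviour as the loop in the question.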

Answer 2

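An improvement on JvdV's answer.

Basically, it sums the values instead of counting the elements, and it also caches queryListCounter so the query is only counted once.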

from collections import Counter
listt = [["a","abc","zzz","xxx","abc","abc"],["yyy","ggg","abc","cccc"]]
queryList = ["abc","cccc","abc","yyy"]
queryListCounter = Counter(queryList)
OutputList = [sum((Counter(x) & queryListCounter).values()) for x in listt]
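
With millions of lists it may also help to skip items that cannot match at all. The following is a sketch under that assumption, not a benchmarked result: the function name `count_common` and its parameters are hypothetical, and whether it is actually faster depends on how many items in each list also appear in the query.

```python
from collections import Counter

def count_common(lists, query):
    """Return, for each list, how many items it shares with query (as a multiset)."""
    query_counts = Counter(query)  # built once and reused for every list
    totals = []
    for items in lists:
        # Only count items that also occur in the query; the rest can never match.
        item_counts = Counter(x for x in items if x in query_counts)
        # Each item matches at most min(count in list, count in query) times.
        totals.append(sum(min(n, query_counts[k]) for k, n in item_counts.items()))
    return totals

listt = [["a", "abc", "zzz", "xxx", "abc", "abc"], ["yyy", "ggg", "abc", "cccc"]]
queryList = ["abc", "cccc", "abc", "yyy"]
print(count_common(listt, queryList))  # [2, 3]
```

It is worth timing this against the list comprehension above on real data before adopting either one.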

Answer 3

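You can build a list of pairwise comparisons between listt[1] and queryList and count the matches. Note that this counts every matching pair, so a repeated item such as "abc" is counted once per occurrence in queryList: for listt[1] it prints 4 rather than the expected 3.

output = [i == z for i in listt[1] for z in queryList]
print(output.count(True))  # 4: "abc" in listt[1] matches both "abc" entries in queryList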

Solution 1
2 Accepted 2020-04-10 14:25:46

Solution 2
2 2020-04-10 14:44:08

Solution 3
0 2020-04-10 14:36:48
