
Fastest way to compare common items in two lists

I have a list of lists like this:

listt = [["a","abc","zzz","xxx","abc","abc"],["yyy","ggg","abc","cccc"]]

I also have a query list like this:

queryList = ["abc","cccc","abc","yyy"]

queryList and listt[0] have 2 "abc" in common.

queryList and listt[1] have 1 "abc", 1 "cccc" and 1 "yyy" in common.

So I want an output like this:

[2,3] #2 = Total common items between queryList & listt[0]
      #3 = Total common items between queryList & listt[1]

I am currently using loops to do this, but this seems to be slow. I will have millions of lists, with thousands of items per list.

listt = [["a","abc","zzz","xxx","abc","abc"],["yyy","ggg","abc","cccc"]]
queryList = ["abc","cccc","abc","yyy"]

totalMatch = []
for hashtree in listt:
    matches = 0
    tempQueryHash = queryList.copy()
    for item in hashtree:  # "item" avoids shadowing the built-in hash()
        for i in range(len(tempQueryHash)):
            if tempQueryHash[i] == item:
                matches += 1
                tempQueryHash[i] = ""  # Don't match the same block twice.
                break

    totalMatch.append(matches)
print(totalMatch)

I'm still learning the ropes in Python, but according to an older post on Stack Overflow, something like the following should work:

from collections import Counter
listt = [["a","abc","zzz","xxx","abc","abc"],["yyy","ggg","abc","cccc"]]
queryList = ["abc","cccc","abc","yyy"]
OutputList = [len(list((Counter(x) & Counter(queryList)).elements())) for x in listt]
# [2, 3]
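
To see why this works, here is a minimal sketch of what the & operator does on two Counter objects: it keeps each key at the smaller of its two counts and drops keys that end up at zero. The variable names list_counts, query_counts and common are only illustrative.

```python
from collections import Counter

list_counts = Counter(["a", "abc", "zzz", "xxx", "abc", "abc"])
query_counts = Counter(["abc", "cccc", "abc", "yyy"])

# "abc" occurs 3 times in the list and 2 times in the query, so min(3, 2) = 2 survives.
# Every other key is missing from one side and is dropped.
common = list_counts & query_counts
print(common)                # Counter({'abc': 2})
print(sum(common.values()))  # 2
```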

I'll keep a lookout for some other method...

An improvement on JvdV's answer.

Sum the values of the intersection instead of expanding and counting its elements, and build the Counter for queryList once instead of once per sublist.

from collections import Counter
listt = [["a","abc","zzz","xxx","abc","abc"],["yyy","ggg","abc","cccc"]]
queryList = ["abc","cccc","abc","yyy"]
queryListCounter = Counter(queryList)
OutputList = [sum((Counter(x) & queryListCounter).values()) for x in listt]
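
A possible variation, as a sketch only: instead of building an intersection Counter for every sublist, look up just the distinct query items in each sublist's Counter and add up the smaller count. The helper name count_common is hypothetical, and this has not been benchmarked here, so time it on your own data before relying on it.

```python
from collections import Counter

def count_common(lists, query):
    """Return, for each sublist, the size of its multiset intersection with query."""
    query_counts = Counter(query)  # built once and reused for every sublist
    totals = []
    for items in lists:
        item_counts = Counter(items)
        # A Counter returns 0 for a missing key, so absent items contribute nothing.
        totals.append(sum(min(n, item_counts[key]) for key, n in query_counts.items()))
    return totals

listt = [["a", "abc", "zzz", "xxx", "abc", "abc"], ["yyy", "ggg", "abc", "cccc"]]
queryList = ["abc", "cccc", "abc", "yyy"]
print(count_common(listt, queryList))  # [2, 3]
```

The inner loop runs over the distinct items of the query rather than over the sublist, which may matter when the query is much shorter than the sublists.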

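You can compare every item of a sublist against every item of queryList and count the matching pairs. Note that this counts pairs, so repeated items are counted more than once: for listt[1] it gives 4 rather than 3, because "abc" appears twice in queryList.

output = [i == z for i in listt[1] for z in queryList]
print(output.count(True))  # 4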
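
As a sanity check, this sketch runs the pairwise comparison over every sublist and puts it next to the Counter-based intersection from the other answers. The names pair_counts and multiset_counts are only illustrative.

```python
from collections import Counter

listt = [["a", "abc", "zzz", "xxx", "abc", "abc"], ["yyy", "ggg", "abc", "cccc"]]
queryList = ["abc", "cccc", "abc", "yyy"]

# Pairwise: every (sublist item, query item) pair that is equal counts once.
pair_counts = [sum(i == z for i in sub for z in queryList) for sub in listt]

# Multiset intersection: each query item can be matched at most once.
query_counter = Counter(queryList)
multiset_counts = [sum((Counter(sub) & query_counter).values()) for sub in listt]

print(pair_counts)      # [6, 4]
print(multiset_counts)  # [2, 3]
```

The two agree only when neither list contains repeated items, so the pairwise version does not produce the [2, 3] asked for in the question.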
