Get elements common to a majority of lists in Python

Given 4 lists, I want to get elements that are common to 3 or more lists.

a = [1, 2, 3, 4]
b = [1, 2, 3, 4, 5]
c = [1, 3, 4, 5, 6]
d = [1, 2, 6, 7]

Hence, the output should be [1, 2, 3, 4].

My current code is as follows.

result1 = set(a) & set(b) & set(c)
result2 = set(b) & set(c) & set(d)
result3 = set(c) & set(d) & set(a)
result4 = set(d) & set(a) & set(b)

final_result = list(result1)+list(result2)+list(result3)+list(result4)
print(set(final_result))

It works fine and gives the desired output. However, I would like to know whether there is an easier way of doing this in Python, i.e. are there any built-in functions for this?
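For reference, the four hand-written intersections above are exactly the 3-list combinations of the four lists. A minimal sketch of that same idea for any number of lists, assuming hashable items and using `itertools.combinations` (the names `sets`, `k` and `common` are illustrative, not from the original code):

```python
from itertools import combinations

a = [1, 2, 3, 4]
b = [1, 2, 3, 4, 5]
c = [1, 3, 4, 5, 6]
d = [1, 2, 6, 7]

sets = [set(lst) for lst in (a, b, c, d)]
k = 3  # an element must appear in at least this many lists

common = set()
for combo in combinations(sets, k):
    # Elements shared by every list in this particular group of k lists
    common |= set.intersection(*combo)

print(sorted(common))  # [1, 2, 3, 4]
```

The number of combinations grows quickly with the number of lists, so this is mainly useful for a handful of lists.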

Using a Counter, you can do it like this:

Code:

a = [1, 2, 3, 4]
b = [1, 2, 3, 4, 5]
c = [1, 3, 4, 5, 6]
d = [1, 2, 6, 7]

from collections import Counter

counts = Counter(sum(([list(set(i)) for i in (a, b, c, d)]), []))
print(counts)

# Keep values seen in at least three lists; `n` avoids shadowing the list `c`
at_least_three = [i for i, n in counts.items() if n >= 3]
print(at_least_three)

Results:

Counter({1: 4, 2: 3, 3: 3, 4: 3, 5: 2, 6: 2, 7: 1})

[1, 2, 3, 4]
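The same counting idea can be wrapped in a small function that takes any number of lists and a threshold. This is only a sketch: the function name `common_to_at_least` and its parameters are made up for illustration, and it assumes the items are hashable and sortable. It uses `itertools.chain.from_iterable` instead of `sum(..., [])` to avoid building intermediate lists.

```python
from collections import Counter
from itertools import chain


def common_to_at_least(lists, k):
    """Return the values that occur in at least k of the given lists."""
    # set() makes each value count at most once per list
    counts = Counter(chain.from_iterable(set(lst) for lst in lists))
    return sorted(value for value, n in counts.items() if n >= k)


a = [1, 2, 3, 4]
b = [1, 2, 3, 4, 5]
c = [1, 3, 4, 5, 6]
d = [1, 2, 6, 7]

print(common_to_at_least([a, b, c, d], 3))  # [1, 2, 3, 4]
```

For a strict majority, `k` could be computed as `len(lists) // 2 + 1`, which is 3 for four lists.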

Iterate over the values in all lists to create a dict of {value: number_of_lists_the_value_appears_in} :

from collections import defaultdict

counts = defaultdict(int)
for list_ in (a, b, c, d):
    for value in set(list_):  # eliminate duplicate values with `set`
        counts[value] += 1

Then, in a second step, keep only the values with a count of at least 3:

result = [value for value, count in counts.items() if count >= 3]

print(result)  # [1, 2, 3, 4]

The code below solves the generalised problem (with n lists, and a requirement that a common element must be in at least k of them). It also works with non-hashable items; not supporting them is the main disadvantage of all the other answers:

a = [1, 2, 3, 4]
b = [1, 2, 3, 4, 5]
c = [1, 2, 3, 4, 4, 5, 6]
d = [1, 2, 6, 7]


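# Work on copies so the original lists are not modified by the removals below
lists = [list(lst) for lst in (a, b, c, d)]
result = []
desired_quantity = 3

for i in range(len(lists) - desired_quantity + 1):   # see point 1 below
    sublist = lists.pop(0)                           # see point 2
    for item in sublist:
        if item in result:   # skip repeats of an item that already qualified
            continue
        counter = 1   # 1 not 0, by virtue of the fact it is in sublist
        for comparisonlist in lists:
            if item in comparisonlist:
                counter += 1
                comparisonlist.remove(item)          # see point 3
        if counter >= desired_quantity:
            result.append(item)

print(result)  # [1, 2, 3, 4]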

This has the disadvantage that, for each element in every list, we have to check every other list to see whether it is there, but we can make things more efficient in a few ways. Look-ups are also a lot slower in lists than in sets (which we can't use if the lists contain non-hashable items), so this may be slow for very large lists.

1) If we require an item to be in k lists, we don't need to check each item in the last k-1 lists, as we would have already picked it up whilst searching through the earlier lists.

2) Once we have searched through a list, we can discard that list, since any items in the just-searched-list that might contribute to our final result, will again already have been dealt with. This means that with each iteration we have fewer lists to search through.

3) When we have checked whether an item is in enough lists, we can remove that item from the other lists. This means that not only does the number of lists shrink as we proceed, but the lists themselves also get shorter, making lookups quicker.

As an afterthought, if the original lists happen to be sorted beforehand, this might also help the algorithm work efficiently.
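One way sorted input could be exploited, sketched here as a different technique from the loop above rather than a refinement of it: merge the sorted lists with `heapq.merge` and count runs of equal values with `itertools.groupby`. This assumes every list is already sorted and that the items can be compared with `<` and `==`; hashing is not needed. The function name `common_in_sorted` is hypothetical.

```python
from heapq import merge
from itertools import groupby


def common_in_sorted(sorted_lists, k):
    """Return values present in at least k of the already-sorted lists."""
    # Drop repeats inside each list so a value counts once per list
    deduped = [[key for key, _ in groupby(lst)] for lst in sorted_lists]
    # merge() interleaves the sorted inputs, so equal values end up adjacent;
    # the length of each run is the number of lists containing that value
    return [key for key, run in groupby(merge(*deduped))
            if sum(1 for _ in run) >= k]


a = [1, 2, 3, 4]
b = [1, 2, 3, 4, 5]
c = [1, 2, 3, 4, 4, 5, 6]
d = [1, 2, 6, 7]

print(common_in_sorted([a, b, c, d], 3))  # [1, 2, 3, 4]
```

Because each list is scanned once and no `in` look-ups on lists are needed, this avoids the repeated linear searches, at the cost of requiring sorted, orderable items.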

Create a dictionary of counts and filter out the entries whose count is less than 3.
