
Picking the most common element from a bunch of lists

I have a list l of lists [l1, ..., ln], all of equal length.

I want to compare l1[k], l2[k], ..., ln[k] for every k in range(len(l1)) and build another list l0 by picking, at each position, the element that appears most frequently.

So, if l1 = [1, 2, 3], l2 = [1, 4, 4] and l3 = [0, 2, 4], then l0 = [1, 2, 4]. If there is a tie, I will look at the lists that make up the tie and choose the value from the list with the highest priority. Priorities are fixed in advance: each list is given a priority. For example, suppose value 1 is in lists l1 and l3, value 2 is in lists l2 and l4, and 3 is in l5, and the lists are ranked by priority as l5 > l2 > l3 > l1 > l4. Then I will pick 2: 1 and 2 both occur most often, and 2 appears in l2, whose priority is higher than that of l1 and l3.

How do I do this in Python without writing a for loop full of if/else conditions?
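A minimal sketch of the tie-breaking rule exactly as stated, assuming the priorities are supplied as a list of numeric ranks parallel to the lists (higher rank = more important). The helper names pick_with_priority and most_common_by_position are hypothetical:

```python
from collections import Counter

def pick_with_priority(values, priority):
    """Most frequent value; ties go to the value held by the highest-priority list.

    values[i] is the element that list i holds at some position k.
    priority[i] is an assumed numeric rank for list i (higher wins).
    """
    counts = Counter(values)
    top = max(counts.values())
    tied = {v for v, c in counts.items() if c == top}
    # Among the lists holding one of the tied values, take the highest-ranked one.
    best = max((i for i, v in enumerate(values) if v in tied), key=lambda i: priority[i])
    return values[best]

def most_common_by_position(lists, priority):
    # zip(*lists) yields the k-th elements of all lists, one position at a time.
    return [pick_with_priority(col, priority) for col in zip(*lists)]

# The tie example: l1..l5 hold 1, 2, 1, 2, 3, ranked l5 > l2 > l3 > l1 > l4.
print(pick_with_priority([1, 2, 1, 2, 3], [2, 4, 3, 1, 5]))  # 2

# The first example, with l1 > l2 > l3 (ranks chosen for illustration).
print(most_common_by_position([[1, 2, 3], [1, 4, 4], [0, 2, 4]], [3, 2, 1]))  # [1, 2, 4]
```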

You can use the Counter class from the collections module. Using map to collect the values at each position cuts down on explicit looping. You will still need an if/else statement for the case where there is no single most frequent value, but only for that:

import collections

list0 = []
list_length = len(your_lists[0])
for k in range(list_length):
    k_vals = list(map(lambda x: x[k], your_lists))  # collect all values at position k
    counts = collections.Counter(k_vals).most_common()  # (value, count) tuples sorted by count
    if len(counts) == 1 or counts[0][1] > counts[1][1]:  # is there a single most common value?
        list0.append(counts[0][0])  # take the value with the highest count
    else:
        list0.append(k_vals[0])  # tie: take the element from the first list

list0 is the answer you are looking for. I just hate using l because it's easy to confuse with the number 1.
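In Python 3, map returns a lazy, one-shot iterator rather than a list: once Counter has consumed it nothing is left, and it cannot be indexed at all, so it has to be wrapped in list(...) before it can be reused as k_vals[0]. A small illustration using the question's example lists:

```python
from collections import Counter

your_lists = [[1, 2, 3], [1, 4, 4], [0, 2, 4]]
k = 0

lazy = map(lambda x: x[k], your_lists)  # Python 3: a one-shot iterator
print(Counter(lazy).most_common())       # [(1, 2), (0, 1)]
print(list(lazy))                        # []: Counter already consumed it

k_vals = list(map(lambda x: x[k], your_lists))  # materialize once, reuse freely
print(Counter(k_vals).most_common())     # [(1, 2), (0, 1)]
print(k_vals[0])                         # 1: indexing works on a list
```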

Edit (based on comments):
Incorporating your comments, and assuming your_lists is ordered from highest to lowest priority, use a while loop instead of the if/else statement:

i = len(k_vals)
while len(counts) > 1 and counts[0][1] == counts[1][1]:  # still a tie
    i -= 1  # go back farther: drop the lowest-priority (last) value
    counts = collections.Counter(k_vals[:i]).most_common()
list0.append(counts[0][0])  # take the value with the highest count once there's no tie

So the whole thing is now:

import collections

list0 = []
list_length = len(your_lists[0])
for k in range(list_length):
    k_vals = list(map(lambda x: x[k], your_lists))  # collect all values at position k
    counts = collections.Counter(k_vals).most_common()  # (value, count) tuples sorted by count
    i = len(k_vals)
    while len(counts) > 1 and counts[0][1] == counts[1][1]:  # in case of a tie
        i -= 1  # go back farther if there's still a tie
        counts = collections.Counter(k_vals[:i]).most_common()  # ignore the lowest-priority value
    list0.append(counts[0][0])  # take the value with the highest count

You add one more small loop, but on the bright side there are no if/else statements at all!
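Dropping values from the end changes the counts, so this loop does not always return the tied value from the highest-priority list. A small check, assuming the lists are ordered highest priority first: in the question's tie example the values in priority order (l5, l2, l3, l1, l4) are 3, 2, 1, 1, 2, and the question asks for 2, but the loop returns 1. truncate_tiebreak is a hypothetical helper that repackages the loop for a single position:

```python
from collections import Counter

def truncate_tiebreak(k_vals):
    """The while-loop rule for one position: drop trailing values until the tie breaks."""
    counts = Counter(k_vals).most_common()
    i = len(k_vals)
    while len(counts) > 1 and counts[0][1] == counts[1][1]:
        i -= 1
        counts = Counter(k_vals[:i]).most_common()
    return counts[0][0]

k_vals = [3, 2, 1, 1, 2]  # values from l5, l2, l3, l1, l4
print(truncate_tiebreak(k_vals))       # 1, but the question wants 2
print(Counter(k_vals).most_common(1))  # [(2, 2)]
```

The second call returns 2 because, since Python 3.7, most_common orders elements with equal counts by first occurrence, so with the lists in priority order no extra loop is needed.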

Just transpose the sublists and take the most common element of each group with Counter.most_common:

from collections import Counter


lists = [[1, 2, 3],[1, 4, 4],[0, 2, 4]]

print([Counter(sub).most_common(1)[0][0] for sub in zip(*lists)])
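zip(*lists) unpacks the outer list into separate arguments, so zip groups the elements at each position; this transpose is what the comprehension iterates over:

```python
lists = [[1, 2, 3], [1, 4, 4], [0, 2, 4]]

columns = list(zip(*lists))  # the k-th tuple holds the k-th element of every list
print(columns)  # [(1, 1, 0), (2, 4, 2), (3, 4, 4)]
```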

If they are individual lists, just zip those:

l1, l2, l3 = [1, 2, 3], [1, 4, 4], [0, 2, 4]

print([Counter(sub).most_common(1)[0][0] for sub in zip(l1,l2,l3)])

I am not sure that taking the element from the first list on a tie makes sense, since it may not be one of the tied values. Still, it is trivial to implement: get the two most common elements and check whether their counts are equal:

from collections import Counter

def most_cm(lists):
    for sub in zip(*lists):
        # get the two most frequent values
        comm = Counter(sub).most_common(2)
        # if their counts are equal, return the element from l1
        yield comm[0][0] if len(comm) == 1 or comm[0][1] != comm[1][1] else sub[0]

We also need the len(comm) == 1 check for the case where all the elements are the same; otherwise comm[1] raises an IndexError.
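What most_common(2) returns in the two edge cases the generator has to handle:

```python
from collections import Counter

all_same = Counter((5, 5, 5)).most_common(2)
tied = Counter((1, 2, 2, 1)).most_common(2)
print(all_same)  # [(5, 3)]: only one pair, so comm[1] would raise IndexError
print(tied)      # [(1, 2), (2, 2)]: equal counts signal a tie
```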

If you mean taking the element that comes from the earlier list in the event of a tie (i.e. l2 comes before l5), then, as long as the lists are ordered by priority, Counter.most_common(1) already does that: since Python 3.7, elements with equal counts are ordered by first occurrence.

For a decent number of sublists:

In [61]: lis = [[randint(1,10000) for _ in range(10)] for _ in range(100000)]

In [62]: list(most_cm(lis))
Out[62]: [5856, 9104, 1245, 4304, 829, 8214, 9496, 9182, 8233, 7482]

In [63]: timeit list(most_cm(lis))
1 loops, best of 3: 249 ms per loop
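For the question's priority rule, one option is to reorder the lists by priority before zipping and let Counter.most_common(1) break ties, since on Python 3.7+ it orders equal counts by first occurrence. A sketch, assuming priorities are numeric ranks parallel to the lists with higher meaning more important; most_common_by_priority is a hypothetical name, and the second position in the example data is made up for illustration:

```python
from collections import Counter

def most_common_by_priority(lists, priority):
    """Most common value per position; ties go to the highest-priority list.

    priority[i] is an assumed numeric rank for lists[i] (higher = more important).
    Relies on most_common ordering equal counts by first occurrence (Python 3.7+).
    """
    # Reorder the lists so the highest-priority one is counted first.
    order = sorted(range(len(lists)), key=lambda i: priority[i], reverse=True)
    ordered = [lists[i] for i in order]
    return [Counter(col).most_common(1)[0][0] for col in zip(*ordered)]

# First position: the question's tie example (l1..l5 hold 1, 2, 1, 2, 3),
# ranked l5 > l2 > l3 > l1 > l4.
lists = [[1, 7], [2, 8], [1, 8], [2, 9], [3, 9]]
priority = [2, 4, 3, 1, 5]
print(most_common_by_priority(lists, priority))  # [2, 9]
```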

One solution:

a = [1, 2, 3]
b = [1, 4, 4]
c = [0, 2, 4]

print([max(set(element), key=element.count) for element in zip(a, b, c)])
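With max(set(...)), a tie is broken by the set's iteration order, which has nothing to do with the order of the lists. A swapped-in variant (not part of the original answer) iterates over dict.fromkeys(...) instead, which keeps values in first-occurrence order, so a tie goes to the value from the earliest list:

```python
a = [1, 2, 3]
b = [1, 4, 4]
c = [0, 2, 4]

# dict.fromkeys keeps values in first-occurrence order (dicts are ordered since
# Python 3.7), and max returns the first maximal item it meets.
result = [max(dict.fromkeys(col), key=col.count) for col in zip(a, b, c)]
print(result)  # [1, 2, 4]

tie_pick = max(dict.fromkeys((3, 1, 4)), key=(3, 1, 4).count)
print(tie_pick)  # 3: three-way tie, so the first list's value wins
```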

This does what you're looking for (l is the list of lists from the question):

from collections import Counter
from operator import itemgetter

l0 = [max(Counter(li).items(), key=itemgetter(1))[0] for li in zip(*l)]

If you are OK taking any one of the elements that tie for most common, and you can guarantee that your list of lists contains no empty lists, then here is a way using Counter:

from collections import Counter

l = [[1, 0, 2, 3, 4, 7, 8],
     [2, 0, 2, 1, 0, 7, 1],
     [2, 0, 1, 4, 0, 1, 8]]

res = []

for k in range(len(l[0])):
    res.append(Counter(lst[k] for lst in l).most_common()[0][0])

Doing this in IPython and printing the result:

In [86]: res
Out[86]: [2, 0, 2, 1, 0, 7, 8]
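The fourth position (3, 1, 4) is a three-way tie. On Python 3.7 and later, where Counter preserves insertion order and most_common orders equal counts by first occurrence, the same loop resolves it to 3, the value from the first list, rather than the 1 shown above (which presumably reflects the arbitrary ordering of an older interpreter):

```python
from collections import Counter

l = [[1, 0, 2, 3, 4, 7, 8],
     [2, 0, 2, 1, 0, 7, 1],
     [2, 0, 1, 4, 0, 1, 8]]

res = [Counter(lst[k] for lst in l).most_common()[0][0] for k in range(len(l[0]))]
print(res)  # [2, 0, 2, 3, 0, 7, 8] on Python 3.7+
```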

Try this:

l1 = [1,2,3]
l2 = [1,4,4]
l3 = [0,2,4]

lists = [l1, l2, l3]

print([max(set(x), key=x.count) for x in zip(*lists)])
