Selecting the most common element from a bunch of lists

Question

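I have a list l of lists [l1, ..., ln], all of equal length.

I want to compare l1[k], l2[k], ..., ln[k] for every k in range(len(l1)) and build another list l0 by picking the element that appears most often at each position.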

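So, if l1 = [1, 2, 3], l2 = [1, 4, 4] and l3 = [0, 2, 4], then l0 = [1, 2, 4]. If there is a tie, I will look at the lists that make up the tie and choose the element from the list with the higher priority. Priority is given a priori: each list is assigned a priority. For example, if value 1 appears in lists l1 and l3, value 2 in lists l2 and l4, and value 3 in l5, and the lists are ordered by priority as l5>l2>l3>l1>l4, then I will pick 2, because 2 is one of the most frequent values and it appears in l2, whose priority is higher than that of l1 and l3.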

How can I do this in Python without writing a for loop with lots of if/else conditions?

Answer 1

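You can use the Counter class from the collections library. Collecting the values at each position in a single expression cuts down on explicit looping over the lists. You only need one if/else statement, for the case where there is no single most frequent value: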

import collections

list0 = []
list_length = len(your_lists[0])
for k in range(list_length):
    k_vals = [x[k] for x in your_lists]  # collect all values at position k
    counts = collections.Counter(k_vals).most_common()  # (value, count) tuples sorted by count
    if len(counts) == 1 or counts[0][1] > counts[1][1]:  # is there a single most common value?
        list0.append(counts[0][0])  # take the value with the highest count
    else:
        list0.append(k_vals[0])  # tie: take the element from the first list

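list0 is the answer you are looking for. I just hate using l as a name because it is easily confused with the digit 1.

Edit (based on comments):
To incorporate your comment, use a while loop instead of the if/else statement: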

# assumes your_lists is ordered from highest to lowest priority
i = len(k_vals) - 1
while len(counts) > 1 and counts[0][1] == counts[1][1]:
    counts = collections.Counter(k_vals[:i]).most_common()  # ignore the lowest-priority element
    i -= 1  # go back farther if there is still a tie
list0.append(counts[0][0])  # take the value with the highest count once there is no tie

So the whole thing becomes:

import collections

list0 = []
list_length = len(your_lists[0])
for k in range(list_length):
    k_vals = [x[k] for x in your_lists]  # collect all values at position k
    counts = collections.Counter(k_vals).most_common()  # (value, count) tuples sorted by count
    i = len(k_vals) - 1  # your_lists is assumed to be ordered from highest to lowest priority
    while len(counts) > 1 and counts[0][1] == counts[1][1]:  # in case of a tie
        counts = collections.Counter(k_vals[:i]).most_common()  # ignore the lowest-priority element
        i -= 1  # go back farther if there is still a tie
    list0.append(counts[0][0])  # take the value with the highest count

You have added one more small loop, but on the bright side there are no if/else statements at all!
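For comparison, here is a minimal sketch of the tie rule as it is worded in the question: among the values tied for the highest count, take the one that occurs in the highest-priority list. The function name pick_by_priority and the priority argument (a sequence of zero-based list indices, highest priority first) are assumptions made for illustration and do not appear in the original thread.

```python
from collections import Counter


def pick_by_priority(lists, priority):
    """lists: equal-length lists; priority: indices into lists, highest priority first."""
    result = []
    for column in zip(*lists):
        counts = Counter(column)
        best = max(counts.values())
        # every value that reaches the highest count at this position
        tied = {value for value, count in counts.items() if count == best}
        # walk the lists from highest to lowest priority and take the
        # first value that belongs to the tied set
        result.append(next(column[i] for i in priority if column[i] in tied))
    return result


# The example from the question: l5 > l2 > l3 > l1 > l4.
lists = [[1], [2], [1], [2], [3]]  # l1..l5, a single position each
priority = [4, 1, 2, 0, 3]         # zero-based indices of l5, l2, l3, l1, l4
print(pick_by_priority(lists, priority))  # [2]
```

In this example, trimming the lowest-priority entries one at a time would settle on 1 rather than 2, so which variant fits depends on how the tie rule is meant to be read.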

Answer 2

Just transpose the sublists and take the key of the Counter.most_common element from each group:

from collections import Counter


lists = [[1, 2, 3],[1, 4, 4],[0, 2, 4]]

print([Counter(sub).most_common(1)[0][0] for sub in zip(*lists)])

If they are separate lists, zip them:

l1, l2, l3 = [1, 2, 3], [1, 4, 4], [0, 2, 4]

print([Counter(sub).most_common(1)[0][0] for sub in zip(l1,l2,l3)])

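I'm not sure that taking the first element from the group makes sense when there is a tie, since it may not be one of the tied elements, but it is trivial to implement: just get the two most_common entries and check whether their counts are equal: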

def most_cm(lists):
    for sub in zip(*lists):
        # get the two most frequent values
        comm = Counter(sub).most_common(2)
        # if the top two counts are equal, just return the element from l1
        yield comm[0][0] if len(comm) == 1 or comm[0][1] != comm[1][1] else sub[0]

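We also need the len(comm) == 1 check for the case where all elements are the same, otherwise we would get an IndexError.

If you are talking about taking the element that comes from the earlier list in the event of a tie, i.e. l2 comes before l5, then that is the same as taking any of the tied elements.

For a fair number of sublists: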
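A small sketch of how ties come out of Counter.most_common, assuming Python 3.7 or later, where the documentation states that elements with equal counts are ordered in the order first encountered. The sample tuple is made up for illustration.

```python
from collections import Counter

column = (4, 7, 7, 4, 9)  # one value per list, earliest list first

# 4 and 7 both occur twice; 4 was encountered first, so it is listed first
print(Counter(column).most_common())   # [(4, 2), (7, 2), (9, 1)]
print(Counter(column).most_common(1))  # [(4, 2)]
```

Under that assumption, most_common(1)[0][0] on a zipped column already returns the tied value that comes from the earliest list.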

In [61]: lis = [[randint(1,10000) for _ in range(10)] for _ in range(100000)]

In [62]: list(most_cm(lis))
Out[62]: [5856, 9104, 1245, 4304, 829, 8214, 9496, 9182, 8233, 7482]

In [63]: timeit list(most_cm(lis))
1 loops, best of 3: 249 ms per loop

Answer 3

A solution is:

a = [1, 2, 3]
b = [1, 4, 4]
c = [0, 2, 4]

print([max(set(element), key=element.count) for element in zip(a, b, c)])
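To unpack the one-liner: zip groups the values by position into tuples, and max with key=element.count picks the value that occurs most often in each tuple. Wrapping the tuple in set() avoids counting duplicates twice, but set iteration order is not something to rely on, so the winner of a tie is arbitrary. The sketch below drops set() and relies on the documented behaviour of max, which returns the first maximal item it encounters; the helper name most_frequent_first and the sample lists are assumptions for illustration.

```python
def most_frequent_first(column):
    # column.count(v) is how often v occurs at this position; on a tie,
    # max() keeps the first maximal item, i.e. the value from the earliest list
    return max(column, key=column.count)


a = [1, 2, 3]
b = [5, 4, 4]
c = [0, 2, 6]

print([most_frequent_first(column) for column in zip(a, b, c)])  # [1, 2, 3]
```

Each call to count scans the whole tuple, so this is quadratic in the number of lists; for many lists a Counter is likely the better fit.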

Answer 4

This is what you are looking for:

from collections import Counter
from operator import itemgetter

l0 = [max(Counter(li).items(), key=itemgetter(1))[0] for li in zip(*l)]

Answer 5

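If you can accept any one of the most common elements, and can guarantee that you will not hit an empty list in the list of lists, then you can use Counter (so, from collections import Counter):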

l = [ [1, 0, 2, 3, 4, 7, 8],
      [2, 0, 2, 1, 0, 7, 1],
      [2, 0, 1, 4, 0, 1, 8]]

res = []

for k in range(len(l[0])):
    res.append(Counter(lst[k] for lst in l).most_common()[0][0])

Running this in IPython and printing the result:

In [86]: res
Out[86]: [2, 0, 2, 1, 0, 7, 8]

Answer 6

Try this:

l1 = [1,2,3]
l2 = [1,4,4]
l3 = [0,2,4]

lists = [l1, l2, l3]

print([max(set(x), key=x.count) for x in zip(*lists)])
