Picking the most common element from a bunch of lists
I have a list l of lists [l1, ..., ln] of equal length. I want to compare l1[k], l2[k], ..., ln[k] for all k in range(len(l1)) and make another list l0 by picking the element that appears most frequently.
So, if l1 = [1, 2, 3], l2 = [1, 4, 4] and l3 = [0, 2, 4], then l0 = [1, 2, 4].
If there is a tie, I will look at the lists that make up the tie and choose the value from the list with higher priority. Priority is given a priori: each list is assigned a priority.
For example, if you have value 1 in lists l1 and l3, value 2 in lists l2 and l4, and value 3 in l5, and the lists are ordered by priority, say l5>l2>l3>l1>l4, then I will pick 2, because 2 is one of the values with the highest occurrence and it appears in l2, whose priority is higher than that of l1 and l3.
How do I do this in Python without creating a for loop with lots of if/else conditions?
You can use the Counter class from the collections module. Using the map function will reduce your list looping. You will need an if/else statement for the case where there is no single most frequent value, but only for that:
import collections

list0 = []
list_length = len(your_lists[0])
for k in range(list_length):
    k_vals = list(map(lambda x: x[k], your_lists))  # collect all values at position k
    counts = collections.Counter(k_vals).most_common()  # (value, count) tuples sorted by count
    if len(counts) == 1 or counts[0][1] > counts[1][1]:  # is there a single most common value?
        list0.append(counts[0][0])  # take the value with the highest count
    else:
        list0.append(k_vals[0])  # tie: take the element from the first list
list0 is the answer you are looking for. I just hate using l because it's easy to confuse with the number 1.
Edit (based on comments):
Incorporating your comments, instead of the if/else statement, use a while loop:
i = len(k_vals)  # one value per list; your_lists is assumed ordered from highest to lowest priority
while len(counts) > 1 and counts[0][1] == counts[1][1]:
    i -= 1  # go back farther if there's still a tie
    counts = collections.Counter(k_vals[:i]).most_common()  # ignore the lowest-priority element
list0.append(counts[0][0])  # take the value with the highest count once there's no tie
So the whole thing is now:
import collections

list0 = []
list_length = len(your_lists[0])
for k in range(list_length):
    k_vals = list(map(lambda x: x[k], your_lists))  # collect all values at position k
    counts = collections.Counter(k_vals).most_common()  # (value, count) tuples sorted by count
    i = len(k_vals)  # one value per list; your_lists is assumed ordered from highest to lowest priority
    while len(counts) > 1 and counts[0][1] == counts[1][1]:  # in case of a tie
        i -= 1  # go back farther if there's still a tie
        counts = collections.Counter(k_vals[:i]).most_common()  # ignore the lowest-priority element
    list0.append(counts[0][0])  # take the value with the highest count
You throw in one more tiny loop, but on the bright side there are no if/else statements at all!
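For comparison, the tie-breaking rule as worded in the question can also be written out directly. This is only a sketch under assumptions: the function name pick_by_priority and its priority argument (list indices, highest priority first) are hypothetical and do not come from the thread. Among the values that share the top count, it keeps the one met first when the lists are scanned in priority order.

```python
from collections import Counter

def pick_by_priority(lists, priority):
    """Most frequent value per position; ties go to the highest-priority list.

    `priority` is a hypothetical argument: indices into `lists`,
    highest priority first.
    """
    ordered = [lists[i] for i in priority]  # rows from highest to lowest priority
    result = []
    for column in zip(*ordered):
        counts = Counter(column)
        best = max(counts.values())
        # Scan in priority order and keep the first value that reaches the top count.
        result.append(next(v for v in column if counts[v] == best))
    return result

# The tie example from the question: l5 > l2 > l3 > l1 > l4
l1, l2, l3, l4, l5 = [1], [2], [1], [2], [3]
print(pick_by_priority([l1, l2, l3, l4, l5], priority=[4, 1, 2, 0, 3]))  # [2]
```

Values 1 and 2 both occur twice here, and 2 wins because l2 outranks l1 and l3.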
Just transpose the sublists and get the Counter.most_common element key from each group:
from collections import Counter
lists = [[1, 2, 3],[1, 4, 4],[0, 2, 4]]
print([Counter(sub).most_common(1)[0][0] for sub in zip(*lists)])
If they are individual lists, just zip those:
l1, l2, l3 = [1, 2, 3], [1, 4, 4], [0, 2, 4]
print([Counter(sub).most_common(1)[0][0] for sub in zip(l1,l2,l3)])
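To make the transposition step concrete, here is a small sketch of what zip(*lists) yields for the question's data. The variable name columns is only illustrative.

```python
lists = [[1, 2, 3], [1, 4, 4], [0, 2, 4]]

# zip(*lists) pairs up the k-th elements of every sublist,
# so each tuple holds one position across all lists.
columns = list(zip(*lists))
print(columns)  # [(1, 1, 0), (2, 4, 2), (3, 4, 4)]
```

Each tuple is then counted independently, which is why no index arithmetic is needed.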
I am not sure how taking the first element from the grouping makes sense when there is a tie, as it may not be one of the values that tied, but that is trivial to implement: just get the two most common elements and check whether their counts are equal:
def most_cm(lists):
    for sub in zip(*lists):
        # get the two most frequent (value, count) pairs
        comm = Counter(sub).most_common(2)
        # if the top two counts are equal, just return the element from l1
        yield comm[0][0] if len(comm) == 1 or comm[0][1] != comm[1][1] else sub[0]
We also need the len(comm) == 1 check in case all the elements are the same, or we will get an IndexError.
If you are talking about taking the element that comes from the earlier list in the event of a tie, i.e. l2 comes before l5, then that is just the same as taking any of the elements that tie.
For a decent number of sublists:
In [61]: lis = [[randint(1,10000) for _ in range(10)] for _ in range(100000)]
In [62]: list(most_cm(lis))
Out[62]: [5856, 9104, 1245, 4304, 829, 8214, 9496, 9182, 8233, 7482]
In [63]: timeit list(most_cm(lis))
1 loops, best of 3: 249 ms per loop
Solution is:
a = [1, 2, 3]
b = [1, 4, 4]
c = [0, 2, 4]
print([max(set(element), key=element.count) for element in zip(a, b, c)])
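One caveat with max(set(...)): set iteration order is not guaranteed to follow list order, so which tied value wins is not tied to any list. A possible variant, sketched here with a hypothetical helper name first_most_common, runs max over the tuple itself. Since max returns the first maximal item it encounters, ties go to the earliest list in the zip call.

```python
a = [1, 2, 3]
b = [1, 3, 4]
c = [0, 4, 4]

def first_most_common(column):
    # max() keeps the first maximal item, so a tie resolves to the earliest list.
    return max(column, key=column.count)

result = [first_most_common(col) for col in zip(a, b, c)]
print(result)  # [1, 2, 4]
```

The middle position (2, 3, 4) is a three-way tie, and 2 is chosen because it comes from a. Note that column.count is called once per element, so this is quadratic in the number of lists.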
That's what you're looking for:
from collections import Counter
from operator import itemgetter
l0 = [max(Counter(li).items(), key=itemgetter(1))[0] for li in zip(*l)]
If you are OK taking any one of a set of elements that are tied as most common, and you can guarantee that you won't hit an empty list within your list of lists, then here is a way using Counter (so, from collections import Counter):
l = [[1, 0, 2, 3, 4, 7, 8],
     [2, 0, 2, 1, 0, 7, 1],
     [2, 0, 1, 4, 0, 1, 8]]

res = []
for k in range(len(l[0])):
    res.append(Counter(lst[k] for lst in l).most_common()[0][0])
Doing this in IPython and printing the result:
In [86]: res
Out[86]: [2, 0, 2, 1, 0, 7, 8]
Try this:
l1 = [1,2,3]
l2 = [1,4,4]
l3 = [0,2,4]
lists = [l1, l2, l3]
print([max(set(x), key=x.count) for x in zip(*lists)])