Easiest way to find the most frequent element in each column

Question

Suppose I have

data =
[[a, a, c],
 [b, c, c],
 [c, b, b],
 [b, a, c]]

I want a list containing the most frequent element of each column, i.e. result = [b, a, c]. What is the easiest way to do this?

I'm using Python 2.6.6.

Answer 1

In statistics, what you want is called the mode. The scipy library (http://www.scipy.org/) provides a mode function in scipy.stats.

In [32]: import numpy as np

In [33]: from scipy.stats import mode

In [34]: data = np.random.randint(1,6, size=(6,8))

In [35]: data
Out[35]: 
array([[2, 1, 5, 5, 3, 3, 1, 4],
       [5, 3, 2, 2, 5, 2, 5, 3],
       [2, 2, 5, 3, 3, 2, 1, 1],
       [2, 4, 1, 5, 4, 4, 4, 5],
       [4, 4, 5, 5, 2, 4, 4, 4],
       [2, 4, 1, 1, 3, 3, 1, 3]])

In [36]: val, count = mode(data, axis=0)

In [37]: val
Out[37]: array([[ 2.,  4.,  5.,  5.,  3.,  2.,  1.,  3.]])

In [38]: count
Out[38]: array([[ 4.,  3.,  3.,  3.,  3.,  2.,  3.,  2.]])
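The question's data contains strings rather than numbers. As an alternative to scipy.stats.mode, a NumPy-only sketch can count each column with np.unique(..., return_counts=True). This is a swapped-in technique, not part of the answer above; it assumes NumPy 1.9 or later (where return_counts was added), and column_modes_np is a hypothetical helper name. Because np.unique returns sorted values and np.argmax picks the first maximum, ties go to the smallest value, which matches the session above (the sixth column yields 2, not 3 or 4).

```python
import numpy as np

def column_modes_np(data):
    """Hypothetical helper: most frequent value in each column of a 2-D array."""
    arr = np.asarray(data)
    modes = []
    for col in arr.T:  # iterate over the columns
        # sorted unique values of this column and how often each occurs
        values, counts = np.unique(col, return_counts=True)
        # argmax returns the first maximum, i.e. the smallest tied value;
        # .item() converts the NumPy scalar to a plain Python value
        modes.append(values[np.argmax(counts)].item())
    return modes

print(column_modes_np([['a', 'a', 'c'],
                       ['b', 'c', 'c'],
                       ['c', 'b', 'b'],
                       ['b', 'a', 'c']]))  # ['b', 'a', 'c']
```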

Answer 2

Use a list comprehension together with collections.Counter():

from collections import Counter

[Counter(col).most_common(1)[0][0] for col in zip(*data)]

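zip(*data) rearranges your list of lists into a list of columns (on Python 3 it returns an iterator over the columns instead). A Counter() object counts how often each item in the input sequence occurs, and .most_common(1) gives us the most common element (together with its count).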

If your inputs are single-character strings, you get:

>>> [Counter(col).most_common(1)[0][0] for col in zip(*data)]
['b', 'a', 'c']
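To make each step of the one-liner visible, here is a sketch that runs it on the first column only, using single-character strings for the question's data. The list(...) call is needed on Python 3, where zip returns an iterator; on Python 2 zip already returns a list.

```python
from collections import Counter

# The question's data, written with single-character strings
data = [['a', 'a', 'c'],
        ['b', 'c', 'c'],
        ['c', 'b', 'b'],
        ['b', 'a', 'c']]

columns = list(zip(*data))    # transpose: one tuple per column
counts = Counter(columns[0])  # element -> occurrences in the first column
top = counts.most_common(1)   # a one-item list holding (element, count)
mode = top[0][0]              # keep only the element

print(columns)  # [('a', 'b', 'c', 'b'), ('a', 'c', 'b', 'a'), ('c', 'c', 'b', 'c')]
print(top)      # [('b', 2)]
print(mode)     # b
```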

Answer 3

Is the data hashable? If so, collections.Counter will help:

[Counter(col).most_common(1)[0][0] for col in zip(*data)]

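This works because zip(*data) transposes the input data, yielding one column at a time. Counter then counts the elements and stores them in a dictionary keyed by element, with the counts as values. Counter also has a most_common method that returns a list of the N elements with the highest counts, ordered from most to least common. So you want the first element of the first item in the list returned by most_common, which is where the [0][0] comes from.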

For example:

>>> a,b,c = 'abc'
>>> from collections import Counter
>>> data = [[a, a, c],
...  [b, c, c],
...  [c, b, b],
...  [b, a, c]]
>>> [Counter(col).most_common(1)[0][0] for col in zip(*data)]
['b', 'a', 'c']
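most_common(1) returns only one element even when several values tie for the highest count. Since Python 3.7 the documentation states that elements with equal counts are ordered in the order first encountered, so the tied value appearing first in the column wins; older versions do not guarantee this. If every tied value is needed, a sketch can compare each count against the column's maximum. The function name column_modes_with_ties is hypothetical, and the sample data is borrowed from Answer 4 because it contains ties.

```python
from collections import Counter

def column_modes_with_ties(data):
    """Hypothetical helper: for each column, a sorted list of all values tied for the top count."""
    result = []
    for col in zip(*data):          # one tuple per column
        counts = Counter(col)       # element -> number of occurrences
        top = max(counts.values())  # highest count in this column
        result.append(sorted(v for v, c in counts.items() if c == top))
    return result

data = [['a', 'a', 'b'],
        ['b', 'c', 'c'],
        ['a', 'b', 'b'],
        ['b', 'a', 'c']]
print(column_modes_with_ties(data))  # [['a', 'b'], ['a'], ['b', 'c']]
```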

Answer 4

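Here is a solution that does not use the collections module (collections.Counter was only added in Python 2.7, so it is not available on 2.6.6). For each column it returns a list of all values tied for the highest count: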

def get_most_common(data):
    """Return, for each column, a list of the value(s) that occur most often."""
    common = []
    for col in zip(*data):  # zip(*data) yields the columns one at a time
        count_dict = {}  # start a fresh count for every column
        for val in col:
            count_dict[val] = count_dict.get(val, 0) + 1
        max_count = max(count_dict.values())
        # keep every value tied for the highest count
        common.append([k for k in count_dict if count_dict[k] == max_count])

    return common

if __name__ == "__main__":

    data = [['a','a','b'],
            ['b','c','c'],
            ['a','b','b'],
            ['b','a','c']]

    print(get_most_common(data))
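The question asks for a single value per column rather than a list of tied values. A minimal sketch of the same dict-counting idea that returns one value per column (column_modes is a hypothetical name, not from the answer) can use max(counts, key=counts.get), which is available on Python 2.5 and later, including 2.6.6. When several values tie, max returns the first one in dict iteration order; that order is insertion order only from Python 3.7 on, so on older versions which tied value wins is arbitrary.

```python
def column_modes(data):
    """Hypothetical helper: one most frequent value per column, using only a plain dict."""
    modes = []
    for col in zip(*data):  # zip(*data) yields the columns
        counts = {}         # fresh counts for each column
        for val in col:
            counts[val] = counts.get(val, 0) + 1
        # max() with key= compares the counts and returns the matching value
        modes.append(max(counts, key=counts.get))
    return modes

data = [['a', 'a', 'c'],
        ['b', 'c', 'c'],
        ['c', 'b', 'b'],
        ['b', 'a', 'c']]
print(column_modes(data))  # ['b', 'a', 'c']
```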

4 answers

Answer 1
Score 4, accepted, 2013-03-21 20:04:22

Answer 2
Score 3, 2013-03-21 17:53:24

Answer 3
Score 3, 2013-03-21 17:56:28

Answer 4
Score 0, 2013-03-21 18:21:40
