How to get the most represented object from an array

Question

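I have an array containing some objects, and several of them are similar. For example: fruit = [apple, orange, apple, banana, banana, orange, apple, apple]

What is the most efficient way to get the most represented object from this array? In this case it would be "apple", but how would you go about calculating that efficiently?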

Answer 1

Don't reinvent the wheel. In Python 2.7+, you can use the Counter class:

import collections
fruit=['apple', 'orange', 'apple', 'banana', 'banana', 'orange', 'apple', 'apple']
c=collections.Counter(fruit)
print(c.most_common(1))
# [('apple', 4)]

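If you are using an older version of Python, you can download Counter here.

While it is good to know how to implement something like this yourself, it is also a good idea to get used to Counter, since it is (or will be) part of the standard library.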
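
If only the winning item is needed rather than the (item, count) pair, the result of most_common can be unpacked. This is a minimal sketch; the helper name most_represented is made up for illustration, and returning None for an empty list is an assumption about the desired behavior:

```python
from collections import Counter

def most_represented(items):
    """Return the most frequent item, or None for an empty input."""
    counts = Counter(items)
    if not counts:
        return None
    # most_common(1) returns a one-element list of (item, count) pairs.
    item, _count = counts.most_common(1)[0]
    return item

fruit = ['apple', 'orange', 'apple', 'banana', 'banana', 'orange', 'apple', 'apple']
print(most_represented(fruit))  # apple
```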

Answer 2

If the objects are hashable, you can use a dict to store the counts:

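import operator

# Count how many times each item occurs.
results = {}
for item in somelist:
  if item not in results:
    results[item] = 1
  else:
    results[item] += 1

# Pick the (item, count) pair with the highest count.
print(max(results.items(), key=operator.itemgetter(1)))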

Answer 3

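Keep a dictionary of how frequently each object appears.

Build this table by walking the list once. As you go, keep track of the object that has appeared most often so far.

This code is untested.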

from collections import defaultdict

def mode(objects):
    h = defaultdict(int)
    max_f = 0
    max_obj = None
    for o in objects:
        f = h[o] = h[o] + 1
        if f > max_f:
            max_f = f
            max_obj = o
    return max_obj

If the objects are not hashable, you can instead hash some unique feature of them, such as id(o).
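
As a rough sketch of that idea, the function below counts by a caller-supplied key instead of by the object itself. The name mode_by_key and its key parameter are hypothetical, not part of the answer. Note that key=id only groups references to the very same object, so equal but distinct objects are counted separately; a content-based key such as tuple groups by value.

```python
from collections import defaultdict

def mode_by_key(objects, key):
    """Return the most frequent object, counting by key(obj) instead of obj."""
    counts = defaultdict(int)
    best_count = 0
    best_obj = None
    for obj in objects:
        k = key(obj)  # hashable stand-in for a possibly unhashable object
        counts[k] += 1
        if counts[k] > best_count:
            best_count = counts[k]
            best_obj = obj
    return best_obj

# Lists are unhashable, so count them by their tuple form (assumption:
# lists with equal contents should count as the same object).
points = [[1, 2], [3, 4], [1, 2]]
print(mode_by_key(points, key=tuple))  # [1, 2]
```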

Answer 4

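You want an efficient method. Clearly it is possible in O(n) time, so any method that requires sorting the list is out, since that would be O(n log(n)). It is not possible to do it faster than O(n), because even if you have checked the first n/2-1 elements and they are all "apple", you don't know that the rest of the elements won't be bananas.

So, given that we are looking for O(n), you must iterate over the list and keep a count of how many items of each type you have seen.

defaultdict is a simple way to implement this in practice.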

>>> from collections import defaultdict
>>> d = defaultdict(int)
>>> for i in ['apple', 'banana', 'apple']:
...    d[i] += 1
...
>>> d
defaultdict(<type 'int'>, {'apple': 2, 'banana': 1})
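
The session above only builds the counts. One possible way to pull out the most frequent item afterwards is to take the max over the keys, ranked by their counts; this is a sketch, and the variable name most_common_item is made up here:

```python
from collections import defaultdict

d = defaultdict(int)
for i in ['apple', 'banana', 'apple']:
    d[i] += 1

# Iterate over the keys and rank each one by its stored count.
most_common_item = max(d, key=d.get)
print(most_common_item, d[most_common_item])  # apple 2
```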

Answer 5

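The best time you can hope to achieve here is O(n): you will always need to walk the entire array at least once. The easiest way is, of course, to build a histogram. If your dictionary structure (a map of some kind) offers O(1) insertion and retrieval, then it is as simple as this (groovy-ish pseudocode):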

def histogram = new HashMap()
def maxObj = null
def maxObjCount = 0
objectList.each {
    if(histogram.containsKey(it)) histogram.put(it, histogram.get(it)+1)
    else histogram.put(it, 1)

    if(histogram.get(it) > maxObjCount) {
        maxObj = it
        maxObjCount = histogram.get(it)
    }
}

Answer 6

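from functools import reduce

def count_reps(agg, item):
  # reduce passes the accumulator first, then the current item
  k = hash(item)
  try:
    agg[k] += 1
  except KeyError:
    agg[k] = 1
  return agg

item_dict = reduce(count_reps, your_array, {})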

item_dict will contain the counts, and you can then evaluate how popular each object is.
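
Keying the dictionary by hash(item) discards the items themselves, so the winner cannot be read back from item_dict directly. A variant sketch that keeps the (hashable) items as keys is shown below; the names count_item and best are made up here, and the fruit list from the question stands in for your_array:

```python
from functools import reduce

def count_item(agg, item):
    # Accumulator first, item second: the order reduce expects.
    agg[item] = agg.get(item, 0) + 1
    return agg

your_array = ['apple', 'orange', 'apple', 'banana', 'banana', 'orange', 'apple', 'apple']
item_dict = reduce(count_item, your_array, {})

# The key with the highest count is the most represented item.
best = max(item_dict, key=item_dict.get)
print(best, item_dict[best])  # apple 4
```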

Answer 7

Here is a different approach, which basically sorts the list and then processes it in sorted order.

fruits = ['apple', 'orange', 'apple', 'banana', 'banana', 'orange', 'apple', 'apple']

max_fruit_count = 0
max_fruit = ''
current_fruit_count = 0
current_fruit = ''
for fruit in sorted(fruits) :
    if fruit != current_fruit :
        if current_fruit != max_fruit :
            if current_fruit_count > max_fruit_count :
                max_fruit = current_fruit
                max_fruit_count = current_fruit_count
        current_fruit = fruit
        current_fruit_count = 1
    else :
        current_fruit_count += 1

if current_fruit_count > max_fruit_count :
    max_fruit = current_fruit
    max_fruit_count = current_fruit_count

print(max_fruit, max_fruit_count)
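
The same sort-then-scan idea can be written more compactly with itertools.groupby, which yields runs of equal neighbors. This is an alternative sketch rather than the answer's own code, and it is still O(n log(n)) because of the sort:

```python
from itertools import groupby

fruits = ['apple', 'orange', 'apple', 'banana', 'banana', 'orange', 'apple', 'apple']

# groupby only merges adjacent equal items, so the list must be sorted first.
runs = ((name, len(list(group))) for name, group in groupby(sorted(fruits)))

# Pick the (name, run length) pair with the longest run.
max_fruit, max_fruit_count = max(runs, key=lambda run: run[1])
print(max_fruit, max_fruit_count)  # apple 4
```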

Answer 8

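This is not O(n) but O(n^2), so it doesn't fit your bill as the "most efficient way", but it is compact and avoids for loops, which are rather slow in Python. It will be faster than the O(n) options for up to 11 unique items.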

def most_common(items):
    s = set(items)
    return max([(items.count(i), i) for i in s])[1]
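
One detail of the function above: max compares (count, item) tuples, so when two counts tie, the items themselves are compared, which in Python 3 raises TypeError if they are not orderable. A variant sketch that ranks by count only is shown below; on a tie it returns whichever tied item the set iteration yields first, which is arbitrary:

```python
def most_common(items):
    # Rank the distinct items by how often they occur in the original list.
    return max(set(items), key=items.count)

print(most_common(['apple', 'orange', 'apple', 'banana']))  # apple
```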

Answer 9

As ~unutbu said: use collections.Counter; failing that, time your code. Here is my (probably inefficient) approach:

python -m timeit -s "fruit = ['apple']*4 + ['banana'] + ['orange']*2" \
"kL = set(fruit);  L = [fruit.count(f) for f in kL];  D = dict(zip(kL,L)); \
sorted(D,key = lambda k: D[k],reverse=True)" 
100000 loops, best of 3: 10.1 usec per loop
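
To follow the "time your code" advice from inside Python, the timeit module can compare this approach against collections.Counter. This is a sketch only: the function names by_count_calls and by_counter are made up here, and no timings are shown because they depend on the machine and the Python version.

```python
import timeit
from collections import Counter

fruit = ['apple'] * 4 + ['banana'] + ['orange'] * 2

def by_count_calls(items):
    # The approach above: one list.count call per distinct item.
    counts = {f: items.count(f) for f in set(items)}
    return sorted(counts, key=counts.get, reverse=True)

def by_counter(items):
    # Counter.most_common() returns (item, count) pairs, most frequent first.
    return [item for item, _ in Counter(items).most_common()]

for func in (by_count_calls, by_counter):
    seconds = timeit.timeit(lambda: func(fruit), number=10000)
    print(func.__name__, seconds)
```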

9 answers

Answer 1: score 8, accepted, 2010-02-02 15:41:05
Answer 2: score 5, 2010-02-02 13:36:34
Answer 3: score 3, 2010-02-02 13:38:23
Answer 4: score 2, 2010-02-02 13:39:40
Answer 5: score 1, 2010-02-02 13:37:14
Answer 6: score 0, 2010-02-02 13:46:02
Answer 7: score 0, 2010-02-02 14:21:54
Answer 8: score 0, 2010-02-02 14:42:36
Answer 9: score 0, 2010-02-02 16:05:18