How can I find the most frequent largest element in a list faster?

The task is to find the most frequent largest element in a given list.

# for example we have list a
a = ['1', '3', '3', '2', '1', '1', '4', '3', '3', '1', '6', '6', '3','6', '6', '6']
a.sort(reverse=True)
print(max(a, key=a.count))

How can I make this simple script run faster?

Why do you sort at all?

You pay n*log(n) for sorting, which does nothing for you here - you still need to go over the complete a for the max(...) statement and, for EACH ELEMENT, go through the whole list again to count() its occurrences - even if all elements are the same.

So for

["a","a","a","a","a"] 

this uses 5 passes, counting the whole list on each one, which leads to O(n²) on top of the O(n*log(n)) for sorting - asymptotically this is bounded by O(n²).
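To make the repeated counting visible, here is a minimal sketch. CountingList is a hypothetical helper introduced only for this illustration: a list subclass that records how often count() is called. Each of those calls scans the whole list, which is where the quadratic cost comes from.

```python
class CountingList(list):
    """Hypothetical helper: a list that records how often count() is called."""

    def __init__(self, *args):
        super().__init__(*args)
        self.calls = 0

    def count(self, value):
        self.calls += 1              # one more full scan of the list
        return super().count(value)


a = CountingList(["a", "a", "a", "a", "a"])
result = max(a, key=a.count)         # key is evaluated once per element
print(result, a.calls)               # a 5
```

Five elements trigger five full scans, so the work grows with n * n.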

The basic approach for this kind of problem is to use collections.Counter:

from collections import Counter

a = ['1', '3', '3', '2', '1', '1', '4', '3', '3', '1',
     '6', '6', '3','6', '6', '6']
 
c = Counter(a) 

# "3" and "6" both occur 5 times; "3" comes first in the source, so it
# is reported first - to get the max value of all those that occur
# max_times, see below
print(c.most_common(1)[0][0])

Up to a certain list size, Counter may still be outperformed by plain list counting, because list counting avoids creating a dictionary from your data to begin with - which also costs time.


For slightly bigger lists, Counter wins hands down:

from collections import Counter

# larger sample - for short samples list iteration may outperform Counter
data = list('1332114331663666')

# 6 times your data + 1 times "6" so "6" occurs the most
a1 = [*data, *data, *data, *data, *data, *data, "6"]
a2 = sorted(a1, reverse=True)
c = Counter(a1)

def yours():
    return max(a1, key=a1.count)

def yours_sorted():
    return max(a2, key=a2.count)

def counter():
    c = Counter(a1)
    mx = c.most_common(1)[0][1]  # get maximal count amount
    # get max value for same highest count
    return max(v for v,cnt in c.most_common() if cnt==mx) 

def counter_nc():        
    mx = c.most_common(1)[0][1] # get maximal count amount
    # get max value for same highest count
    return max(v for v,cnt in c.most_common() if cnt==mx) 

import timeit

print("Yours:   ", timeit.timeit(stmt=yours, setup=globals, number=10000))
print("Sorted:  ", timeit.timeit(stmt=yours_sorted, setup=globals, number=10000))
print("Counter: ", timeit.timeit(stmt=counter, setup=globals, number=10000))
print("NoCreat: ", timeit.timeit(stmt=counter_nc, setup=globals, number=10000))

gives you (roughly):

Yours:    0.558837399999902
Sorted:   0.557338600001458 # for 10k executions saves 2/1000s == noise
Counter:  0.170493399999031 # about 3.3 times faster, including Counter creation
NoCreat:  0.117090099998677 # about 4.8 times faster, without Counter creation

Even with the creation of the Counter included inside the function, it outperforms the O(n²) approach.

If you use pure Python, collections.Counter may be the best choice.

from collections import Counter

counter = Counter(a)
mode = counter.most_common(1)[0][0]

Of course, it is not difficult to implement it yourself.

counter = {}
counter_get = counter.get
for elem in a:
    counter[elem] = counter_get(elem, 0) + 1
mode = max(counter, key=counter_get)   # or key=counter.__getitem__

For large lists, using a NumPy array may be faster.

import numpy as np

ar = np.array(a, dtype=str)
vals, counts = np.unique(ar, return_counts=True)
mode = vals[counts.argmax()]

EDIT

I'm sorry, I didn't notice the 'largest' requirement.

If you choose Counter:

max_count = max(counter.values())
largest_mode = max(val for val, count in counter.items() if count == max_count)

# Or in a single step, which is a little faster.
largest_mode = max(zip(counter.values(), counter))[1]

Or NumPy:

largest_mode = vals[counts == counts.max()].max()
# even vals[counts == counts.max()][-1]
# because vals is sorted.
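One caveat: the snippets above compare the elements as strings. That gives the right answer for single-digit values like those in the question, but with multi-digit strings '9' would rank above '10'. A minimal sketch of one way to handle that, assuming every element parses as an integer (largest_mode here is a hypothetical helper name, not part of any library):

```python
from collections import Counter


def largest_mode(items):
    """Return the most frequent element; ties go to the numerically largest."""
    counts = Counter(items)
    # Rank by count first, then by numeric value (assumes int() works on every item).
    return max(counts, key=lambda v: (counts[v], int(v)))


a = ['1', '3', '3', '2', '1', '1', '4', '3', '3', '1',
     '6', '6', '3', '6', '6', '6']
print(largest_mode(a))                        # 6
print(largest_mode(['9', '9', '10', '10']))   # 10
```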

A from-scratch solution would be something like this:


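elem_dict = dict()

# [element, count] of the best candidate seen so far
max_key_el = [0, 0]
for el in a:
  if el not in elem_dict:
    elem_dict[el] = 1
  else:
    elem_dict[el] += 1

  if elem_dict[el] >= max_key_el[1]:
     # a strictly higher count always wins; on a tie the larger key wins
     if elem_dict[el] > max_key_el[1] or int(el) > int(max_key_el[0]):
        max_key_el[0] = el
        max_key_el[1] = elem_dict[el]

print(max_key_el[0])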

I use a dictionary elem_dict to store the counters, and I keep the current best [key, count] pair in max_key_el. The first if statement counts the occurrences of each element, the second checks whether the element is a candidate for the maximum, and the last one also compares the keys.

Using a list with 20000 elements I obtain:

  • For your solution: 3.08 s
  • For this solution: 16 ms
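A rough sketch for reproducing this kind of comparison on your own machine follows. The data is randomly generated with a fixed seed, and the size n is an assumption, kept smaller than the 20000 above so the quadratic version finishes quickly. Absolute timings will differ between machines, so none are shown.

```python
import random
import timeit
from collections import Counter

random.seed(0)   # fixed seed so the run is repeatable
n = 2000         # assumption: raise this to see the gap widen
data = [str(random.randint(1, 9)) for _ in range(n)]


def quadratic():
    # The approach from the question: sort, then count every element again.
    return max(sorted(data, reverse=True), key=data.count)


def single_pass():
    # Count once, then pick the largest value among the most frequent ones.
    counts = Counter(data)
    best = max(counts.values())
    return max(v for v, c in counts.items() if c == best)


same = quadratic() == single_pass()   # both should agree on the answer
t_quadratic = timeit.timeit(quadratic, number=1)
t_single = timeit.timeit(single_pass, number=1)
print(same, t_quadratic, t_single)
```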
