Efficiently check if an element occurs at least n times in a list

How to best write a Python function (check_list) to efficiently test if an element (x) occurs at least n times in a list (l)?

My first thought was:

def check_list(l, x, n):
    return l.count(x) >= n

But this doesn't short-circuit once x has been found n times; it always scans the whole list, so it is O(len(l)).

A simple approach that does short-circuit would be:

def check_list(l, x, n):
    count = 0
    for item in l:
        if item == x:
            count += 1
            if count == n:
                return True
    return False

I also have a more compact short-circuiting solution with a generator:

def check_list(l, x, n):
    gen = (1 for item in l if item == x)
    return all(next(gen,0) for i in range(n))

Are there other good solutions? What is the most efficient approach?

Thank you

Instead of incurring extra overhead from setting up a range object and using all, which has to test the truthiness of each item, you could use itertools.islice to advance the generator n - 1 steps ahead, and then return the next item in the slice if it exists, or a default of False if not:

from itertools import islice

def check_list(lst, x, n):
    gen = (True for i in lst if i==x)
    return next(islice(gen, n-1, None), False)

Note that, like list.count, itertools.islice also runs at C speed. This has the extra advantage of handling iterables that are not lists.

Some timing:
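To illustrate the point about non-list iterables, here is a minimal sketch that feeds the same islice-based function a string and an infinite generator. The sample inputs are made up for illustration, and the function assumes n >= 1 (islice rejects a negative start).

```python
from itertools import count, islice

def check_list(lst, x, n):
    # Yield True for each match, then skip ahead to the n-th match (if any).
    # Assumes n >= 1: islice raises ValueError for a negative start index.
    gen = (True for i in lst if i == x)
    return next(islice(gen, n - 1, None), False)

# A string is an iterable, not a list.
print(check_list("mississippi", "s", 4))   # True
print(check_list("mississippi", "s", 5))   # False

# An infinite generator works too, as long as the n-th match exists,
# because the input is consumed lazily and iteration stops at that match.
residues = (k * k % 7 for k in count())
print(check_list(residues, 2, 3))          # True
```

If the n-th match never appears in an infinite iterable, the call would not return, so this is only safe when a match is known to exist or the input is finite.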

In [1]: from itertools import islice

In [2]: from random import randrange

In [3]: lst = [randrange(1,10) for i in range(100000)]

In [5]: %%timeit # using list.index
   ....: check_list(lst, 5, 1000)
   ....:
1000 loops, best of 3: 736 µs per loop

In [7]: %%timeit # islice
   ....: check_list(lst, 5, 1000)
   ....:
1000 loops, best of 3: 662 µs per loop

In [9]: %%timeit # using list.index
   ....: check_list(lst, 5, 10000)
   ....:
100 loops, best of 3: 7.6 ms per loop

In [11]: %%timeit # islice
   ....: check_list(lst, 5, 10000)
   ....:
100 loops, best of 3: 6.7 ms per loop

You could use the second argument of index to find the subsequent indices of occurrences:

def check_list(l, x, n):
    i = 0
    try:
        for _ in range(n):
            i = l.index(x, i)+1
        return True
    except ValueError:
        return False

print( check_list([1,3,2,3,4,0,8,3,7,3,1,1,0], 3, 4) )

About index arguments

The official documentation does not mention the method's second or third argument in its Python Tutorial, section 5, but you can find them in the more comprehensive Python Standard Library reference, section 4.6:

s.index(x[, i[, j]]): index of the first occurrence of x in s (at or after index i and before index j) (8)

(8) index raises ValueError when x is not found in s. When supported, the additional arguments to the index method allow efficient searching of subsections of the sequence. Passing the extra arguments is roughly equivalent to using s[i:j].index(x), only without copying any data and with the returned index being relative to the start of the sequence rather than the start of the slice.
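A small sketch of what the quoted note means in practice, using a made-up sample list: the start argument gives an index relative to the whole list, while the slice form copies the data and gives a slice-relative index.

```python
s = [1, 3, 2, 3, 4, 0, 8, 3]

# Search from index 2 onward: the result is relative to the whole list.
print(s.index(3, 2))        # 3

# The slice version copies the data and returns a slice-relative index.
print(s[2:].index(3))       # 1
print(s[2:].index(3) + 2)   # 3, after adding the offset back

# No match inside the given range raises ValueError.
try:
    s.index(3, 4, 7)
except ValueError:
    print("not found")
```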

Performance Comparison

In comparing this list.index method with the islice(gen) method, the most important factor is the distance between the occurrences to be found. Once that distance is on average 13 or more, list.index performs better. For lower distances, the fastest method also depends on the number of occurrences to find. The more occurrences to find, the sooner the islice(gen) method outperforms list.index in terms of average distance; this gain fades out when the number of occurrences becomes really large.

The following graph draws the (approximate) border line, at which both methods perform equally well (the X-axis is logarithmic):

[Image: graph of the break-even line between the two methods]
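For anyone who wants to check this on their own machine, the following is a rough timing sketch, not the benchmark behind the graph. The helper make_list, the gap values, and the repeat count are assumptions chosen for illustration; results will vary with hardware and Python version.

```python
from itertools import islice
from timeit import timeit

def check_index(l, x, n):
    # list.index approach: jump from one occurrence to the next.
    i = 0
    try:
        for _ in range(n):
            i = l.index(x, i) + 1
        return True
    except ValueError:
        return False

def check_islice(l, x, n):
    # islice approach: skip to the n-th match of a generator (n >= 1).
    gen = (True for i in l if i == x)
    return next(islice(gen, n - 1, None), False)

def make_list(gap, occurrences):
    # Hypothetical helper: place x = 1 once every `gap` items, 0 elsewhere.
    return ([0] * (gap - 1) + [1]) * occurrences

for gap in (2, 13, 100):
    lst = make_list(gap, 1000)
    t_index = timeit(lambda: check_index(lst, 1, 1000), number=20)
    t_islice = timeit(lambda: check_islice(lst, 1, 1000), number=20)
    print(f"gap={gap:>3}  index={t_index:.4f}s  islice={t_islice:.4f}s")
```

Evenly spaced occurrences are a simplification; real data with clustered matches may behave differently.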

Ultimately, short-circuiting is the way to go if you expect a significant number of cases to terminate early. Let's explore the possibilities:

Take the case of the list.index method versus the list.count method (these were the two fastest according to my testing, although YMMV).

With list.index, if the list contains n or more occurrences of x, the method is called n times. Execution inside list.index itself is very fast, allowing much faster iteration than the custom generator. If the occurrences of x are far enough apart, a large speedup will be seen from the lower-level execution of index. If the instances of x are close together (shorter list / more common x's), much more of the time will be spent executing the slower Python code that mediates the rest of the function (looping over n and incrementing i).

The benefit of list.count is that it does all of the heavy lifting outside of slow Python execution. It is a much easier function to analyse, as it is simply linear in the length of the list. Because it spends almost no time in the Python interpreter, however, it is almost guaranteed to be faster for short lists.

Summary of selection criteria:

  • shorter lists favor list.count
  • lists of any length that don't have a high probability to short-circuit favor list.count
  • lists that are long and likely to short-circuit favor list.index
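The length criterion above could be sketched as a simple dispatcher. This is an illustration only: the short_len cutoff is an arbitrary placeholder rather than a measured value, the early length check is an extra not mentioned in the answer, and the function has no way to know how likely a short-circuit is, so that judgment stays with the caller.

```python
def check_list(l, x, n, short_len=1000):
    # short_len is a hypothetical cutoff; tune it by timing your own data.
    if len(l) < n:
        return False                 # too short to hold n occurrences
    if len(l) <= short_len:
        return l.count(x) >= n       # short list: one fast full pass
    # Long list: hop between occurrences and stop at the n-th one.
    i = 0
    try:
        for _ in range(n):
            i = l.index(x, i) + 1
        return True
    except ValueError:
        return False

print(check_list([1, 3, 2, 3, 4, 0, 8, 3, 7, 3, 1, 1, 0], 3, 4))   # True
```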

I would recommend using Counter from the collections module.

from collections import Counter
import numpy as np

%%time
[k for k,v in Counter(np.random.randint(0,10000,10000000)).items() if v>1100]

#Output:
    Wall time: 2.83 s
    [1848, 1996, 2461, 4481, 4522, 5844, 7362, 7892, 9671, 9705]
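Applied to the original question, the Counter idea might look like the sketch below. The function name elements_at_least and the sample data are assumptions for illustration. Note that Counter always tallies the whole input, so it does not short-circuit, and it requires hashable elements.

```python
from collections import Counter

def check_list(l, x, n):
    # Counter tallies every distinct element in one full pass.
    return Counter(l)[x] >= n

def elements_at_least(l, n):
    # Hypothetical helper: all elements that occur at least n times.
    return [k for k, v in Counter(l).items() if v >= n]

data = [1, 3, 2, 3, 4, 0, 8, 3, 7, 3, 1, 1, 0]
print(check_list(data, 3, 4))        # True
print(elements_at_least(data, 3))    # [1, 3]
```

This pays off mainly when several different elements need to be checked against the same list, since the tally is built once.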

This shows another way of doing it.

  1. Sort the list.
  2. Find the index of the first occurrence of the item.
  3. Increase the index by one less than the number of times the item must occur (n - 1).
  4. Check whether the element at that index is the same as the item you want to find.

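def check_list(l, x, n):
    _l = sorted(l)
    try:
        index_1 = _l.index(x)
        return _l[index_1 + n - 1] == x
    except (ValueError, IndexError):
        # ValueError: x is not in the list at all.
        # IndexError: fewer than n items remain after the first x.
        return False

# Count every occurrence of x in l, then report whether there are at least n.
c = 0
for i in l:
    if i == x:
        c += 1
if c >= n:
    print("true")
else:
    print("false")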

Another possibility might be:

def check_list(l, x, n):
    return sum([1 for i in l if i == x]) >= n

