
Fastest way to check if exactly n items in a list match a condition in Python

If I have m items in a list, what is the fastest way to check whether exactly n of those items meet a certain condition? For example:

l = [1,2,3,4,5]

How would I check if any two items in the list match the condition x%2 == 0?

The naive approach would be to use nested for loops:

def two_match(l):
    for i in l:
        for j in l:
            if not i%2 and not j%2:
                return True
    return False

But that is an incredibly inefficient way of checking, and it would become especially ugly if I wanted to check for any 50,000 items in a list of 2-10 million items.

[Edited to reflect exact matching, which we can still accomplish with short-circuiting!]

I think you'd want this to short-circuit (stop as soon as the answer is determined, not only at the end):

matched = 0
for i in l:
    if i%2 == 0:
        matched += 1
        if matched > 2: # we now have too many matches, stop checking
            break
if matched == 2:
    print("congratulations")
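The same loop can be wrapped in a reusable function. This is only a minimal sketch: the name exactly_n and its parameters are illustrative and do not come from the answer above.

```python
def exactly_n(predicate, iterable, n):
    """Return True if exactly n items of iterable satisfy predicate."""
    matched = 0
    for item in iterable:
        if predicate(item):
            matched += 1
            if matched > n:  # too many matches: stop scanning early
                return False
    return matched == n


def is_even(x):
    return x % 2 == 0


print(exactly_n(is_even, [1, 2, 3, 4, 5], 2))  # True: 2 and 4
print(exactly_n(is_even, [2, 4, 6, 8], 2))     # False: stops at the third match
```

Returning from inside the loop replaces the break, and the final comparison covers the case where the list runs out with too few matches.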

If you want to run the query several times on the same input data, you should use NumPy instead (with no short-circuiting):

import numpy as np

l = np.array([1,2,3,4,5])

if np.count_nonzero(l%2 == 0) == 2:
    print("congratulations")

This doesn't short-circuit, but it will be super-fast once the input array is constructed. So if you have a large input list and lots of queries to run on it, and the queries can't short-circuit very early, this will likely be faster, potentially by an order of magnitude.

A sum solution adding up True values is correct, probably more efficient than an explicit loop, and definitely the most concise:

if sum(i % 2 == 0 for i in lst) == n:

However, it relies on understanding that in an integer context like addition, True counts as 1 and False as 0. You may not want to count on that, in which case you can rewrite it (squiguy's answer):

if sum(1 for i in lst if i % 2 == 0) == n:

But you might want to factor this out into a function:

def count_matches(predicate, iterable):
    return sum(predicate(i) for i in iterable)

And at that point, it might arguably be more readable to filter the list and count the length of the resulting filtered iterable instead:

def ilen(iterable):
    return sum(1 for _ in iterable)

def count_matches(predicate, iterable):
    return ilen(filter(predicate, iterable))

However, the downside of all of these variations (as with any use of map or filter) is that your predicate has to be a function, not just an expression. That's fine when you just want to check that some_function(x) returns True, but when you want to check x % 2 == 0, you have to take the extra step of wrapping it in a function, like this:

if count_matches(lambda x: x % 2 == 0, lst) == n:

… at which point I think you lose more readability than you gain.


Since you asked for the fastest (even though that's probably misguided, since I'm sure any of these solutions is more than fast enough for almost any app, and this is unlikely to be a hotspot anyway), here are some tests with 64-bit CPython 3.3.2 on my computer, with a list of length 250:

32.9 µs: sum(not x % 2 for x in lst)
33.1 µs: i=0\nfor x in lst: if not x % 2: i += 1\n
34.1 µs: sum(1 for x in lst if not x % 2)
34.7 µs: i=0\nfor x in lst: if x % 2 == 0: i += 1\n
35.3 µs: sum(x % 2 == 0 for x in lst)
37.3 µs: sum(1 for x in lst if x % 2 == 0)
52.5 µs: ilen(filter(lambda x: not x % 2, lst))
56.7 µs: ilen(filter(lambda x: x % 2 == 0, lst))

So, as it turns out, at least in 64-bit CPython 3.3.2, whether you use an explicit loop, sum up False and True, or sum up 1s if True makes very little difference; using not instead of == 0 makes a bigger difference in some cases than in others; but even the worst of the loop and sum variants is only about 13% slower than the best. The filter versions, which pay for a function call per item, are roughly 60-70% slower.

So I would use whichever one you find most readable. And if the slowest one isn't fast enough, the fastest one probably isn't either, which means you will probably need to rearrange your app to use NumPy, run your app in PyPy instead of CPython, write custom Cython or C code, or do something else a lot more drastic than just reorganizing this trivial algorithm.
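To rerun the CPython measurements above on your own machine, something like the following sketch with the standard timeit module could be used. The test data (250 consecutive integers) and the repeat counts are assumptions, since the answer does not state them, and the lambda wrappers add a small constant overhead to every measurement.

```python
import timeit

lst = list(range(250))  # assumed test data: 250 integers

# Each candidate returns the number of even items in lst.
candidates = {
    "sum(not x % 2 for x in lst)":
        lambda: sum(not x % 2 for x in lst),
    "sum(1 for x in lst if x % 2 == 0)":
        lambda: sum(1 for x in lst if x % 2 == 0),
    "sum(1 for _ in filter(lambda x: x % 2 == 0, lst))":
        lambda: sum(1 for _ in filter(lambda x: x % 2 == 0, lst)),
}

results = {}
for label, func in candidates.items():
    # Best of 3 repeats of 1000 calls each, converted to microseconds per call.
    best = min(timeit.repeat(func, number=1000, repeat=3)) / 1000
    results[label] = best * 1e6
    print("{:6.1f} us: {}".format(results[label], label))
```

Absolute numbers will differ by machine and Python version; only the relative ordering is worth comparing.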

For comparison, here are some NumPy implementations (assuming lst is an np.ndarray rather than a list):

 6.4 µs: len(lst) - np.count_nonzero(lst % 2)
 8.5 µs: np.count_nonzero(lst % 2 == 0)
17.5 µs: np.sum(lst % 2 == 0)

Even the most obvious translation to NumPy is almost twice as fast; with a bit of work you can get it 3x faster still.

And here's the result of running the exact same code in PyPy (3.2.3/2.1b1) instead of CPython:

14.6 µs: sum(not x % 2 for x in lst)

More than twice as fast with no change in the code at all.

You might want to look into numpy.

For example:

In [16]: import numpy as np 
In [17]: a = np.arange(5)

In [18]: a
Out[18]: array([0, 1, 2, 3, 4])

In [19]: np.sum(a % 2 == 0)
Out[19]: 3

Timings:

In [14]: %timeit np.sum(np.arange(100000) % 2 == 0)
100 loops, best of 3: 3.03 ms per loop

In [15]: %timeit sum(ele % 2 == 0 for ele in range(100000))
10 loops, best of 3: 17.8 ms per loop

However, if you account for the conversion from list to numpy.array, numpy is not faster:

In [20]: %timeit np.sum(np.array(range(100000)) % 2 == 0)
10 loops, best of 3: 23.5 ms per loop

Edit:

@abarnert's solution is the fastest:

In [36]: %timeit(len(np.arange(100000)) - np.count_nonzero(a % 2))
10000 loops, best of 3: 80.4 us per loop

I would use a while loop:

l=[1,2,3,4,5]

mods, tgt=0,2
while mods<tgt and l:
    if l.pop(0)%2==0:
        mods+=1

print(l,mods)  

If you are concerned about 'fastest', replace the list with a deque:

from collections import deque

l=[1,2,3,4,5]
d=deque(l)
mods, tgt=0,2
while mods<tgt and d:
    if d.popleft()%2==0: mods+=1

print(d,mods)     

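In either case, it is easy to read and will short-circuit as soon as tgt matches have been found.

The two runs below show both outcomes. Note that the loop stops at the tgt-th match, so mods == tgt confirms at least tgt matches; for a strictly exact count, the items left in the deque must also contain no further match: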

from collections import deque

l=[1,2,3,4,5,6,7,8,9]
d=deque(l)
mods, tgt=0,2
while mods<tgt and d:
    if d.popleft()%2==0: mods+=1

print(d,mods,mods==tgt)
# deque([5, 6, 7, 8, 9]) 2 True
# answer found after 4 loops


from collections import deque

l=[1,2,3,4,5,6,7,8,9]
d=deque(l)
mods, tgt=0,2
while mods<tgt and d:
    if d.popleft()%9==0: mods+=1

print(d,mods,mods==tgt)
# deque([]) 1 False
# deque exhausted and less than 2 matches found...

You can also use an iterator over your list:

l=[1,2,3,4,5,6,7,8,9]
it=iter(l)
mods, tgt=0,2
while mods<tgt:
    try:
        if next(it)%2==0: mods+=1
    except StopIteration:
        break

print(mods==tgt)   
# True
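The loops above stop at the tgt-th match, so on their own they establish "at least tgt". One possible way to turn the iterator version into a strictly exact check is to scan the remainder for one more match. This is a sketch under that assumption; the function name exactly and the predicate parameter are illustrative.

```python
def exactly(tgt, predicate, iterable):
    """Return True if exactly tgt items of iterable satisfy predicate."""
    it = iter(iterable)
    mods = 0
    while mods < tgt:
        try:
            if predicate(next(it)):
                mods += 1
        except StopIteration:
            return False  # input exhausted before reaching tgt matches
    # tgt matches found; any() stops at the first extra match, if there is one.
    return not any(predicate(x) for x in it)


def is_even(x):
    return x % 2 == 0


print(exactly(2, is_even, [1, 2, 3, 4, 5]))              # True
print(exactly(2, is_even, [1, 2, 3, 4, 5, 6, 7, 8, 9]))  # False: 6 is a third match
```

Because the same iterator is shared between the while loop and any(), no item is examined twice.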

You could use the built-in sum with your condition and check that it equals your n value.

l = [1, 2, 3, 4, 5]
n = 2
if n == sum(1 for i in l if i % 2 == 0):
    print(True)

Why don't you just use filter()?

Ex.: Checking the number of even integers in a list:

>>> a_list = [1, 2, 3, 4, 5]
>>> matches = list(filter(lambda x: x%2 == 0, a_list))
>>> matches
[2, 4]

Then, if you want the number of matches:

>>> len(matches)
2

And finally your answer:

>>> if len(matches) == 2:
...     do_something()

Build a generator that returns 1 for each item that matches the criteria, limit that generator to at most n + 1 items, and check that the sum of the ones is equal to the number you're after, e.g.:

from itertools import islice

data = [1,2,3,4,5]
N = 2
items = islice((1 for el in data if el % 2 == 0), N + 1)
has_N = sum(items) == N
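To see that the islice version really stops early, the snippet can be wrapped in a function and fed a generator that records what it hands out. The helper names has_exactly and tracked are made up for this illustration.

```python
from itertools import islice


def has_exactly(n, predicate, iterable):
    """Return True if exactly n items match, reading at most n + 1 matches."""
    ones = (1 for el in iterable if predicate(el))
    return sum(islice(ones, n + 1)) == n


def is_even(x):
    return x % 2 == 0


consumed = []


def tracked(values):
    """Yield values unchanged while recording each one that is requested."""
    for v in values:
        consumed.append(v)
        yield v


print(has_exactly(2, is_even, [1, 2, 3, 4, 5]))         # True
print(has_exactly(2, is_even, tracked(range(1, 100))))  # False
print(consumed)  # [1, 2, 3, 4, 5, 6]: stopped at the third even number
```

Only six of the 99 values are pulled from the input, because islice stops asking for more once it has seen n + 1 matches.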

This works:

>>> l = [1,2,3,4,5]
>>> n = 2
>>> a = 0  # Number of items that meet the condition
>>> for x in l:
...     if x % 2 == 0:
...         a += 1
...         if a > n:
...             break
...
>>> a == n
True
>>>

It has the advantage of running through the list only once.

Itertools is a useful shortcut for list trolling tasks (in Python 3, the built-in filter does the job of itertools.ifilter):

# expr is a predicate, such as `lambda a: a % 2 == 0`
def exact_match_count(expr, limit, *values):
    # Lazy in Python 3; use itertools.ifilter in Python 2.
    passes = filter(expr, values)
    counter = 0
    while counter <= limit:
        try:
            next(passes)
            counter += 1
        except StopIteration:
            break
    return counter == limit

If you're concerned about memory limits, tweak the signature so that values is a generator rather than a tuple.
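One way that tweak might look, assuming Python 3 and a single iterable argument in place of *values (a sketch, not the answerer's own code):

```python
def exact_match_count(expr, limit, values):
    """Return True if exactly `limit` items of `values` satisfy `expr`."""
    counter = 0
    for _ in filter(expr, values):  # filter is lazy in Python 3
        counter += 1
        if counter > limit:  # one match too many: stop early
            break
    return counter == limit


def is_even(a):
    return a % 2 == 0


# The generator expression is consumed lazily; no tuple or list is ever built.
squares = (n * n for n in range(10**9))
print(exact_match_count(is_even, 2, squares))          # False: third even square ends the scan
print(exact_match_count(is_even, 2, [1, 2, 3, 4, 5]))  # True
```

Because nothing is materialized, the first call returns after looking at only a handful of the billion candidate values.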

Any candidate for "the fastest solution" needs to have a single pass over the input and an early-out.

Here is a good baseline starting point for a solution:

>>> s = [1, 2, 3, 4, 5]
>>> matched = 0
>>> for x in s:
...     if x % 2 == 0:
...         matched += 1
...         if matched > 2:
...             print('More than two matched')
...             break  # early-out; also skips the for-else clause
... else:
...     if matched == 2:
...         print('Exactly two matched')
...     else:
...         print('Fewer than two matched')
...
Exactly two matched

Here are some ideas for improving on the algorithmically correct baseline solution:

  1. Optimize the computation of the condition. For example, replace x % 2 == 0 with not x & 1. This is called reduction in strength.

  2. Localize the variables. Since global lookups and assignments are more expensive than local variable assignments, the exact match test will run faster if it is inside a function.

    For example:

     def two_evens(iterable):
         'Return true if exactly two values are even'
         matched = 0
         for x in iterable:
             if x % 2 == 0:
                 matched += 1
                 if matched > 2:
                     return False
         return matched == 2

  3. Remove the interpreter overhead by using itertools to drive the looping logic.

    For example, the built-in filter() (itertools.ifilter() in Python 2) can isolate the matches at C speed:

     >>> list(filter(None, [False, True, True, False, True]))
     [True, True, True]

    Likewise, itertools.islice() can implement the early-out logic at C speed:

     >>> from itertools import islice
     >>> list(islice(range(10), 0, 3))
     [0, 1, 2]

    The built-in sum() function can tally the matches at C speed.

     >>> sum([True, True, True])
     3

    Put these together to check for an exact number of matches:

     >>> s = [False, True, False, True, False, False, False]
     >>> sum(islice(filter(None, s), 0, 3)) == 2
     True

  4. These optimizations are only worth doing if this is an actual bottleneck in a real program. That would typically only occur if you're going to make many such exact-match-count tests. If so, then there may be additional savings by caching some of the intermediate results on the first pass and then reusing them on subsequent tests.

    For example, if there is a complex condition, the sub-condition results can potentially be cached and reused.

    Instead of:

     check_exact(lambda x: x%2==0 and x<10 and f(x)==3, dataset, matches=2)
     check_exact(lambda x: x<10 and f(x)==3, dataset, matches=4)
     check_exact(lambda x: x%2==0 and f(x)==3, dataset, matches=6)

    Pre-compute all the conditions (only once per data value):

     evens = list(map(lambda x: x%2==0, dataset))
     under_tens = list(map(lambda x: x<10, dataset))
     f_threes = list(map(lambda x: f(x)==3, dataset))
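The answer does not show how the cached results are combined afterward. One possibility is to zip the precomputed flag lists together, as in this sketch; dataset, f, and check_exact_flags are hypothetical stand-ins chosen for illustration.

```python
from itertools import islice

dataset = list(range(20))  # hypothetical data


def f(x):
    # Hypothetical stand-in for an expensive function.
    return x % 4


# Evaluate each sub-condition once per data value.
evens = [x % 2 == 0 for x in dataset]
under_tens = [x < 10 for x in dataset]
f_threes = [f(x) == 3 for x in dataset]


def check_exact_flags(matches, *flag_lists):
    """True if exactly `matches` positions are True in every flag list."""
    combined = (all(flags) for flags in zip(*flag_lists))
    hits = (1 for ok in combined if ok)
    # Reading at most matches + 1 hits keeps the early-out behavior.
    return sum(islice(hits, matches + 1)) == matches


print(check_exact_flags(2, under_tens, f_threes))  # True: the values 3 and 7
print(check_exact_flags(0, evens, f_threes))       # True: f(x) == 3 only for odd x
```

Each later test then costs only boolean lookups, and f is never called again.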

A simple way to do it:

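def length_is(it, size):
    # `it` must be an iterator; assumes size >= 1.
    # Skip the first size - 1 items (fewer if the iterator runs out early).
    for _ in range(size - 1):
        next(it, None)

    try:
        next(it)  # the size-th item must exist
    except StopIteration:
        return False  # too few

    try:
        next(it)  # one more item means the iterator is too long
        return False  # too many
    except StopIteration:
        return True

length_is((i for i in data if i % 2 == 0), 2)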

Here's a slightly sillier way to write it:

class count(object):
    def __init__(self, iter):
        self.iter = iter

    __eq__ = lambda self, n: length_is(self.iter, n)

Giving:

count(i for i in data if i % 2 == 0) == 2
