What is a Pythonic and efficient way to divide a list at a matching element?

This is very similar to "Python: split a list based on a condition?" and also to https://nedbatchelder.com/blog/201306/filter_a_list_into_two_parts.html, but instead of partitioning the individual elements into two lists based on a predicate, I want to divide the list into two parts at the first element that fails the predicate.

>>> divide_list(lambda x: x < 7, list(range(10)))
([0, 1, 2, 3, 4, 5, 6], [7, 8, 9])

>>> divide_list(lambda x: x < 7, [1, 3, 5, 7, 9, 5])
([1, 3, 5], [7, 9, 5])

>>> divide_list(lambda x: x < 7, [7, 9, 5])
([], [7, 9, 5])

>>> divide_list(lambda x: x < 7, [1, 3, 5])
([1, 3, 5], [])

>>> divide_list(lambda x: x['a'], [{'a': True, 'b': 1}, {'a': True}, {'a': False}])
([{'a': True, 'b': 1}, {'a': True}], [{'a': False}])

Things to note:

  • the input list may not be sorted
  • the input list may contain duplicate elements
  • ideally we don't want to evaluate the condition more than once per element (re-evaluating it for a duplicated value is fine)
  • ideally it would accept an iterator as input (i.e. it can only make a single pass over the input data)
  • returning iterators is acceptable

I think that the naive implementation is probably best unless you actually need iterators as outputs. Iterators could be useful if your input stream is an iterator and you don't have enough memory to materialize the whole thing at once, etc.

In that case, I think that itertools is great. My initial gut instinct was to do something like:

# broken  :-(
import itertools

def divide_iter(pred, lst):
    i = iter(lst)
    # takewhile consumes the first failing element, so it never shows up in `i`
    yield itertools.takewhile(pred, i)
    yield i

Unfortunately this doesn't work for a variety of reasons. Most notably, it drops an element. Even if it didn't, you could run into problems if you didn't consume the entire takewhile iterable before moving on to the next list. I think that this second problem is going to be an issue that we run into when working with iterators in general, so that's kind of a bummer, but it's the price we pay for processing things element-by-element rather than materializing entire lists at once.
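To make the dropped element concrete, here is a minimal sketch that uses one of the question's sample inputs. It only shows how takewhile interacts with a shared iterator; it is not a fix.

```python
import itertools

it = iter([1, 3, 5, 7, 9, 5])

# takewhile has to read the 7 in order to discover that the predicate fails,
# and it has no way of pushing that value back onto the underlying iterator.
head = list(itertools.takewhile(lambda x: x < 7, it))
rest = list(it)

print(head)  # [1, 3, 5]
print(rest)  # [9, 5]  (the 7 is gone)
```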

Instead, let's think about grouping the items based on whether the predicate has returned true yet. Then groupby becomes a lot more appealing. The only catch is that we need to keep track of whether the predicate has returned True. Stateful functions are not much fun, so instead we can use a class and pass a bound method as the key argument to groupby:

import itertools

class _FoundTracker(object):
    def __init__(self, predicate):
        self.predicate = predicate
        self._found = False

    def check_found(self, value):
        if self._found:
            return True
        else:
            self._found = self.predicate(value)
            return self._found

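def split_iterable(iterable, predicate):
    tracker = _FoundTracker(predicate)
    count = 0  # stays 0 if the input is empty
    groups = itertools.groupby(iterable, key=tracker.check_found)
    for count, (_, group) in enumerate(groups, 1):
        yield group
    # Pad with empty iterators so that two groups are always produced.
    for _ in range(2 - count):
        yield iter(())

if __name__ == '__main__':
    # range replaces the Python 2-only xrange
    for group in split_iterable(range(10), lambda x: x < 5):
        print(list(group))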

This also has some possibly funky behavior. To demonstrate, consider:

g1, g2 = split_iterable(range(10), lambda x: x > 5)
print(list(g1))
print(list(g2))

You'll see that you get some really weird behavior :-). Alternatively:

g1, g2 = map(list, split_iterable(range(10), lambda x: x > 5))
print(g1)
print(g2)

should work fine.
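If materializing the first part is acceptable but the remainder should stay lazy, one possible alternative is to collect the head in a list and put the failing item back in front of the rest with itertools.chain. This is only a sketch, not part of the groupby approach: the name divide_iterable is made up for illustration, and it follows the question's convention of splitting at the first element for which the predicate is false.

```python
import itertools

def divide_iterable(pred, iterable):
    """Split at the first item for which pred is false.

    Returns (head, tail): head is a list, tail is a lazy iterator.
    The predicate is called at most once per item, in a single pass.
    """
    it = iter(iterable)
    head = []
    for item in it:
        if not pred(item):
            # Put the failing item back in front of the unconsumed remainder.
            return head, itertools.chain([item], it)
        head.append(item)
    # The predicate never failed, so the tail is empty.
    return head, iter(())

head, tail = divide_iterable(lambda x: x < 7, iter([1, 3, 5, 7, 9, 5]))
print(head)        # [1, 3, 5]
print(list(tail))  # [7, 9, 5]
```

Because the head is already a list, it does not matter in which order the two results are consumed.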

A naive implementation to get things rolling:

def divide_list(pred, lst):
    before, after = [], []
    found = False
    for item in lst:
        if not found:
            if pred(item):
                before.append(item)
            else:
                found = True
        if found:
            after.append(item)
    return before, after

Here's my relatively efficient attempt:

# Hashable lives in collections.abc; the old alias in
# collections was removed in Python 3.10.
from collections.abc import Hashable

def divide_list(pred, lst):
    # The predicate may be expensive, so we can
    # store elements that have already been checked
    # in a set for fast verification.
    elements_checked = set()

    # Assuming that every element of the list is of
    # the same type, we can store a flag to check if
    # an element is hashable. An empty list skips the
    # check instead of raising IndexError.
    hashable = bool(lst) and isinstance(lst[0], Hashable)

    for index, element in enumerate(lst):
        if hashable and element in elements_checked:
            continue

        if not pred(element):
            return lst[:index], lst[index:]

        if hashable:
            elements_checked.add(element)

    return lst, []

If you were to benchmark this against the other answers, I reckon this will be the fastest.
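One way to check that guess is a small timeit comparison. The sketch below is only illustrative: divide_plain, divide_cached, slow_pred and the sample data are all made up for the test, and the outcome depends heavily on how expensive the predicate is and how many duplicates the input contains.

```python
import timeit
from collections.abc import Hashable

def divide_plain(pred, lst):
    # Plain scan: call pred on every element until it fails.
    for index, element in enumerate(lst):
        if not pred(element):
            return lst[:index], lst[index:]
    return lst, []

def divide_cached(pred, lst):
    # Same scan, but skip pred for hashable values that already passed.
    seen = set()
    hashable = bool(lst) and isinstance(lst[0], Hashable)
    for index, element in enumerate(lst):
        if hashable and element in seen:
            continue
        if not pred(element):
            return lst[:index], lst[index:]
        if hashable:
            seen.add(element)
    return lst, []

def slow_pred(x):
    # Stand-in for an expensive predicate (hypothetical workload).
    return sum(range(200)) >= 0 and x < 7

# Many duplicates before the first failing element (assumed test data).
data = [1, 3, 5] * 1000 + [7, 9, 5]

plain_time = timeit.timeit(lambda: divide_plain(slow_pred, data), number=20)
cached_time = timeit.timeit(lambda: divide_cached(slow_pred, data), number=20)
print(f"plain:  {plain_time:.4f}s")
print(f"cached: {cached_time:.4f}s")
```

With a cheap predicate or few duplicates, the set lookups may cost more than they save, so it is worth measuring on realistic data.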

I love this question by the way!

This is basically your naive attempt, but it doesn't use a separate Boolean flag to determine when the predicate fails; it just uses a reference to first one list, then the other, to do the appending.

def divide_list(pred, lst):
    a, b = [], []
    curr = a
    for x in lst:
        if curr is a and not pred(x):
            curr = b
        curr.append(x)
    return a, b

Why make it complicated when a simple solution is possible? itertools.takewhile has already been mentioned, but it was dropped from consideration for reasons I don't understand.

The code below passes all of the assertion tests, and the function itself needs only three lines of code:

from itertools import takewhile
def divide_list(pred, lstL):
    header  = list(takewhile(pred, lstL))
    trailer = lstL[len(header):]
    return header, trailer


assert divide_list(lambda x: x < 7, list(range(10))) == ([0, 1, 2, 3, 4, 5, 6], [7, 8, 9])
assert divide_list(lambda x: x < 7, [1, 3, 5, 7, 9, 5]) == ([1, 3, 5], [7, 9, 5])
assert divide_list(lambda x: x < 7, [7, 9, 5]) == ([], [7, 9, 5])
assert divide_list(lambda x: x < 7, [1, 3, 5]) == ([1, 3, 5], [])
assert divide_list(lambda x: x['a'], [{'a': True, 'b': 1}, {'a': True}, {'a': False}]) == ([{'a': True, 'b': 1}, {'a': True}], [{'a': False}])
