What is a Pythonic and efficient way to divide a list at a matching element?

Question

This is very similar to "Python: split a list based on a condition?" and also to https://nedbatchelder.com/blog/201306/filter_a_list_into_two_parts.html, but instead of partitioning the individual elements into two lists based on a predicate, I want to divide the list into two parts at the first element that fails the predicate.

>>> divide_list(lambda x: x < 7, list(range(10)))
([0, 1, 2, 3, 4, 5, 6], [7, 8, 9])

>>> divide_list(lambda x: x < 7, [1, 3, 5, 7, 9, 5])
([1, 3, 5], [7, 9, 5])

>>> divide_list(lambda x: x < 7, [7, 9, 5])
([], [7, 9, 5])

>>> divide_list(lambda x: x < 7, [1, 3, 5])
([1, 3, 5], [])

>>> divide_list(lambda x: x['a'], [{'a': True, 'b': 1}, {'a': True}, {'a': False}])
([{'a': True, 'b': 1}, {'a': True}], [{'a': False}])

Things to note:

the input list may not be sorted
the input list may contain duplicate elements
ideally we don't want to evaluate the condition more than once per element (if a value is duplicated, evaluating it again is OK)
ideally it would accept an iterator as input (i.e. it can only make a single pass over the input data)
returning iterators is acceptable

Answer 1

I think that the naive implementation is probably best unless you actually need iterators as outputs. Iterators could be useful if your input stream is an iterator and you don't have enough memory to materialize the whole thing at once, etc.

In that case, I think that itertools is great. My initial gut instinct was to do something like:

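# broken  :-(  (kept only to illustrate the pitfall)
import itertools

def divide_iter(pred, lst):
    i = iter(lst)
    # takewhile takes the predicate first, then the iterable
    yield itertools.takewhile(pred, i)
    yield i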

Unfortunately this doesn't work for a variety of reasons. Most notably, it drops an element: takewhile has to consume the first failing element to know when to stop. Even if it didn't, you could run into problems if you didn't consume the entire takewhile iterable before moving on to the second iterator. I think that this second problem is going to be an issue whenever we work with iterators in general, so that's kind of a bummer, but it's the price we pay for processing things element-by-element rather than materializing entire lists at once.
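To make the dropped element concrete, here is a small sketch (the input is just the second example from the question) showing what happens to the shared iterator once takewhile has stopped:

```python
import itertools

data = iter([1, 3, 5, 7, 9, 5])

# takewhile pulls items from `data` until the predicate fails.
head = list(itertools.takewhile(lambda x: x < 7, data))

# The failing element (7) was already consumed by takewhile,
# so it is missing from whatever is left in `data`.
tail = list(data)

print(head)  # [1, 3, 5]
print(tail)  # [9, 5]
```

The 7 is read by takewhile, tested, and then discarded, so neither part contains it.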

Instead, let's think about grouping the items based on whether the predicate has returned true yet. Then groupby becomes a lot more appealing; the only thing is that we need to keep track of whether the predicate has returned True. (Note that this splits at the first element for which the predicate is true, which is the inverse of the question's convention.) Stateful functions are not much fun, so instead we can use a class and pass a bound method as the key argument to groupby:

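import itertools

class _FoundTracker(object):
    """Key function for groupby: False until predicate first returns True."""

    def __init__(self, predicate):
        self.predicate = predicate
        self._found = False

    def check_found(self, value):
        # Once the predicate has matched, stop calling it.
        if not self._found:
            self._found = bool(self.predicate(value))
        return self._found

def split_iterable(iterable, predicate):
    tracker = _FoundTracker(predicate)
    count = 0
    for found, group in itertools.groupby(iterable, key=tracker.check_found):
        if count == 0 and found:
            # The very first element matched, so the leading group is empty.
            yield iter(())
            count += 1
        yield group
        count += 1
    # Pad with empty iterators so that two groups are always produced,
    # even for an empty input.
    for _ in range(2 - count):
        yield iter(())

if __name__ == '__main__':
    # Splits at the first element for which the predicate returns True.
    for group in split_iterable(range(10), lambda x: x >= 5):
        print(list(group))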

This also has some possibly funky behavior. To demonstrate, consider:

g1, g2 = split_iterable(range(10), lambda x: x > 5)
print(list(g1))
print(list(g2))

You'll see that you get some really weird behavior :-). Unpacking runs the generator to completion before either group is consumed, and groupby invalidates a group as soon as it moves on to the next one. Alternatively:

g1, g2 = map(list, split_iterable(range(10), lambda x: x > 5))
print(g1)
print(g2)

should work fine, because map turns each group into a list as soon as it is yielded.
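If materializing only the first part is acceptable, one way to sidestep both the dropped element and the ordering pitfall is to collect the head in a list and chain the failing element back onto the rest of the iterator. This is a sketch rather than part of the answer above: `divide_iterable` is a hypothetical name, and it follows the question's convention of splitting at the first element that fails the predicate.

```python
from itertools import chain

def divide_iterable(pred, iterable):
    """Split at the first element for which pred is false.

    Returns (list_of_leading_items, iterator_over_the_rest).
    """
    it = iter(iterable)
    head = []
    for item in it:
        if not pred(item):
            # Put the failing element back in front of the remainder.
            return head, chain([item], it)
        head.append(item)
    # Every element passed (or the input was empty).
    return head, iter(())

head, rest = divide_iterable(lambda x: x < 7, iter([1, 3, 5, 7, 9, 5]))
print(head)        # [1, 3, 5]
print(list(rest))  # [7, 9, 5]
```

The input is read in a single pass and the predicate is called at most once per element, but the leading part has to fit in memory.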

Answer 2

A naive implementation to get things rolling:

def divide_list(pred, lst):
    before, after = [], []
    found = False
    for item in lst:
        if not found:
            if pred(item):
                before.append(item)
            else:
                found = True
        if found:
            after.append(item)
    return before, after

Answer 3

Here's my relatively efficient attempt:

from collections.abc import Hashable

def divide_list(pred, lst):
    # The predicate may be expensive, so we can
    # store elements that have already been checked
    # in a set for fast verification.
    elements_checked = set()

    # Assuming that every element of the list is of
    # the same type, we can store a flag to check if
    # an element is hashable. An empty list has no
    # first element, so guard against that case.
    hashable = bool(lst) and isinstance(lst[0], Hashable)

    for index, element in enumerate(lst):
        if hashable and element in elements_checked:
            continue

        if not pred(element):
            return lst[:index], lst[index:]

        if hashable:
            elements_checked.add(element)

    return lst, []

I haven't benchmarked this against the other answers, but I reckon it would be the fastest.

I love this question by the way!
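Whether caching pays off depends on how expensive the predicate is and how many duplicates appear before the split, so it is worth measuring rather than guessing. A minimal sketch for counting predicate calls follows; `counting` is a hypothetical helper, and the `divide_list` here is just a plain loop used as a baseline, not the cached version above.

```python
def counting(pred):
    """Wrap pred so that the number of calls is recorded on the wrapper."""
    def wrapper(value):
        wrapper.calls += 1
        return pred(value)
    wrapper.calls = 0
    return wrapper

def divide_list(pred, lst):
    # Baseline without caching: stop at the first element that fails pred.
    for index, element in enumerate(lst):
        if not pred(element):
            return lst[:index], lst[index:]
    return lst, []

pred = counting(lambda x: x < 7)
result = divide_list(pred, [1, 1, 1, 3, 7, 9])
print(result)      # ([1, 1, 1, 3], [7, 9])
print(pred.calls)  # 5
```

Passing the same wrapped predicate to the cached version should need only three calls for this input (for 1, 3 and 7), at the cost of a set lookup per element.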

Answer 4

This is basically your naive attempt, but it doesn't use a separate Boolean flag to determine when the predicate fails; it just keeps a reference to the first list, then the other, to do the appending.

def divide_list(pred, lst):
    a, b = [], []
    curr = a
    for x in lst:
        # Switch to the second list at the first element that fails pred.
        if curr is a and not pred(x):
            curr = b
        curr.append(x)
    return a, b

Answer 5

Why make it complicated when it can be simple? It was already mentioned, but dropped from consideration for reasons I don't understand: using takewhile from itertools.

The code below passes all the assertion tests, and the function itself needs only three lines of code:

from itertools import takewhile
def divide_list(pred, lstL):
    header  = list(takewhile(pred, lstL))
    trailer = lstL[len(header):]
    return header, trailer


assert divide_list(lambda x: x < 7, list(range(10))) == ([0, 1, 2, 3, 4, 5, 6], [7, 8, 9])
assert divide_list(lambda x: x < 7, [1, 3, 5, 7, 9, 5]) == ([1, 3, 5], [7, 9, 5])
assert divide_list(lambda x: x < 7, [7, 9, 5]) == ([], [7, 9, 5])
assert divide_list(lambda x: x < 7, [1, 3, 5]) == ([1, 3, 5], [])
assert divide_list(lambda x: x['a'], [{'a': True, 'b': 1}, {'a': True}, {'a': False}]) == ([{'a': True, 'b': 1}, {'a': True}], [{'a': False}])

5 answers

Answer 1
2 2017-05-11 00:05:10

Answer 2
1 2017-05-10 23:44:19

Answer 3
1 2017-05-10 23:51:59

Answer 4
0 2017-05-11 00:24:52

Answer 5
0 2022-09-17 20:55:02
