
List Comprehension and Generators to avoid computing the same value twice when using conditional expressions

Pretend you have some expensive, CPU-intensive function, for example one that parses an XML string. In this case, our trivial stand-in function will be:

def parse(foo):
    return int(foo)

As input, you have a list of strings, and you want to parse them and find the subset of parsed values that meet some condition. Ideally, we want to perform the parse only once per string.

Without a list comprehension, you could:

olds = ["1", "2", "3", "4", "5"]
news = []
for old in olds:
    new = parse(old)      # First and only Parse
    if new > 3:
        news.append(new)

To do this as a list comprehension, it seems that you have to perform the parse twice: once to perform the conditional check and once to get the new value:

olds = ["1", "2", "3", "4", "5"]
news = [
    parse(new)         # First Parse
    for new in olds
    if parse(new) > 3  # Second Parse
]

For example, this syntax will not work:

olds = ["1", "2", "3", "4", "5"]
# Raises SyntaxError: can't assign to function call
news = [i for parse(i) in olds if i > 3]
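
To make the cost of the double-parse comprehension concrete, here is a small sketch that counts calls to parse. The `calls` dictionary is a hypothetical counter added only for this illustration. Note that the second parse only happens for items that pass the condition, so the total is one call per item plus one per kept item rather than exactly two per item.

```python
calls = {"parse": 0}  # hypothetical counter, not part of the original code

def parse(foo):
    """Stand-in for an expensive parse; records how often it runs."""
    calls["parse"] += 1
    return int(foo)

olds = ["1", "2", "3", "4", "5"]

# Comprehension that parses in both the condition and the result expression.
news = [parse(old) for old in olds if parse(old) > 3]
double_parse_calls = calls["parse"]

# Plain loop that parses once and reuses the value.
calls["parse"] = 0
news_loop = []
for old in olds:
    new = parse(old)
    if new > 3:
        news_loop.append(new)
single_parse_calls = calls["parse"]

print(news, double_parse_calls)       # [4, 5] 7
print(news_loop, single_parse_calls)  # [4, 5] 5
```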

Using a generator seems to work:

def parse(strings):
    for string in strings:
        yield int(string)

olds = ["1", "2", "3", "4", "5"]
news = [i for i in parse(olds) if i > 3]

However, you could just move the conditional into the generator:

def parse(strings):
    for string in strings:
        val = int(string)
        if val > 3:
            yield val

olds = ["1", "2", "3", "4", "5"]
news = [i for i in parse(olds)]

What I would like to know is, in terms of optimization (not reusability, etc.), which one is better: the one where the parsing occurs in the generator but the conditional check occurs in the list comprehension, or the one where both the parsing and the conditional check occur in the generator? Is there a better alternative than either of these approaches?
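
One way to approach the "which is better" question empirically is to time both variants with the standard timeit module. The sketch below is only an illustration: the function names, the sample data, and the repeat count are assumptions, and the numbers will vary by machine and Python version. With a genuinely expensive parse, the cost of the parse itself will likely dominate whatever difference the placement of the condition makes.

```python
import timeit

def parse_only(strings):
    """Generator that only parses; filtering happens in the caller."""
    for string in strings:
        yield int(string)

def parse_and_filter(strings):
    """Generator that parses and filters."""
    for string in strings:
        val = int(string)
        if val > 3:
            yield val

def filter_in_comprehension(strings):
    return [i for i in parse_only(strings) if i > 3]

def filter_in_generator(strings):
    return [i for i in parse_and_filter(strings)]

# Hypothetical sample input: 10,000 single-digit strings.
sample = [str(n % 10) for n in range(10000)]

for func in (filter_in_comprehension, filter_in_generator):
    seconds = timeit.timeit(lambda: func(sample), number=20)
    print(f"{func.__name__:26s} {seconds:.4f} s for 20 runs")
```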


Here is some output of dis.dis in Python 3.6.5. Note that in my version of Python, in order to disassemble list comprehensions, we have to use f.__code__.co_consts[1]. Check this answer for an explanation.
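
The index 1 in co_consts is a detail of how that interpreter version lays out constants, so a less brittle sketch is to scan co_consts for nested code objects. The helper name `nested_code_objects` is hypothetical. Two version notes: since Python 3.7, dis.dis disassembles nested code objects recursively, and since CPython 3.12 list comprehensions are compiled inline, so they no longer have a separate code object. The example therefore uses a generator expression, which still gets its own code object.

```python
import dis
import types

def nested_code_objects(func):
    """Return the code objects stored among a function's constants."""
    return [
        const
        for const in func.__code__.co_consts
        if isinstance(const, types.CodeType)
    ]

def main(strings):
    # The generator expression below compiles to its own code object.
    return list(i for i in map(int, strings) if i > 3)

for code in nested_code_objects(main):
    print(code.co_name)
    dis.dis(code)
```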

Generator does the parse and List Comprehension does the conditional check

import dis

def parse(strings):
    for string in strings:
        yield int(string)

def main(strings):
    return [i for i in parse(strings) if i > 3]

assert main(["1", "2", "3", "4", "5"]) == [4, 5]

dis.dis(main.__code__.co_consts[1])
"""
  2           0 BUILD_LIST               0
              2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER                16 (to 22)
              6 STORE_FAST               1 (i)
              8 LOAD_FAST                1 (i)
             10 LOAD_CONST               0 (3)
             12 COMPARE_OP               4 (>)
             14 POP_JUMP_IF_FALSE        4
             16 LOAD_FAST                1 (i)
             18 LIST_APPEND              2
             20 JUMP_ABSOLUTE            4
        >>   22 RETURN_VALUE
"""

dis.dis(parse)
"""
  2           0 SETUP_LOOP              22 (to 24)
              2 LOAD_FAST                0 (strings)
              4 GET_ITER
        >>    6 FOR_ITER                14 (to 22)
              8 STORE_FAST               1 (string)

  3          10 LOAD_GLOBAL              0 (int)
             12 LOAD_FAST                1 (string)
             14 CALL_FUNCTION            1
             16 YIELD_VALUE
             18 POP_TOP
             20 JUMP_ABSOLUTE            6
        >>   22 POP_BLOCK
        >>   24 LOAD_CONST               0 (None)
             26 RETURN_VALUE
"""

Generator does both the parse and the conditional check

def parse(strings):
    for string in strings:
        val = int(string)
        if val > 3:
            yield val

def main(strings):
    return [i for i in parse(strings)]

assert main(["1", "2", "3", "4", "5"]) == [4, 5]

dis.dis(main.__code__.co_consts[1])
"""
  2           0 BUILD_LIST               0
              2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER                 8 (to 14)
              6 STORE_FAST               1 (i)
              8 LOAD_FAST                1 (i)
             10 LIST_APPEND              2
             12 JUMP_ABSOLUTE            4
        >>   14 RETURN_VALUE
"""
dis.dis(parse)
"""
  2           0 SETUP_LOOP              34 (to 36)
              2 LOAD_FAST                0 (strings)
              4 GET_ITER
        >>    6 FOR_ITER                26 (to 34)
              8 STORE_FAST               1 (string)

  3          10 LOAD_GLOBAL              0 (int)
             12 LOAD_FAST                1 (string)
             14 CALL_FUNCTION            1
             16 STORE_FAST               2 (val)

  4          18 LOAD_FAST                2 (val)
             20 LOAD_CONST               1 (3)
             22 COMPARE_OP               4 (>)
             24 POP_JUMP_IF_FALSE        6

  5          26 LOAD_FAST                2 (val)
             28 YIELD_VALUE
             30 POP_TOP
             32 JUMP_ABSOLUTE            6
        >>   34 POP_BLOCK
        >>   36 LOAD_CONST               0 (None)
             38 RETURN_VALUE
"""

Naive tight loop

def parse(string):
    return int(string)

def main(strings):
    values = []
    for string in strings:
        value = parse(string)
        if value > 3:
            values.append(value)
    return values

assert main(["1", "2", "3", "4", "5"]) == [4, 5]

dis.dis(main)
"""
  2           0 BUILD_LIST               0
              2 STORE_FAST               1 (values)

  3           4 SETUP_LOOP              38 (to 44)
              6 LOAD_FAST                0 (strings)
              8 GET_ITER
        >>   10 FOR_ITER                30 (to 42)
             12 STORE_FAST               2 (string)

  4          14 LOAD_GLOBAL              0 (parse)
             16 LOAD_FAST                2 (string)
             18 CALL_FUNCTION            1
             20 STORE_FAST               3 (value)

  5          22 LOAD_FAST                3 (value)
             24 LOAD_CONST               1 (3)
             26 COMPARE_OP               4 (>)
             28 POP_JUMP_IF_FALSE       10

  6          30 LOAD_FAST                1 (values)
             32 LOAD_ATTR                1 (append)
             34 LOAD_FAST                3 (value)
             36 CALL_FUNCTION            1
             38 POP_TOP
             40 JUMP_ABSOLUTE           10
        >>   42 POP_BLOCK

  7     >>   44 LOAD_FAST                1 (values)
             46 RETURN_VALUE
"""

dis.dis(parse)
"""
  2           0 LOAD_GLOBAL              0 (int)
              2 LOAD_FAST                0 (string)
              4 CALL_FUNCTION            1
              6 RETURN_VALUE
"""

Note how the disassembly of the first two versions, which combine a list comprehension with a generator, shows two for loops: one in main (the list comprehension) and one in parse (the generator). This isn't as bad as it sounds, right? That is, the entire operation is still O(n) and not O(n^2)?
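
The two loops are not nested in the O(n^2) sense: the comprehension pulls one value at a time from the generator, so the two loops advance in lock step. A small tracing sketch can show this. The `events` list and the `keep` helper are hypothetical additions used only to record the order of operations.

```python
events = []  # hypothetical trace log, not part of the original code

def parse(strings):
    for string in strings:
        events.append(f"parse {string}")
        yield int(string)

def keep(value):
    """Same condition as `value > 3`, but records when it is evaluated."""
    events.append(f"check {value}")
    return value > 3

def main(strings):
    return [i for i in parse(strings) if keep(i)]

result = main(["1", "2", "3", "4", "5"])
print(result)       # [4, 5]
print(len(events))  # 10: one parse and one check per input string
print(events[:4])   # ['parse 1', 'check 1', 'parse 2', 'check 2']
```

Each input string produces exactly one parse and one check, interleaved, so the work grows linearly with the number of strings.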

Edit: Here is khelwood's solution:

def parse(string):
    return int(string)

def main(strings):
    return [val for val in (parse(string) for string in strings) if val > 3]

assert main(["1", "2", "3", "4", "5"]) == [4, 5]

dis.dis(main.__code__.co_consts[1])
"""
  2           0 BUILD_LIST               0
              2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER                16 (to 22)
              6 STORE_FAST               1 (val)
              8 LOAD_FAST                1 (val)
             10 LOAD_CONST               0 (3)
             12 COMPARE_OP               4 (>)
             14 POP_JUMP_IF_FALSE        4
             16 LOAD_FAST                1 (val)
             18 LIST_APPEND              2
             20 JUMP_ABSOLUTE            4
        >>   22 RETURN_VALUE
"""

dis.dis(parse)
"""
  2           0 LOAD_GLOBAL              0 (int)
              2 LOAD_FAST                0 (string)
              4 CALL_FUNCTION            1
              6 RETURN_VALUE
"""

I think you can do it more simply than you think:

olds = ["1", "2", "3", "4", "5"]
news = [new for new in (parse(old) for old in olds) if new > 3]

Or just:

news = [new for new in map(parse, olds) if new > 3]

Either way, parse is called only once per item.
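
Two further single-parse spellings may be worth knowing, shown here as a sketch rather than as part of the original answer. The first uses an assignment expression, which requires Python 3.8 or later (so it would not run on the 3.6.5 interpreter used in the question). The second iterates over a one-element list and works on any Python 3. The names `news_walrus` and `news_nested` are chosen only for this illustration.

```python
def parse(foo):
    return int(foo)

olds = ["1", "2", "3", "4", "5"]

# Python 3.8+: bind the parsed value inside the condition, then reuse it.
news_walrus = [new for old in olds if (new := parse(old)) > 3]

# Any Python 3: loop over a one-element list to bind the parsed value.
news_nested = [new for old in olds for new in [parse(old)] if new > 3]

print(news_walrus)  # [4, 5]
print(news_nested)  # [4, 5]
```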
