简体   繁体   中英

List Comprehension and Generators to avoid computing the same value twice when using conditional expressions

Pretend you have some expensive, cpu-intensive function, for example parsing an xml string. In this case, our trivial function will be:

def parse(foo):
    return int(foo)

As input, you have a list of strings, and you want to parse them and find the subset of parsed strings that meet some condition. Ideally we want to perform the parse only one time per string.

Without a list comprehension, you could:

olds = ["1", "2", "3", "4", "5"]
news = []
for old in olds:
    new = parse(old)      # First and only Parse
    if new > 3:
        news.append(new)

To do this as a list comprehension, it seems that you have to perform the parse twice, once to get the new value and once to perform the conditional check:

olds = ["1", "2", "3", "4", "5"]
news = [
    parse(new)         # First Parse
    for new in olds
    if parse(new) > 3  # Second Parse
]

For example, this syntax will not work:

olds = ["1", "2", "3", "4", "5"]
# Raises SyntaxError: can't assign to function call
news = [i for parse(i) in olds if i > 5]

Using a generator seems to work:

def parse(strings):
    for string in strings:
        yield int(string)

olds = ["1", "2", "3", "4", "5"]
news = [i for i in parse(olds) if i > 3]

However you could just throw the conditional in the generator:

def parse(strings):
    for string in strings:
        val = int(string)
        if val > 3:
            yield val

olds = ["1", "2", "3", "4", "5"]
news = [i for i in parse(olds)]

What I would like to know is, in terms of optimization (not reusability, etc), which one is better, the one where the parsing occurs in the generator but the conditional check occurs in the list comprehension, or the one where both the parsing and the conditional check occurs in the generator? Is there a better alternative than either of these approaches?


Here are some output of dis.dis in Python 3.6.5. Note that in my version of Python, in order to disassemble list comprehensions, we have to use f.__code__.co_consts[1] . Check this answer for an explanation.

Generator does the parse and List Comprehension does the conditional check

def parse(strings):
    for string in strings:
        yield int(string)

def main(strings):
    return [i for i in parse(strings) if i > 3]

assert main(["1", "2", "3", "4", "5"]) == [4, 5]

dis.dis(main.__code__.co_consts[1])
"""
  2           0 BUILD_LIST               0
              2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER                16 (to 22)
              6 STORE_FAST               1 (i)
              8 LOAD_FAST                1 (i)
             10 LOAD_CONST               0 (3)
             12 COMPARE_OP               4 (>)
             14 POP_JUMP_IF_FALSE        4
             16 LOAD_FAST                1 (i)
             18 LIST_APPEND              2
             20 JUMP_ABSOLUTE            4
        >>   22 RETURN_VALUE
"""

dis.dis(parse)
"""
  2           0 SETUP_LOOP              22 (to 24)
              2 LOAD_FAST                0 (strings)
              4 GET_ITER
        >>    6 FOR_ITER                14 (to 22)
              8 STORE_FAST               1 (string)

  3          10 LOAD_GLOBAL              0 (int)
             12 LOAD_FAST                1 (string)
             14 CALL_FUNCTION            1
             16 YIELD_VALUE
             18 POP_TOP
             20 JUMP_ABSOLUTE            6
        >>   22 POP_BLOCK
        >>   24 LOAD_CONST               0 (None)
             26 RETURN_VALUE
"""

Generator does both the parse and the conditional check

def parse(strings):
    for string in strings:
        val = int(string)
        if val > 3:
            yield val

def main(strings):
    return [i for i in parse(strings)]

assert main(["1", "2", "3", "4", "5"]) == [4, 5]

dis.dis(main.__code__.co_consts[1])
"""
  2           0 BUILD_LIST               0
              2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER                 8 (to 14)
              6 STORE_FAST               1 (i)
              8 LOAD_FAST                1 (i)
             10 LIST_APPEND              2
             12 JUMP_ABSOLUTE            4
        >>   14 RETURN_VALUE
"""
dis.dis(parse)
"""
  2           0 SETUP_LOOP              34 (to 36)
              2 LOAD_FAST                0 (strings)
              4 GET_ITER
        >>    6 FOR_ITER                26 (to 34)
              8 STORE_FAST               1 (string)

  3          10 LOAD_GLOBAL              0 (int)
             12 LOAD_FAST                1 (string)
             14 CALL_FUNCTION            1
             16 STORE_FAST               2 (val)

  4          18 LOAD_FAST                2 (val)
             20 LOAD_CONST               1 (3)
             22 COMPARE_OP               4 (>)
             24 POP_JUMP_IF_FALSE        6

  5          26 LOAD_FAST                2 (val)
             28 YIELD_VALUE
             30 POP_TOP
             32 JUMP_ABSOLUTE            6
        >>   34 POP_BLOCK
        >>   36 LOAD_CONST               0 (None)
             38 RETURN_VALUE

Naive tight loop

def parse(string):
    return int(string)

def main(strings):
    values = []
    for string in strings:
        value = parse(string)
        if value > 3:
            values.append(value)
    return values

assert main(["1", "2", "3", "4", "5"]) == [4, 5]

dis.dis(main)
"""
  2           0 BUILD_LIST               0
              2 STORE_FAST               1 (values)

  3           4 SETUP_LOOP              38 (to 44)
              6 LOAD_FAST                0 (strings)
              8 GET_ITER
        >>   10 FOR_ITER                30 (to 42)
             12 STORE_FAST               2 (string)

  4          14 LOAD_GLOBAL              0 (parse)
             16 LOAD_FAST                2 (string)
             18 CALL_FUNCTION            1
             20 STORE_FAST               3 (value)

  5          22 LOAD_FAST                3 (value)
             24 LOAD_CONST               1 (3)
             26 COMPARE_OP               4 (>)
             28 POP_JUMP_IF_FALSE       10

  6          30 LOAD_FAST                1 (values)
             32 LOAD_ATTR                1 (append)
             34 LOAD_FAST                3 (value)
             36 CALL_FUNCTION            1
             38 POP_TOP
             40 JUMP_ABSOLUTE           10
        >>   42 POP_BLOCK

  7     >>   44 LOAD_FAST                1 (values)
             46 RETURN_VALUE
"""

dis.dis(parse)
"""
  2           0 LOAD_GLOBAL              0 (int)
              2 LOAD_FAST                0 (string)
              4 CALL_FUNCTION            1
              6 RETURN_VALUE
"""

Note how the disassembly of the first two, that use list comprehensions with generators, indicate two for loops, one in the main (list comprehension) and one in the parse (generator). This isn't as bad as it sounds, right? Eg, the entire operation is O(n) and not O(n^2)?

Edit: Here is khelwood's solution:

def parse(string):
    return int(string)

def main(strings):
    return [val for val in (parse(string) for string in strings) if val > 3]

assert main(["1", "2", "3", "4", "5"]) == [4, 5]

dis.dis(main.__code__.co_consts[1])
"""
  2           0 BUILD_LIST               0
              2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER                16 (to 22)
              6 STORE_FAST               1 (val)
              8 LOAD_FAST                1 (val)
             10 LOAD_CONST               0 (3)
             12 COMPARE_OP               4 (>)
             14 POP_JUMP_IF_FALSE        4
             16 LOAD_FAST                1 (val)
             18 LIST_APPEND              2
             20 JUMP_ABSOLUTE            4
        >>   22 RETURN_VALUE
"""

dis.dis(parse)
"""
  2           0 LOAD_GLOBAL              0 (int)
              2 LOAD_FAST                0 (string)
              4 CALL_FUNCTION            1
              6 RETURN_VALUE
"""

I think you can do it more simply than you think:

olds = ["1", "2", "3", "4", "5"]
news = [new for new in (parse(old) for old in olds) if new > 3]

Or just:

news = [new for new in map(parse, olds) if new > 3]

Both of those ways parse is only called once per item.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM