Python: why are multiple list comprehensions seemingly faster than a single for loop with if...elif statements?
I have a bit of code that I am trying to determine if there is a faster way to run. Essentially, I have a delimited file that I am iterating over to find a set of flags to parse the data. These files can be very long, so I am trying to find a fast method for this.
The two methods I have tried are a list comprehension and a for loop:
Method 1:
flag_set_1 = [i for i,row in enumerate(data_file) if row[0] == flag_1]
flag_set_2 = [i for i,row in enumerate(data_file) if row[0] == flag_2]
flag_set_3 = [i for i,row in enumerate(data_file) if row[0] == flag_3]
flag_set_4 = [i for i,row in enumerate(data_file) if row[0] == flag_4]
Method 2:
for i, row in enumerate(data_file):
    if row[0] == flag_1:
        flag_set_1.append(i)
    elif row[0] == flag_2:
        flag_set_2.append(i)
    elif row[0] == flag_3:
        flag_set_3.append(i)
    elif row[0] == flag_4:
        flag_set_4.append(i)
I was actually expecting the list comprehension to be slower in this case, thinking that method 1 would have to iterate over data_file four times while method 2 would only have to iterate once. I suspect that the use of append() in method 2 is what is slowing it down.
So I ask: is there a quicker way to implement this?
Without any data sample or benchmark, it's hard to reproduce your observation. I tried with:
from random import randint
data_file = [[randint(0, 15) for _ in range(20)] for _ in range(100000)]
flag_1 = 1
flag_2 = 2
flag_3 = 3
flag_4 = 4
And the regular loop was twice as fast as the four list comprehensions (see benchmark below).
If you want to improve the speed of the process, you have several leads.
If flag_n are strings and you are sure that row[0] is one of these for every row, then you may check one character instead of the whole string. E.g.:
flag_1 = "first flag"
flag_2 = "second flag"
flag_3 = "third flag"
flag_4 = "fourth flag"
Look at the second characters: f<I>rst, s<E>cond, t<H>ird, f<O>urth. You just have to check if row[0][1] == 'i' (or 'e' or 'h' or 'o') instead of row[0] == flag_n.
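A minimal sketch of that idea, using the example flag strings above (the sample data_file here is hypothetical, invented for illustration):

```python
flag_1, flag_2, flag_3, flag_4 = "first flag", "second flag", "third flag", "fourth flag"

# Hypothetical rows: each row is assumed to start with one of the four flags.
data_file = [["first flag", 10], ["third flag", 20],
             ["second flag", 30], ["first flag", 40]]

flag_set_1, flag_set_2, flag_set_3, flag_set_4 = [], [], [], []
for i, row in enumerate(data_file):
    c = row[0][1]  # the second character uniquely identifies each flag: i/e/h/o
    if c == "i":
        flag_set_1.append(i)
    elif c == "e":
        flag_set_2.append(i)
    elif c == "h":
        flag_set_3.append(i)
    else:  # safe only because every row is known to hold one of the four flags
        flag_set_4.append(i)
```

Comparing one character is cheaper than comparing whole strings, but it silently misclassifies any row whose first field is not one of the four flags, so the "every row matches" precondition matters.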
If you want to improve the speed of the regular loop, you have several leads.
You can assign flag = row[0] instead of fetching the first element row[0] four times. That's basic, but it works.
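Applied to the loop from the question, that looks like this (a sketch; the flag values and sample rows are illustrative):

```python
flag_1, flag_2, flag_3, flag_4 = 1, 2, 3, 4

# Hypothetical data; the 5 does not match any flag and is simply skipped.
data_file = [[1, "a"], [3, "b"], [2, "c"], [1, "d"], [5, "e"]]

flag_set_1, flag_set_2, flag_set_3, flag_set_4 = [], [], [], []
for i, row in enumerate(data_file):
    flag = row[0]  # index into the row once instead of up to four times
    if flag == flag_1:
        flag_set_1.append(i)
    elif flag == flag_2:
        flag_set_2.append(i)
    elif flag == flag_3:
        flag_set_3.append(i)
    elif flag == flag_4:
        flag_set_4.append(i)
```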
If the data is sorted by flag, you can obviously build the flag_n_set at once: find the first and the last flag_n and write flag_n_set = list(range(first_flag_n_index, last_flag_n_index+1)).
If you know the frequency of the flags, you can order the if... elif... elif... elif... else to check the most frequent flag first, then the second most frequent flag, and so on.
You can also use a dict to avoid the if... elif... sequence. If you don't have too many rows that don't match any flag, you can use a defaultdict:
from collections import defaultdict

def test_append_default_dict():
    flag_set = defaultdict(list)
    for i, row in enumerate(data_file):
        flag_set[row[0]].append(i)
    return tuple(flag_set[f] for f in (flag_1, flag_2, flag_3, flag_4))
Benchmarks with the data above:
test_list_comprehensions 3.8617278739984613
test_append 1.9978336450003553
test_append_default_dict 1.4595633919998363
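Only the defaultdict function is shown above; the other two timed functions and the harness can be reconstructed along these lines (a sketch: the bodies of test_list_comprehensions and test_append, and the number=5 repeat count, are guesses, not the answer's original code):

```python
import timeit
from random import randint
from collections import defaultdict

# Same synthetic data as above.
data_file = [[randint(0, 15) for _ in range(20)] for _ in range(100000)]
flag_1, flag_2, flag_3, flag_4 = 1, 2, 3, 4

def test_list_comprehensions():
    # Four passes over data_file, one per flag.
    return tuple([i for i, row in enumerate(data_file) if row[0] == f]
                 for f in (flag_1, flag_2, flag_3, flag_4))

def test_append():
    # One pass, dispatching with an if...elif chain.
    sets = ([], [], [], [])
    for i, row in enumerate(data_file):
        flag = row[0]
        if flag == flag_1:
            sets[0].append(i)
        elif flag == flag_2:
            sets[1].append(i)
        elif flag == flag_3:
            sets[2].append(i)
        elif flag == flag_4:
            sets[3].append(i)
    return sets

def test_append_default_dict():
    # One pass, dispatching through a dict lookup.
    flag_set = defaultdict(list)
    for i, row in enumerate(data_file):
        flag_set[row[0]].append(i)
    return tuple(flag_set[f] for f in (flag_1, flag_2, flag_3, flag_4))

for fn in (test_list_comprehensions, test_append, test_append_default_dict):
    print(fn.__name__, timeit.timeit(fn, number=5))
```

All three return the same four index lists, so the timing comparison is apples to apples.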