Python: why are multiple list comprehensions seemingly faster than a single for loop with if...elif statements?
I have a bit of code that I am trying to determine if there is a faster way to run. Essentially, I have a delimited file that I am iterating over to find a set of flags to parse the data. These files can be very long, so I am trying to find a fast method for this.
The two methods I have tried are a list comprehension and a for loop:
Method 1:
flag_set_1 = [i for i,row in enumerate(data_file) if row[0] == flag_1]
flag_set_2 = [i for i,row in enumerate(data_file) if row[0] == flag_2]
flag_set_3 = [i for i,row in enumerate(data_file) if row[0] == flag_3]
flag_set_4 = [i for i,row in enumerate(data_file) if row[0] == flag_4]
Method 2:
for i, row in enumerate(data_file):
    if row[0] == flag_1:
        flag_set_1.append(i)
    elif row[0] == flag_2:
        flag_set_2.append(i)
    elif row[0] == flag_3:
        flag_set_3.append(i)
    elif row[0] == flag_4:
        flag_set_4.append(i)
I was actually expecting the list comprehension to be slower in this case, thinking that method 1 would have to iterate over data_file four times while method 2 would only have to iterate once. I suspect that the use of append() in method 2 is what is slowing it down.
So I ask: is there a quicker way to implement this?
Without any data sample or benchmark, it's hard to reproduce your observation. I tried with:
from random import randint
data_file = [[randint(0, 15) for _ in range(20)] for _ in range(100000)]
flag_1 = 1
flag_2 = 2
flag_3 = 3
flag_4 = 4
And the regular loop was twice as fast as the four list comprehensions (see benchmark below).
If you want to improve the speed of the process, you have several leads.
If flag_n are strings and you are sure that row[0] is one of these for every row, then you may check one character instead of the whole string. E.g.:
flag_1 = "first flag"
flag_2 = "second flag"
flag_3 = "third flag"
flag_4 = "fourth flag"
Look at the second characters: f<I>rst, s<E>cond, t<H>ird, f<O>urth. You just have to check if row[0][1] == 'i' (or 'e' or 'h' or 'o') instead of row[0] == flag_n.
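A minimal sketch of that idea, using the example flag strings above (the sample data_file here is hypothetical, invented for illustration):

```python
flag_1, flag_2, flag_3, flag_4 = "first flag", "second flag", "third flag", "fourth flag"

# Hypothetical rows: each row is assumed to start with one of the four flags.
data_file = [["first flag", 10], ["third flag", 20],
             ["second flag", 30], ["first flag", 40]]

flag_set_1, flag_set_2, flag_set_3, flag_set_4 = [], [], [], []
for i, row in enumerate(data_file):
    c = row[0][1]  # the second character uniquely identifies each flag: i/e/h/o
    if c == "i":
        flag_set_1.append(i)
    elif c == "e":
        flag_set_2.append(i)
    elif c == "h":
        flag_set_3.append(i)
    else:  # safe only because every row is known to hold one of the four flags
        flag_set_4.append(i)
```

Comparing one character is cheaper than comparing whole strings, but it silently misclassifies any row whose first field is not one of the four flags, so the "every row matches" precondition matters.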
If you want to improve the speed of the regular loop, you have several leads.
You can assign flag = row[0] instead of fetching the first element row[0] four times. That's basic, but it works.
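Applied to the loop from the question, that looks like this (a sketch; the flag values and sample rows are illustrative):

```python
flag_1, flag_2, flag_3, flag_4 = 1, 2, 3, 4

# Hypothetical data; the 5 does not match any flag and is simply skipped.
data_file = [[1, "a"], [3, "b"], [2, "c"], [1, "d"], [5, "e"]]

flag_set_1, flag_set_2, flag_set_3, flag_set_4 = [], [], [], []
for i, row in enumerate(data_file):
    flag = row[0]  # index into the row once instead of up to four times
    if flag == flag_1:
        flag_set_1.append(i)
    elif flag == flag_2:
        flag_set_2.append(i)
    elif flag == flag_3:
        flag_set_3.append(i)
    elif flag == flag_4:
        flag_set_4.append(i)
```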
If the data is sorted by flag, you can obviously build the flag_n_set at once: find the first and the last flag_n and write flag_n_set = list(range(first_flag_n_index, last_flag_n_index+1)).
If you know the frequency of the flags, you can order the if... elif... elif... elif... else to check the most frequent flag first, then the second most frequent flag, and so on.
You can also use a dict to avoid the if... elif... sequence. If you don't have too many rows that don't match any flag, you can use a defaultdict:
from collections import defaultdict

def test_append_default_dict():
    flag_set = defaultdict(list)
    for i, row in enumerate(data_file):
        flag_set[row[0]].append(i)
    return tuple(flag_set[f] for f in (flag_1, flag_2, flag_3, flag_4))
Benchmarks with the data above:
test_list_comprehensions 3.8617278739984613
test_append 1.9978336450003553
test_append_default_dict 1.4595633919998363
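Only the defaultdict function is shown above; the other two timed functions and the harness can be reconstructed along these lines (a sketch: the bodies of test_list_comprehensions and test_append, and the number=5 repeat count, are guesses, not the answer's original code):

```python
import timeit
from random import randint
from collections import defaultdict

# Same synthetic data as above.
data_file = [[randint(0, 15) for _ in range(20)] for _ in range(100000)]
flag_1, flag_2, flag_3, flag_4 = 1, 2, 3, 4

def test_list_comprehensions():
    # Four passes over data_file, one per flag.
    return tuple([i for i, row in enumerate(data_file) if row[0] == f]
                 for f in (flag_1, flag_2, flag_3, flag_4))

def test_append():
    # One pass, dispatching with an if...elif chain.
    sets = ([], [], [], [])
    for i, row in enumerate(data_file):
        flag = row[0]
        if flag == flag_1:
            sets[0].append(i)
        elif flag == flag_2:
            sets[1].append(i)
        elif flag == flag_3:
            sets[2].append(i)
        elif flag == flag_4:
            sets[3].append(i)
    return sets

def test_append_default_dict():
    # One pass, dispatching through a dict lookup.
    flag_set = defaultdict(list)
    for i, row in enumerate(data_file):
        flag_set[row[0]].append(i)
    return tuple(flag_set[f] for f in (flag_1, flag_2, flag_3, flag_4))

for fn in (test_list_comprehensions, test_append, test_append_default_dict):
    print(fn.__name__, timeit.timeit(fn, number=5))
```

All three return the same four index lists, so the timing comparison is apples to apples.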