Python: why are multiple list comprehensions seemingly faster than a single for loop with if...elif statements?

I have a bit of code and I am trying to determine whether there is a faster way to run it. Essentially, I have a delimited file that I iterate over to find a set of flags used to parse the data. These files can be very long, so I am looking for a fast method.

The two methods I have tried are list comprehensions and a for loop:

Method 1:

flag_set_1 = [i for i,row in enumerate(data_file) if row[0] == flag_1]
flag_set_2 = [i for i,row in enumerate(data_file) if row[0] == flag_2]
flag_set_3 = [i for i,row in enumerate(data_file) if row[0] == flag_3]
flag_set_4 = [i for i,row in enumerate(data_file) if row[0] == flag_4]

Method 2:

flag_set_1, flag_set_2, flag_set_3, flag_set_4 = [], [], [], []
for i, row in enumerate(data_file):
    if row[0] == flag_1:
        flag_set_1.append(i)
    elif row[0] == flag_2:
        flag_set_2.append(i)
    elif row[0] == flag_3:
        flag_set_3.append(i)
    elif row[0] == flag_4:
        flag_set_4.append(i)

I was actually expecting the list comprehensions to be slower in this case, thinking that Method 1 would have to iterate over data_file four times while Method 2 would only have to iterate once. I suspect that the use of append() in Method 2 is what is slowing it down.

So I ask: is there a quicker way to implement this?

Without a data sample or a benchmark, it's hard to reproduce your observation. I tried with:

from random import randint
data_file = [[randint(0, 15) for _ in range(20)] for _ in range(100000)]
flag_1 = 1
flag_2 = 2
flag_3 = 3
flag_4 = 4

With this data, the regular loop was roughly twice as fast as the four list comprehensions (see the benchmark below).

If you want to improve the speed of the process, there are several leads to follow.

List comprehensions and regular loop

If the flag_n values are strings and you are sure that row[0] is one of them for every row, then you may check a single character instead of the whole string. E.g.:

flag_1 = "first flag"
flag_2 = "second flag"
flag_3 = "third flag"
flag_4 = "fourth flag"

Look at the second character of each flag: 'i' in "first", 'e' in "second", 'h' in "third", 'o' in "fourth". They are all different, so you just have to check whether row[0][1] == 'i' (or 'e', 'h' or 'o') instead of row[0] == flag_n.
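A minimal sketch of that idea, assuming every row really does start with one of the four flag strings. The sample rows are made up for illustration, and whether the single-character test is actually faster than the full string comparison should be measured on your own data:

```python
# Hypothetical sample data: each row starts with one of the four flag strings.
flags = ("first flag", "second flag", "third flag", "fourth flag")
data_file = [[flags[i % 4], i] for i in range(12)]

flag_set_1, flag_set_2, flag_set_3, flag_set_4 = [], [], [], []

for i, row in enumerate(data_file):
    c = row[0][1]  # second character of the flag: 'i', 'e', 'h' or 'o'
    if c == "i":
        flag_set_1.append(i)
    elif c == "e":
        flag_set_2.append(i)
    elif c == "h":
        flag_set_3.append(i)
    elif c == "o":
        flag_set_4.append(i)
```

This only works as long as the chosen character position distinguishes all the flags; a row whose first field is not a flag could be misclassified.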

Regular loop

If you want to improve the speed of the regular loop, there are several leads to follow.

In all cases

You can assign flag = row[0] once per row instead of fetching the first element of row up to four times. That's basic, but it works.
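The single lookup can be sketched as follows. The function name collect_with_local_flag and the tiny data set are made up for illustration; the loop body is the question's Method 2 with row[0] read once:

```python
def collect_with_local_flag(data_file, flag_1, flag_2, flag_3, flag_4):
    """Method 2 from the question, reading row[0] only once per row."""
    flag_set_1, flag_set_2, flag_set_3, flag_set_4 = [], [], [], []
    for i, row in enumerate(data_file):
        flag = row[0]  # one index operation instead of up to four
        if flag == flag_1:
            flag_set_1.append(i)
        elif flag == flag_2:
            flag_set_2.append(i)
        elif flag == flag_3:
            flag_set_3.append(i)
        elif flag == flag_4:
            flag_set_4.append(i)
    return flag_set_1, flag_set_2, flag_set_3, flag_set_4


# Hypothetical sample: the first column holds the flag, 9 matches no flag.
sample = [[1, "a"], [4, "b"], [2, "c"], [9, "d"], [1, "e"], [3, "f"]]
result = collect_with_local_flag(sample, 1, 2, 3, 4)
```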

If you have information about the data

If the data is sorted by flag, you can build each flag_set_n at once: find the indices of the first and the last row carrying flag_n and write flag_set_n = list(range(first_flag_n_index, last_flag_n_index+1)).
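One possible way to find those two indices is a binary search with the standard bisect module. This is only a sketch under the assumption that the rows are sorted by their first column; the helper name index_range_for_flag and the sample rows are hypothetical:

```python
from bisect import bisect_left, bisect_right


def index_range_for_flag(keys, flag):
    """Return the indices of all rows whose key equals flag.

    keys must be sorted; the matching rows then form one contiguous run.
    """
    first = bisect_left(keys, flag)       # index of the first matching row
    last = bisect_right(keys, flag) - 1   # index of the last matching row
    return list(range(first, last + 1))   # empty if the flag never occurs


# Hypothetical data, already sorted by the flag in the first column.
data_file = [[1, "a"], [1, "b"], [2, "c"], [3, "d"], [3, "e"], [3, "f"], [4, "g"]]
keys = [row[0] for row in data_file]

flag_set_1 = index_range_for_flag(keys, 1)
flag_set_2 = index_range_for_flag(keys, 2)
flag_set_3 = index_range_for_flag(keys, 3)
flag_set_4 = index_range_for_flag(keys, 4)
```

Building keys still costs one pass over the data. On Python 3.10 or later, bisect_left and bisect_right accept a key argument, so the search can run on data_file directly, e.g. bisect_left(data_file, flag, key=lambda row: row[0]).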

If you know the frequency of the flags, you can order the if... elif... elif... elif... else chain so that it checks the most frequent flag first, then the second most frequent flag, and so on.
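If the frequencies are not known in advance, they could be estimated from a sample of the file, for instance with collections.Counter. The sample rows below are invented, and this only suggests an order for the checks; it does not reorder them for you:

```python
from collections import Counter

# Hypothetical sample of rows; the first column holds the flag.
sample_rows = [[3, "x"], [1, "y"], [3, "z"], [2, "w"], [3, "v"], [1, "u"]]

# Count how often each flag occurs in the sample.
counts = Counter(row[0] for row in sample_rows)

# Flags from most to least frequent: test them in this order in the if/elif chain.
check_order = [flag for flag, _ in counts.most_common()]
```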

You can also use a dict to avoid the if... elif... sequence. If you don't have too many rows that don't match any flag, you can use a defaultdict:

from collections import defaultdict

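def test_append_default_dict():
    # Maps each first-column value to the list of row indices where it occurs.
    flag_set = defaultdict(list)

    for i, row in enumerate(data_file):
        flag_set[row[0]].append(i)

    # Return the index lists of the four flags of interest, in order.
    return tuple(flag_set[f] for f in (flag_1, flag_2, flag_3, flag_4))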
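The defaultdict version stores an index list for every distinct value of row[0], including values that are not flags. If many rows match no flag, a plain dict with one pre-created list per flag might be preferable. This variant is not part of the benchmark in this answer, and the name collect_flag_indices is made up:

```python
def collect_flag_indices(data_file, flags):
    """Collect row indices per flag, skipping rows that match no flag."""
    flag_set = {f: [] for f in flags}  # one list per flag we care about

    for i, row in enumerate(data_file):
        bucket = flag_set.get(row[0])  # None when row[0] is not a flag
        if bucket is not None:
            bucket.append(i)

    return tuple(flag_set[f] for f in flags)


# Hypothetical sample: 7 and 9 are values that match no flag.
sample = [[1, "a"], [7, "b"], [2, "c"], [9, "d"], [1, "e"], [4, "f"]]
result = collect_flag_indices(sample, (1, 2, 3, 4))
```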

Benchmarks with the data above:

test_list_comprehensions    3.8617278739984613
test_append                 1.9978336450003553
test_append_default_dict    1.4595633919998363
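A benchmark of this kind could be reproduced along the following lines with the timeit module. This is a sketch, not the exact script behind the figures above: the repetition count is an assumption, run_benchmark is a made-up helper, and the timings depend on the machine and the random data.

```python
from collections import defaultdict
from random import randint
from timeit import timeit

data_file = [[randint(0, 15) for _ in range(20)] for _ in range(100000)]
flag_1, flag_2, flag_3, flag_4 = 1, 2, 3, 4


def test_list_comprehensions():
    # Method 1: four separate passes over the data.
    flag_set_1 = [i for i, row in enumerate(data_file) if row[0] == flag_1]
    flag_set_2 = [i for i, row in enumerate(data_file) if row[0] == flag_2]
    flag_set_3 = [i for i, row in enumerate(data_file) if row[0] == flag_3]
    flag_set_4 = [i for i, row in enumerate(data_file) if row[0] == flag_4]
    return flag_set_1, flag_set_2, flag_set_3, flag_set_4


def test_append():
    # Method 2: a single pass with an if/elif chain.
    flag_set_1, flag_set_2, flag_set_3, flag_set_4 = [], [], [], []
    for i, row in enumerate(data_file):
        if row[0] == flag_1:
            flag_set_1.append(i)
        elif row[0] == flag_2:
            flag_set_2.append(i)
        elif row[0] == flag_3:
            flag_set_3.append(i)
        elif row[0] == flag_4:
            flag_set_4.append(i)
    return flag_set_1, flag_set_2, flag_set_3, flag_set_4


def test_append_default_dict():
    # Single pass, grouping indices by the value of the first column.
    flag_set = defaultdict(list)
    for i, row in enumerate(data_file):
        flag_set[row[0]].append(i)
    return tuple(flag_set[f] for f in (flag_1, flag_2, flag_3, flag_4))


def run_benchmark(number=10):
    # number is an assumed repetition count; returns total seconds per function.
    funcs = (test_list_comprehensions, test_append, test_append_default_dict)
    return {f.__name__: timeit(f, number=number) for f in funcs}


for name, seconds in run_benchmark().items():
    print(f"{name:<28}{seconds}")
```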
