
List slicing vs indexing in Python?

I was trying to process a file in Python. Long story short, here are two versions of the code I wrote:

for line in file:
    if line[0:2] == ".I":
        ...  # do something
    elif line[0:2] == ".T":
        ...  # do something else
    elif line[0:2] == ".A":
        ...

There were some 21,000 lines in the file. However, when I altered my code to just this:

for line in file:
    if line[0] == ".":
        if line[1] == "I":
            ...  # do something
        elif line[1] == "T":
            ...  # do something
        elif line[1] == "A":
            ...

the runtime changed dramatically, from 40 minutes down to 30 seconds. I know list slicing is O(N), but in this case we are only slicing the first two characters of the string. So what caused such a dramatic change?

Indexing is twice as fast as slicing, but this is a comparison of very small numbers. When run a million times, the difference is about 0.04 seconds. That's not the difference you see in your code.

>>> from timeit import timeit
>>> timeit("s[0:2]=='aa'", setup="s = '12345'")
0.08988943499571178
>>> timeit("s[0]=='a'", setup="s = '12345'")
0.05322081400663592
>>> timeit("val=='aa'", setup="val='aa'")
0.03722755100170616

You could speed up both cases a little by assigning the sliced or indexed value to a variable once and using that variable for the later comparisons. You could also do this inside a function, where referencing local variables is slightly faster.
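A minimal sketch of that idea (the process name and the placeholder bodies are illustrative, not from the original code):

def process(file):
    for line in file:
        # Slice once per line and bind it to a local name; the local
        # lookup is cheaper than repeating the slice in every branch.
        prefix = line[0:2]
        if prefix == ".I":
            ...  # do something
        elif prefix == ".T":
            ...  # do something else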

Now to the bigger problem. Let's say you have 10,000 lines, and 1,000 of them start with ".". And those lines are evenly distributed between ".A" and ".Z", so you check 23 different values on average. In the first case, that's 10,000 * 23, or 230,000 total checks. In the second case, you eliminate most candidates with a single check, and only the remaining lines take the average 23 checks. That's 9,000 + (1,000 * 23), or 32,000 total checks. An 86% reduction in conditions checked.
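Spelling that arithmetic out (using the rough figure of 23 checks per dotted line from above):

total_lines = 10_000
dotted_lines = 1_000
avg_checks = 23  # rough average number of comparisons per dotted line

elif_chain = total_lines * avg_checks                               # 230,000 checks
guarded = (total_lines - dotted_lines) + dotted_lines * avg_checks  # 32,000 checks
print(1 - guarded / elif_chain)                                     # ~0.86, an 86% reduction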

Let's go further. Suppose you have ".whatever" values that you aren't interested in. Each one of those has to go through all 26 checks before you realize it's a dud. If that's the case, you can group all of your comparators into a set and check membership first.

wanted = {".A", ".B", ...}  # etc.
for line in file:
    check = line[:2]
    if check in wanted:
        val = check[1]
        if ...

You can go even further if you can write your "do_something" code as functions.

def do_thing_A():
    pass

def do_thing_B():
    pass

def do_nothing():
    pass

# Map each two-character prefix directly to its handler.
do_all_the_things = {".A": do_thing_A, ".B": do_thing_B}

for line in file:
    # Unknown prefixes fall back to do_nothing.
    do_all_the_things.get(line[:2], do_nothing)()

I was looking further into the details of what happens behind the scenes, but according to the Python Wiki, indexing has constant time complexity, O(1), while the complexity of slicing depends on the size of the slice, O(k).
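A quick way to see that scaling (a sketch; the absolute timings will vary by machine):

from timeit import timeit

s = "x" * 1_000_000
# Indexing stays flat, while slicing copies k characters into a new
# string, so its cost grows with the slice length.
print(timeit("s[0]", globals={"s": s}))         # O(1)
print(timeit("s[0:2]", globals={"s": s}))       # O(k), tiny k
print(timeit("s[0:100000]", globals={"s": s}))  # O(k), large k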
