
Python timeit module execution confusion

I'm trying to use the timeit module in Python (EDIT: we are using Python 3) to decide between a couple of different code flows. In our code, we have a series of if-statements that test for the existence of a character code in a string and, if it's there, replace it like this:

if "<substring>" in str_var:
    str_var = str_var.replace("<substring>", "<new_substring>")

We do this a number of times for different substrings. We're debating between that and using just the replace, like this:

str_var = str_var.replace("<substring>", "<new_substring>")

We tried to use timeit to determine which one was faster. If the first code block above is "stmt1" and the second is "stmt2", and our setup string looks like

str_var = '<string><substring><more_string>'

our timeit statements look like this:

timeit.timeit(stmt=stmt1, setup=setup)

and

timeit.timeit(stmt=stmt2, setup=setup)
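A self-contained version of this first experiment might look like the sketch below. The concrete strings ('hello &amp; world', '&amp;', '&') are hypothetical stand-ins for the `<string>`, `<substring>` and `<new_substring>` placeholders, and `number` is lowered from its default of 1,000,000 to keep the run short.

```python
import timeit

# Hypothetical stand-in for '<string><substring><more_string>'.
setup = "str_var = 'hello &amp; world'"

# stmt1: guard the replace with an `in` test.
stmt1 = """
if '&amp;' in str_var:
    str_var = str_var.replace('&amp;', '&')
"""

# stmt2: call replace unconditionally.
stmt2 = "str_var = str_var.replace('&amp;', '&')"

t1 = timeit.timeit(stmt=stmt1, setup=setup, number=100_000)
t2 = timeit.timeit(stmt=stmt2, setup=setup, number=100_000)
print(f"stmt1 (with if): {t1:.4f}s")
print(f"stmt2 (no if):   {t2:.4f}s")
```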

Now, running it just like that on two of our laptops (same hardware, similar processing load), stmt1 (the version with the if-statement) runs faster even after multiple runs: 3-4 hundredths of a second, vs. about a quarter of a second for stmt2.
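When comparing results across several runs, `timeit.repeat` can collect them in a single call. The sketch below uses a hypothetical test string; the `timeit` documentation suggests treating the minimum of the repeats as the most informative figure, since higher values are usually caused by other processes interfering with the measurement.

```python
import timeit

setup = "str_var = 'hello &amp; world'"  # hypothetical test string
stmt = "str_var.replace('&amp;', '&')"

# repeat() performs the full timeit() measurement `repeat` times
# and returns one total (in seconds) per repetition.
timings = timeit.repeat(stmt=stmt, setup=setup, repeat=5, number=100_000)
print("all runs:", [round(t, 4) for t in timings])
print(f"best of {len(timings)}: {min(timings):.4f}s")
```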

However, if we define functions to do both things (including the setup that creates the variable), like so:

def foo():
    str_var = '<string><substring><more_string>'
    if "<substring>" in str_var:
        str_var = str_var.replace("<substring>", "<new_substring>")

and

def foo2():
    str_var = '<string><substring><more_string>'
    str_var = str_var.replace("<substring>", "<new_substring>")

and run timeit like this:

timeit.timeit("foo()", setup="from __main__ import foo")
timeit.timeit("foo2()", setup="from __main__ import foo2")

the version without the if-statement (foo2) runs faster, contradicting the results from the non-function version.

Are we missing something about how timeit works? Or about how Python handles a case like this?

EDIT: here is our actual code (note that here foo is the version without the if-statement and foo2 is the one with it):

>>> import timeit
>>> def foo():
    s = "hi 1 2 3"
    s = s.replace('1','5')

>>> def foo2():
    s = "hi 1 2 3"
    if '1' in s:
        s = s.replace('1','5')


>>> timeit.timeit(foo, "from __main__ import foo")
0.4094226634183542
>>> timeit.timeit(foo2, "from __main__ import foo2")
0.4815539780738618

vs. this code:

>>> timeit.timeit("""s = s.replace("1","5")""", setup="s = 'hi 1 2 3'")
0.18738432400277816
>>> timeit.timeit("""if '1' in s: s = s.replace('1','5')""", setup="s = 'hi 1 2 3'")
0.02985000199987553

I think I've got it.

Look at this code:

timeit.timeit("""if '1' in s: s = s.replace('1','5')""", setup="s = 'hi 1 2 3'")

In this code, setup is run exactly once per timeit call, not once per iteration. That means that s effectively becomes a "global" shared by every iteration. As a result, it gets modified to hi 5 2 3 in the first iteration, and the in test returns False for all subsequent iterations.

See this code:

timeit.timeit("""if '1' in s: s = s.replace('1','5'); print(s)""", setup="s = 'hi 1 2 3'")

This will print hi 5 2 3 only once, because the print is part of the if statement. Contrast it with this, which will fill up your screen with a ton of hi 5 2 3s:

timeit.timeit("""s = s.replace("1","5"); print(s)""", setup="s = 'hi 1 2 3'")
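A quieter way to confirm that setup runs once per timeit call, rather than once per iteration, is to count how often the replace branch executes. This sketch assumes Python 3.5+ for the `globals` argument of `timeit.timeit`; the `hits` dictionary is a hypothetical helper added only for counting.

```python
import timeit

# Hypothetical counter, made visible to the timed code via `globals`.
hits = {"count": 0}

stmt = """
if '1' in s:
    s = s.replace('1', '5')
    hits['count'] += 1
"""

timeit.timeit(stmt, setup="s = 'hi 1 2 3'", number=1000, globals={"hits": hits})

# Only the first iteration sees a '1'; after that s is 'hi 5 2 3'.
print(hits["count"])  # → 1
```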

So the problem here is that the non-function test with the if is flawed and gives you misleading timings, unless repeated calls on an already processed string are what you were trying to test. (If that is what you were trying to test, your function versions are flawed.) The reason the function with the if doesn't fare better is that it runs the replace on the original, unmodified string in every call.
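To put the non-function version on equal terms with foo and foo2, the string can be re-created inside the timed statement so that every iteration starts from the unmodified value. A minimal sketch, with `number` reduced from the default:

```python
import timeit

# Each statement re-binds s first, so every iteration sees 'hi 1 2 3',
# just as each call to foo() or foo2() does.
with_if = """
s = 'hi 1 2 3'
if '1' in s:
    s = s.replace('1', '5')
"""
without_if = """
s = 'hi 1 2 3'
s = s.replace('1', '5')
"""

t_if = timeit.timeit(with_if, number=100_000)
t_plain = timeit.timeit(without_if, number=100_000)
print(f"with if:    {t_if:.4f}s")
print(f"without if: {t_plain:.4f}s")
```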

The following test does what I believe you intended, since it doesn't reassign the result of the replace back to s, leaving s unmodified for each iteration:

>>> timeit.timeit("""if '1' in s: s.replace('1','5')""", setup="s = 'hi 1 2 3'")
0.3221409016812231
>>> timeit.timeit("""s.replace('1','5')""", setup="s = 'hi 1 2 3'")
0.28558505721252914

This change adds a lot of time to the if test and a little bit of time to the non-if test for me, but I'm using Python 2.7. If the Python 3 results are consistent, though, these results suggest that the in check saves a lot of time when the strings rarely need any replacing. If they usually do require replacement, the in check appears to cost a little bit of time.
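The two cases mentioned here, strings that need a replacement and strings that do not, can be timed side by side. This sketch reuses the answer's non-reassigning statements; the second test string, 'hi 5 2 3', is a hypothetical input with nothing to replace, and `compare` is a helper name introduced only for illustration. Actual timings will vary with the interpreter version and machine.

```python
import timeit

def compare(text, number=200_000):
    """Time the guarded and unguarded replace on `text`; s is never reassigned."""
    setup = f"s = {text!r}"
    guarded = timeit.timeit("if '1' in s: s.replace('1', '5')", setup=setup, number=number)
    plain = timeit.timeit("s.replace('1', '5')", setup=setup, number=number)
    return guarded, plain

for label, text in [("needs replacing", "hi 1 2 3"), ("nothing to replace", "hi 5 2 3")]:
    guarded, plain = compare(text)
    print(f"{label:>18}: with in-check {guarded:.4f}s, replace only {plain:.4f}s")
```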

This is made even weirder by looking at the disassembled code. The second block has the if version (which clocks in faster for me using timeit, just as in the OP's example).

Yet, looking at the op codes, it appears to have just 7 extra op codes, starting with the first BUILD_MAP and including one extra POP_JUMP_IF_TRUE (presumably for the if statement check itself). Before and after that, all the op codes are the same.

This would suggest that building and performing the check in the if statement somehow reduces the computation time of the subsequent check within the call to replace. How can we see specific timing information for the different op codes?

In [55]: dis.disassemble_string("s='HI 1 2 3'; s = s.replace('1','4')")
          0 POP_JUMP_IF_TRUE 10045
          3 PRINT_NEWLINE
          4 PRINT_ITEM_TO
          5 SLICE+2
          6 <49>
          7 SLICE+2
          8 DELETE_SLICE+0
          9 SLICE+2
         10 DELETE_SLICE+1
         11 <39>
         12 INPLACE_MODULO
         13 SLICE+2
         14 POP_JUMP_IF_TRUE 15648
         17 SLICE+2
         18 POP_JUMP_IF_TRUE 29230
         21 LOAD_NAME       27760 (27760)
         24 STORE_GLOBAL    25955 (25955)
         27 STORE_SLICE+0
         28 <39>
         29 <49>
         30 <39>
         31 <44>
         32 <39>
         33 DELETE_SLICE+2
         34 <39>
         35 STORE_SLICE+1

In [56]: dis.disassemble_string("s='HI 1 2 3'; if '1' in s: s = s.replace('1','4')")
          0 POP_JUMP_IF_TRUE 10045
          3 PRINT_NEWLINE
          4 PRINT_ITEM_TO
          5 SLICE+2
          6 <49>
          7 SLICE+2
          8 DELETE_SLICE+0
          9 SLICE+2
         10 DELETE_SLICE+1
         11 <39>
         12 INPLACE_MODULO
         13 SLICE+2
         14 BUILD_MAP        8294
         17 <39>
         18 <49>
         19 <39>
         20 SLICE+2
         21 BUILD_MAP        8302
         24 POP_JUMP_IF_TRUE  8250
         27 POP_JUMP_IF_TRUE 15648
         30 SLICE+2
         31 POP_JUMP_IF_TRUE 29230
         34 LOAD_NAME       27760 (27760)
         37 STORE_GLOBAL    25955 (25955)
         40 STORE_SLICE+0
         41 <39>
         42 <49>
         43 <39>
         44 <44>
         45 <39>
         46 DELETE_SLICE+2
         47 <39>
         48 STORE_SLICE+1
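One caveat when reproducing these listings: in Python 2.7, `dis.disassemble_string` expects a string of raw bytecode, not source code, so it decodes the characters of the source text as opcodes (for example, 's' is byte 115, which is POP_JUMP_IF_TRUE, and 'H' is byte 72, PRINT_NEWLINE). To see the bytecode the compiler actually generates, the source can be compiled first. The sketch below assumes Python 3; note also that a compound statement such as `if` cannot follow a semicolon, so the guarded version needs a newline.

```python
import dis

plain_src = "s = 'HI 1 2 3'\ns = s.replace('1', '4')"
# A compound statement such as `if` cannot follow ';', so use a newline.
guarded_src = "s = 'HI 1 2 3'\nif '1' in s: s = s.replace('1', '4')"

plain_code = compile(plain_src, "<plain>", "exec")
guarded_code = compile(guarded_src, "<guarded>", "exec")

# dis.dis() prints the real bytecode of a compiled code object.
for label, code in [("plain", plain_code), ("guarded", guarded_code)]:
    print(f"--- {label} ---")
    dis.dis(code)

# The guarded version adds the instructions for the `in` test and the jump.
plain_ops = [ins.opname for ins in dis.get_instructions(plain_code)]
guarded_ops = [ins.opname for ins in dis.get_instructions(guarded_code)]
print(len(plain_ops), "vs", len(guarded_ops), "instructions")
```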

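Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you need to republish, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.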

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM