
Why is the count() method faster than a for loop in Python?

Here are two functions that do exactly the same thing. Does anyone know why the one using the count() method is much faster than the other? (I mean: how does it work, and how is it built?)

If possible, I'd like a more understandable answer than what's found here: "Algorithm used to implement the Python str.count function", or what's in the source code: https://hg.python.org/cpython/file/tip/Objects/stringlib/fastsearch.h

def scoring1(seq):
    score = 0
    for i in range(len(seq)):
        if seq[i] == '0':
            score += 1
    return score

def scoring2(seq):
    score = 0
    score = seq.count('0') 
    return score

seq = 'AATTGGCCGGGGAG0CTTC0CTCC000TTTCCCCGGAAA'
# takes about 1 min 15 s when applied to 100 sequences of more than 100,000 characters each
score1 = scoring1(seq)
# takes about 10 s when applied to 100 sequences of more than 100,000 characters each
score2 = scoring2(seq)

Thanks a lot for your reply.

Because count is executed in the underlying native implementation (compiled C, in CPython), whereas the for loop is executed as slower interpreted code.
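One way to see the interpreter overhead is to look at the bytecode. The sketch below is only an illustration: the function names count_with_loop and count_with_builtin are made up for this example, and the exact instruction counts depend on the Python version. It uses the standard dis module to list the bytecode instructions of each version. The point is that most of the loop's instructions are re-executed by the interpreter for every character, while the count() version issues a single call and does all the scanning in native code.

```python
import dis

# Hypothetical names, chosen for this illustration only.
def count_with_loop(seq):
    score = 0
    for ch in seq:          # each iteration runs several bytecode instructions
        if ch == '0':
            score += 1
    return score

def count_with_builtin(seq):
    return seq.count('0')   # one method call; the scan itself runs in native code

# Collect the opcode names the interpreter would execute for each function.
loop_ops = [ins.opname for ins in dis.get_instructions(count_with_loop)]
builtin_ops = [ins.opname for ins in dis.get_instructions(count_with_builtin)]

print(len(loop_ops), "bytecode instructions in the loop version")
print(len(builtin_ops), "bytecode instructions in the count() version")
```

Calling dis.dis(count_with_loop) prints the full listing if you want to inspect which instructions sit inside the loop body.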

@CodeMonkey has already given the answer, but it may be worth noting that your first function can be improved so that it runs about 20% faster:

import time, random

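def scoring1(seq):
    # index-based loop (the original version from the question)
    score = 0
    for i in range(len(seq)):
        if seq[i] == '0':
            score += 1
    return score

def scoring2(seq):
    # iterate over the characters directly; True counts as 1
    score = 0
    for x in seq:
        score += (x == '0')
    return score

def scoring3(seq):
    # built-in method
    return seq.count('0')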

def test(n):
    seq = ''.join(random.choice(['0', '1']) for i in range(n))
    functions = [scoring1, scoring2, scoring3]
    for i, f in enumerate(functions):
        # time.clock() was removed in Python 3.8; perf_counter() replaces it
        start = time.perf_counter()
        s = f(seq)
        elapsed = time.perf_counter() - start
        print('scoring' + str(i+1) + ': ' + str(s) + ' computed in ' + str(elapsed) + ' seconds')

test(10**7)

Typical output:

scoring1: 5000742 computed in 0.9651326495293333 seconds
scoring2: 5000742 computed in 0.7998054195159483 seconds
scoring3: 5000742 computed in 0.03732172598339578 seconds

Both of the first two approaches are blown away by the built-in count(): in this run it is roughly 20 to 25 times faster than either loop.

Moral of the story: when you are not using an already optimized built-in method, you need to optimize your own code.
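If you want to check this on your own machine, a single timed run can be noisy. The following is a minimal sketch, assuming the standard timeit module is acceptable for the measurement. The function names (count_by_index, count_by_iteration, count_builtin) and the best_time helper are hypothetical, introduced only for this example. It takes the best of a few runs for each approach.

```python
import random
import timeit

# Hypothetical names for the three approaches compared in this thread.
def count_by_index(seq):
    score = 0
    for i in range(len(seq)):
        if seq[i] == '0':
            score += 1
    return score

def count_by_iteration(seq):
    score = 0
    for ch in seq:
        score += (ch == '0')   # True adds 1, False adds 0
    return score

def count_builtin(seq):
    return seq.count('0')

def best_time(func, seq, repeat=3):
    # Run the function `repeat` times and keep the fastest run,
    # which is usually the least disturbed by other system activity.
    return min(timeit.repeat(lambda: func(seq), number=1, repeat=repeat))

seq = ''.join(random.choice('01') for _ in range(10**5))
for func in (count_by_index, count_by_iteration, count_builtin):
    print(f"{func.__name__}: {func(seq)} zeros, best of 3: {best_time(func, seq):.6f} s")
```

The absolute numbers will differ between machines and Python versions, so treat them as a relative comparison only.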

