In Python, why is string.count() faster than a loop?

Question

On LeetCode, I have a problem that asks me to check whether an unordered sequence of the moves "U", "D", "L", "R" forms a circle.
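For reference, a move sequence forms a circle exactly when it ends back at the starting point. This is a minimal sketch, assuming the usual convention that 'U'/'D' change y by +1/-1 and 'R'/'L' change x by +1/-1; the name `returns_to_origin` is hypothetical:

```python
def returns_to_origin(moves: str) -> bool:
    # Track the position on a grid, starting at (0, 0).
    x = y = 0
    step = {'U': (0, 1), 'D': (0, -1), 'L': (-1, 0), 'R': (1, 0)}
    for m in moves:
        dx, dy = step[m]
        x += dx
        y += dy
    # Back at the origin iff the net displacement is zero on both axes,
    # which is the same as #L == #R and #U == #D.
    return x == 0 and y == 0

print(returns_to_origin("UD"))  # True
print(returns_to_origin("LL"))  # False
```

This is why both solutions below only need to compare counts: the order of the moves does not matter.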

My submission was this:

def judgeCircle(moves):

    l=r=u=d=0

    for i in moves:
        if i == 'L':
            l+=1
        if i == 'D':
            d+=1
        if i == 'R':
            r+=1
        if i == 'U':
            u+=1

    return ((l-r)==0) and ((u-d)==0)

and the judge reported that it took 239 ms, while this other one-line solution:

def judgeCircle(moves):
    # Parentheses let the expression continue onto the next line
    return (moves.count('R') == moves.count('L') and
            moves.count('U') == moves.count('D'))

took only 39 ms?

I understand that less code is usually better, but I thought the second one would loop 4 times. Am I misunderstanding something?

Thanks
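The "four loops" intuition is correct: each `.count()` call scans the whole string once, so the one-liner makes four passes over `moves`. The pure-Python model below (the name `judge_four_passes` is hypothetical) performs the same four passes explicitly. It only models the amount of work; it would be expected to run much slower than the real `str.count`, whose passes run in C.

```python
def judge_four_passes(moves: str) -> bool:
    # One full pass over `moves` per key, mirroring four separate .count() calls.
    def count(ch: str) -> int:
        n = 0
        for c in moves:
            if c == ch:
                n += 1
        return n
    return count('R') == count('L') and count('U') == count('D')

# Same answers as the .count() one-liner.
for s in ("UD", "LL", "URDL", ""):
    expected = s.count('R') == s.count('L') and s.count('U') == s.count('D')
    assert judge_four_passes(s) == expected
```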

Answer 1

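Here is some timeit code showing the speed of various approaches, using both perfect data, where all 4 keys occur equally often, and random data, where the counts of the keys are roughly equal.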

#!/usr/bin/env python3

''' Test speeds of various algorithms that check
    if a sequence of U, D, L, R moves make a closed circle.

    See https://stackoverflow.com/q/46568696/4014959

    Written by PM 2Ring 2017.10.05
'''

from timeit import Timer
from random import seed, choice, shuffle
from collections import Counter, defaultdict

def judge_JH0(moves):
    l = r = u = d = 0
    for i in moves:
        if i == 'L':
            l += 1
        if i == 'D':
            d += 1
        if i == 'R':
            r += 1
        if i == 'U':
            u += 1
    return ((l-r) == 0) and ((u-d) == 0)

def judge_JH1(moves):
    l = r = u = d = 0
    for i in moves:
        if i == 'L':
            l += 1
        elif i == 'D':
            d += 1
        elif i == 'R':
            r += 1
        elif i == 'U':
            u += 1
    return (l == r) and (u == d)

def judge_count(moves):
    return ((moves.count('R') == moves.count('L')) and 
        (moves.count('U') == moves.count('D')))

def judge_counter(moves):
    d = Counter(moves)
    return (d['R'] == d['L']) and (d['U'] == d['D'])

def judge_dict(moves):
    d = {}
    for c in moves:
        d[c] = d.get(c, 0) + 1
    return ((d.get('R', 0) == d.get('L', 0)) and 
        (d.get('U', 0) == d.get('D', 0)))

def judge_defdict(moves):
    d = defaultdict(int)
    for c in moves:
        d[c] += 1
    return (d['R'] == d['L']) and (d['U'] == d['D'])


# All the functions
funcs = (
    judge_JH0,
    judge_JH1,
    judge_count,
    judge_counter,
    judge_dict,
    judge_defdict,
)

def verify(data):
    print('Verifying...')
    for func in funcs:
        name = func.__name__
        result = func(data)
        print('{:20} : {}'.format(name, result))
    print()

def time_test(data, loops=100):
    timings = []
    for func in funcs:
        t = Timer(lambda: func(data))
        result = sorted(t.repeat(3, loops))
        timings.append((result, func.__name__))
    timings.sort()
    for result, name in timings:
        print('{:20} : {}'.format(name, result))
    print()

# Make some data
keys = 'DLRU'
seed(42)
size = 100

perfect_data = list(keys * size)
shuffle(perfect_data)
print('Perfect')
verify(perfect_data)

random_data = [choice(keys) for _ in range(4 * size)]
print('Random data stats:')
for k in keys:
    print(k, random_data.count(k))
print()
verify(random_data)

loops = 1000
print('Testing perfect_data')
time_test(perfect_data, loops=loops)

print('Testing random_data')
time_test(random_data, loops=loops)

Typical output

Perfect
Verifying...
judge_JH0            : True
judge_JH1            : True
judge_count          : True
judge_counter        : True
judge_dict           : True
judge_defdict        : True

Random data stats:
D 89
L 100
R 101
U 110

Verifying...
judge_JH0            : False
judge_JH1            : False
judge_count          : False
judge_counter        : False
judge_dict           : False
judge_defdict        : False

Testing perfect_data
judge_counter        : [0.11746118000155548, 0.11771785900054965, 0.12218693499744404]
judge_count          : [0.12314812499971595, 0.12353860199800692, 0.12495016200409736]
judge_defdict        : [0.20643479600403225, 0.2069275510002626, 0.20834802299941657]
judge_JH1            : [0.25801684000180103, 0.2689959089984768, 0.27642749399819877]
judge_JH0            : [0.36819701099739177, 0.37400564400013536, 0.40291943999909563]
judge_dict           : [0.3991459790049703, 0.4004156189985224, 0.4040740730051766]

Testing random_data
judge_count          : [0.061543637995782774, 0.06157537500257604, 0.06704995800100733]
judge_counter        : [0.11995147699781228, 0.12068584300141083, 0.1207217440023669]
judge_defdict        : [0.2096717179956613, 0.21544414199888706, 0.220649760995002]
judge_JH1            : [0.261116588000732, 0.26281095200101845, 0.2706491360004293]
judge_JH0            : [0.38465088899829425, 0.38476935599464923, 0.3921787180006504]
judge_dict           : [0.40892754300148226, 0.4094729179996648, 0.4135226650032564]

These timings were obtained on my old 2 GHz 32-bit machine running Python 3.6.0 on Linux.

Here are a couple more functions.

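from collections import defaultdict
from itertools import groupby

def judge_defdictlist(moves):
    d = defaultdict(list)
    for c in moves:
        d[c].append(c)
    return (len(d['R']) == len(d['L'])) and (len(d['U']) == len(d['D']))

# Sort into groups in alphabetical order: DLRU
def judge_sort(moves):
    # Map each key to its group size, so a key that never occurs counts as 0
    counts = {k: sum(1 for _ in g) for k, g in groupby(sorted(moves))}
    return ((counts.get('R', 0) == counts.get('L', 0)) and
            (counts.get('U', 0) == counts.get('D', 0)))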

judge_defdictlist is slower than judge_defdict but faster than judge_JH1, and of course it uses more RAM than judge_defdict.
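The extra RAM comes from what each dict stores: `judge_defdict` keeps one int per key, while `judge_defdictlist` keeps a reference to every move, so its storage grows with the input. A small sketch that counts the stored values (the helper names are hypothetical):

```python
from collections import defaultdict

def stored_items_defdict(moves):
    # judge_defdict's approach: one counter per distinct key.
    d = defaultdict(int)
    for c in moves:
        d[c] += 1
    return len(d)

def stored_items_defdictlist(moves):
    # judge_defdictlist's approach: every move is appended to a list.
    d = defaultdict(list)
    for c in moves:
        d[c].append(c)
    return sum(len(v) for v in d.values())

moves = 'DLRU' * 100
print(stored_items_defdict(moves))      # 4
print(stored_items_defdictlist(moves))  # 400
```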

judge_sort is slower than judge_JH0, but faster than judge_dict.

Answer 2

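Both code samples have O(n) algorithmic complexity, but you shouldn't be fooled by big O, because it only shows the trend. The execution time of an O(n) algorithm can be expressed as C * n, where C is a constant that depends on many factors.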
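Applied to this problem, that cost model gives a rough comparison. This sketch assumes fixed per-call overhead is negligible; C_loop (per-character cost of the Python loop) and C_count (per-character cost of one C-level `.count()` scan) are labels introduced for illustration:

```latex
T_{\text{loop}} \approx C_{\text{loop}}\, n, \qquad
T_{\text{count}} \approx 4\, C_{\text{count}}\, n
\quad\Longrightarrow\quad
T_{\text{count}} < T_{\text{loop}} \iff C_{\text{loop}} > 4\, C_{\text{count}}
```

With the question's timings (239 ms vs 39 ms), this would put C_loop / C_count at about 4 × 239 / 39 ≈ 24.5. Treat that only as a ballpark, since the LeetCode timings also include overhead that does not scale with n.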

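For the .count() code, you need to do 4 loops in the string_count() C function, but C functions are fast. It also uses some advanced algorithms, such as fastsearch. Only string searching is performed here, with minimal interpreter overhead.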
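One way to see that `.count()` is a general substring search rather than a simple per-character tally: it accepts a multi-character argument, counts non-overlapping matches, and takes optional start/end bounds. In CPython this search is done in C; the checks below only show the observable behavior:

```python
# str.count searches for a substring and counts non-overlapping matches.
print("aaaa".count("aa"))      # 2, not 3: matches do not overlap
print("UDLR".count("U"))       # 1
print("ULLDRR".count("L"))     # 2
print("UDLR".count("X"))       # 0
# An optional start index restricts the search to moves[start:].
print("LRLRLR".count("L", 2))  # 2
```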

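In the pure Python code you need only a single loop, but each iteration has to execute much more low-level code, because Python is an interpreted language*. For example, you create a new unicode or string object on each iteration of the loop, and creating objects is a very expensive operation. And since integer objects are immutable, you need to recreate them for each counter.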

* Assuming you are using the CPython interpreter, which is pretty much the default.
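A rough way to look at that per-iteration work, assuming CPython: `dis` lists the bytecode the interpreter dispatches for the question's function. The exact listing differs between CPython versions, so no output is shown. The last lines show that `+=` on an int binds the name to a new object rather than mutating the old one (CPython does cache small ints from -5 to 256, so small counts reuse cached objects, but the name is still rebound).

```python
import dis

def judgeCircle(moves):
    l = r = u = d = 0
    for i in moves:
        if i == 'L':
            l += 1
        if i == 'D':
            d += 1
        if i == 'R':
            r += 1
        if i == 'U':
            u += 1
    return l == r and u == d

# Every character runs the whole loop body: four comparisons,
# plus an increment when one of them matches.
dis.dis(judgeCircle)

# Ints are immutable: `x += 1` makes x refer to a different int object.
x = 1000
before = x
x += 1
print(x is before)  # False
```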

Answer 3

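Consider the slowdown in the first code caused by the processor's branch prediction. There are 4 different if checks inside the loop, so the processor is probably making more branch mispredictions than in the later code, where .count performs a single loop.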

It would be interesting to see the timings if the input data were sorted alphabetically.
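A minimal sketch of that experiment: it times the question's loop (named `judge_JH0`, as in Answer 1) on the same moves in shuffled and in sorted order. No results are claimed here; absolute numbers depend on the machine, and in CPython the interpreter's own dispatch overhead may swamp any branch-prediction effect.

```python
import random
import timeit

def judge_JH0(moves):
    l = r = u = d = 0
    for i in moves:
        if i == 'L':
            l += 1
        if i == 'D':
            d += 1
        if i == 'R':
            r += 1
        if i == 'U':
            u += 1
    return (l == r) and (u == d)

random.seed(42)
shuffled = list('DLRU' * 1000)
random.shuffle(shuffled)
sorted_moves = sorted(shuffled)  # same moves, grouped as D..., L..., R..., U...

for label, data in (("shuffled", shuffled), ("sorted", sorted_moves)):
    # Best of 3 repeats of 100 calls each.
    best = min(timeit.repeat(lambda: judge_JH0(data), repeat=3, number=100))
    print(f"{label:8}: {best:.4f} s")
```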

Answer 1
4 votes, accepted, 2017-10-04 16:24:23

Answer 2
3 votes, 2017-10-04 15:58:58

Answer 3
-1 votes, 2017-10-04 16:20:47
