
Sorting lowercase letters in Python

I am trying to convince myself that a counting sort is faster than Python's built-in sorted. However, calling sorted seems to be faster even for large inputs such as 10 million elements. What can I do to make the counting sort faster?

To simplify the example, I generate a list of lowercase letters, so there are only 26 unique values:

letters = [random.choice(string.ascii_lowercase) for i in range(10000000)]

I then run the following variation on counting sort:

def sorted_count(letters):
    counts = [0] * 26
    for letter in letters:
        counts[ord(letter) - 97] += 1
    out = [None] * len(letters)
    j = 0
    for i in range(len(counts)):
        while counts[i] > 0:
            out[j] = chr(i + 97)
            counts[i] -= 1
            j += 1
    return out

Even on 10,000,000 elements, the call to sorted(letters) is about 4x faster. How can I improve the speed of my sort?

Instead of using a while loop inside the for loop at the end, you could simply use:

for i in range(len(counts)):
    if counts[i] > 0:
        # Write the whole run of this letter with one slice assignment.
        out[j:j + counts[i]] = [chr(i + 97)] * counts[i]
        j += counts[i]
return out
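To make the suggestion above concrete, here is a minimal self-contained sketch that fills each run of identical letters with one slice assignment instead of one element at a time. The name sorted_count_slices is hypothetical, and the function assumes the input contains only lowercase ASCII letters.

```python
def sorted_count_slices(letters):
    """Counting sort for lowercase ASCII letters (assumed input)."""
    counts = [0] * 26
    for letter in letters:
        counts[ord(letter) - 97] += 1  # 97 == ord('a')
    out = [None] * len(letters)
    j = 0  # next free position in out
    for i, count in enumerate(counts):
        # Fill the whole run of this letter with one slice assignment.
        out[j:j + count] = [chr(i + 97)] * count
        j += count
    return out


sample = list("counting")
print(sorted_count_slices(sample) == sorted(sample))
```

Because each slice is replaced by a list of the same length, out keeps its size, and a count of zero is a no-op.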

Here's a modified function. In the timing below, it is roughly 2.7 times faster than the original and about 1.8 times as fast as sorted:

import random
import string
import timeit
N = 1000000
letters = [random.choice(string.ascii_lowercase) for i in range(N)]


def original_sorted_count(letters):
    counts = [0] * 26
    for letter in letters:
        counts[ord(letter) - 97] += 1
    out = [None] * len(letters)
    j = 0
    for i in range(len(counts)):
        while counts[i] > 0:
            out[j] = chr(i + 97)
            counts[i] -= 1
            j += 1
    return out

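def eric(letters):
    # Count how often each letter occurs (index 0 is 'a', since ord('a') == 97).
    counts = [0] * 26
    for letter in letters:
        counts[ord(letter) - 97] += 1
    out = []
    for i in range(len(counts)):
        # Append the whole run of this letter at once instead of one by one.
        out += [chr(i+97)] * counts[i]
    return out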

print('Original : %.3fs' %timeit.timeit(lambda: original_sorted_count(letters), number=20))
print('Sorted   : %.3fs' %timeit.timeit(lambda: sorted(letters), number=20))
print('Eric     : %.3fs' %timeit.timeit(lambda: eric(letters), number=20))

print(eric(letters) == sorted(letters))

It outputs:

Original : 9.616s
Sorted   : 6.367s
Eric     : 3.604s
True
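The counting loop in eric still runs once per element in Python code. One possible further variation, not benchmarked here, is to let collections.Counter do the tallying. The name counter_sort is hypothetical, and whether it is actually faster on a given machine is an assumption to check with timeit.

```python
import random
import string
from collections import Counter


def counter_sort(letters):
    """Counting sort that delegates the tally to collections.Counter."""
    counts = Counter(letters)  # maps each distinct letter to its count
    out = []
    for ch in string.ascii_lowercase:
        # Counter returns 0 for letters that never occurred.
        out.extend([ch] * counts[ch])
    return out


random.seed(0)  # fixed seed so the check is repeatable
letters = [random.choice(string.ascii_lowercase) for _ in range(1000)]
print(counter_sort(letters) == sorted(letters))
```

Like the versions above, this sketch only handles lowercase ASCII letters; any other character in the input would be silently dropped from the output.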
