Lexicographical Sorting of Word List

I need to merge and sort lists of 100,000+ words lexicographically. I currently do it with a slightly modified bubble sort, but at O(n^2) it takes quite a while. Are there any faster algorithms for sorting lists of words? I'm using Python, but if there is a language that can handle this better I'm open to suggestions.

Use the built-in sort() list method:

>>> words = ['baloney', 'aardvark']
>>> words.sort()
>>> print(words)
['aardvark', 'baloney']

It uses an O(n lg(n)) sort [1], Timsort, which is a hybrid of merge sort and insertion sort and is highly tuned for speed.
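The question also asks about merging several lists. Below is a minimal sketch of two ways to do this with the standard library. The helper names `merge_and_sort` and `merge_presorted` are hypothetical and exist only for illustration. The first helper concatenates the lists and sorts them with Timsort. The second assumes every input list is already sorted and uses `heapq.merge` to combine them in a single pass.

```python
import heapq
from typing import List


def merge_and_sort(*word_lists: List[str]) -> List[str]:
    """Concatenate any number of word lists and sort the result (Timsort)."""
    merged: List[str] = []
    for words in word_lists:
        merged.extend(words)
    merged.sort()  # in-place, O(n log n) comparisons
    return merged


def merge_presorted(*sorted_lists: List[str]) -> List[str]:
    """Assumes each input list is already sorted; heapq.merge walks them
    together and always yields the smallest remaining word next."""
    return list(heapq.merge(*sorted_lists))


print(merge_and_sort(['baloney', 'aardvark'], ['zebra', 'cat']))
print(merge_presorted(['aardvark', 'cat'], ['baloney', 'zebra']))
```

Timsort detects runs that are already in order. Calling `sort()` on the concatenation of a few already-sorted lists should therefore be close to linear, so in practice the simple first version is usually enough.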


[1] As pointed out in the comments, this counts element comparisons, not low-level operations. Here the elements are strings, and comparing two strings takes at most min(|S1|, |S2|) character comparisons. The total cost is therefore O(n lg(n) · |S|), where |S| is the length of the longest string being sorted. This holds for every comparison sort: the true number of operations depends on the cost of the element-comparison function for the type being sorted. All comparison sorts use the same comparison function, so you can ignore this subtlety when comparing their complexities against one another.
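The footnote's claim about the cost of a single string comparison can be made concrete. The sketch below compares two strings character by character, by code point, and counts how many character comparisons it performs. The function `compare_counting` is a hypothetical illustration of the idea. CPython implements string comparison differently internally, though it produces the same ordering.

```python
from typing import Tuple


def compare_counting(a: str, b: str) -> Tuple[int, int]:
    """Lexicographically compare a and b by code point, counting character
    comparisons. Returns (sign, comparisons); sign is -1, 0 or 1."""
    comparisons = 0
    for x, y in zip(a, b):  # zip stops after min(len(a), len(b)) pairs
        comparisons += 1
        if x != y:
            return (-1 if x < y else 1), comparisons
    # Every shared position matched, so the shorter string sorts first.
    if len(a) == len(b):
        return 0, comparisons
    return (-1 if len(a) < len(b) else 1), comparisons


print(compare_counting("aardvark", "aardwolf"))  # differs at the 5th character
print(compare_counting("app", "apple"))          # prefix: shorter sorts first
```

Words that share long prefixes are the expensive case: every comparison between them has to scan the whole shared prefix.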

Any O(n log n) sorting algorithm will probably do better than bubble sort, but for strings it will be O(n log n · |S|), because each comparison may inspect up to |S| characters.
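For reference, here is a minimal top-down merge sort that could replace the bubble sort from the question. The name `merge_sort` is a hypothetical helper, not a library function. It is meant to show where the O(n log n) comparison count comes from: the list is halved log n times, and each level merges n words. In Python, the built-in `list.sort()` runs in C and will very likely still be much faster than this pure-Python version.

```python
from typing import List


def merge_sort(words: List[str]) -> List[str]:
    """Top-down merge sort: O(n log n) string comparisons, returns a new list."""
    if len(words) <= 1:
        return list(words)
    mid = len(words) // 2
    left = merge_sort(words[:mid])
    right = merge_sort(words[mid:])

    # Merge the two sorted halves; taking from the left on ties keeps it stable.
    merged: List[str] = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])   # at most one of these two is non-empty
    merged.extend(right[j:])
    return merged


print(merge_sort(["pear", "apple", "fig", "apple", "banana"]))
```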

However, sorting strings can be done in O(n·|S|), where |S| is the length of the average string, using a trie and a simple DFS (assuming a fixed-size alphabet).

High-level pseudocode:

1. Build a trie from your collection.
2. Do a DFS on the resulting trie, visiting each node's children in alphabetical
   order, and append each string to the output list when you reach a terminal node.
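The two steps above can be sketched in Python as follows. This is a minimal sketch, not a tuned implementation. The names `TrieNode` and `trie_sort` are hypothetical. Each node stores a `count` of words ending there so that duplicates survive. Children are visited in code-point order, which matches Python's built-in string ordering, and the current prefix is kept in a shared `path` list rather than being rebuilt at every node.

```python
from typing import Dict, List


class TrieNode:
    """Children keyed by character, plus how many words end at this node."""

    __slots__ = ("children", "count")

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.count = 0


def trie_sort(words: List[str]) -> List[str]:
    # Step 1: build the trie; inserting a word costs O(len(word)).
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.count += 1  # terminal node; count > 1 means duplicates

    # Step 2: pre-order DFS. A word ending at a node is emitted before the
    # longer words below it ("app" before "apple").
    result: List[str] = []
    path: List[str] = []

    def dfs(node: TrieNode) -> None:
        if node.count:
            result.extend(["".join(path)] * node.count)
        for ch in sorted(node.children):  # ascending character order
            path.append(ch)
            dfs(node.children[ch])
            path.pop()

    dfs(root)
    return result


print(trie_sort(["banana", "app", "apple", "app", "Zebra"]))
```

The `sorted(node.children)` call costs O(k log k) for k children. With a fixed alphabet this is a constant, which is why the overall bound stays O(n·|S|). In CPython, the per-node object overhead means the built-in `sort()` will likely still win on 100,000 words. The trie approach pays off more in a language where nodes are cheap.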
